From dlong at openjdk.org Sat Mar 1 02:22:32 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 1 Mar 2025 02:22:32 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v4] In-Reply-To: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: > When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. > > In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. > > Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. 
Dean Long has updated the pull request incrementally with one additional commit since the last revision: use new Bytecode_invoke::has_memeber_arg ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23557/files - new: https://git.openjdk.org/jdk/pull/23557/files/ebf10dae..375f6cfe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23557&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23557&range=02-03 Stats: 13 lines in 4 files changed: 9 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23557.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23557/head:pull/23557 PR: https://git.openjdk.org/jdk/pull/23557 From dlong at openjdk.org Sat Mar 1 02:22:32 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 1 Mar 2025 02:22:32 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v3] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Wed, 19 Feb 2025 00:37:14 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. >> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. >> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. 
>
> Dean Long has updated the pull request incrementally with one additional commit since the last revision:
>
>   Stricter assertion on ppc64

Thanks Patricio and Richard for the reviews. New commit pushed that adds Bytecode_invoke::has_member_arg as suggested by Richard.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2691853937

From stuefe at openjdk.org Sat Mar 1 06:04:08 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 1 Mar 2025 06:04:08 GMT
Subject: RFR: 8344009: Improve compiler memory statistics [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 26 Feb 2025 04:17:54 GMT, Ashutosh Mehra wrote:

>> We are only interested in a rise that rose significantly above **both** the start and end point of the measurements.
>>
>> E.g.:
>> - if we have this: start = 0, end = 20MB, peak = 20MB, this is not a temporary peak and we already know that the end usage is 20MB.
>> - if we have this: start = 20MB, end = 0, peak = 20MB, this is not a temporary peak either, because we already know the starting footprint was 20MB.
>> - but if we have start = 0, end = 0, peak = 20MB, this is interesting since if we just print start and end we will miss the fact that in between those times we had temporarily allocated 20MB.
>
> Thanks for the explanation. It would be great if some comment can be added, possibly along with some example like the one in the previous comment, to explain the meaning of `temporary_peak_size` and the corresponding calculation.

Will do. Thank you, @ashu-mehra !
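
The three start/end/peak cases quoted above can be sketched in a few lines of C++. This is only an illustration of the idea, not the code from the PR: the names `temporary_peak_rise` and `is_temporary_peak`, the `MB` constant, and the `threshold` parameter are all assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cstddef>

// Assumed unit for the examples below; not taken from the PR.
constexpr std::size_t MB = 1024 * 1024;

// Hypothetical helper: how far the peak rose above the HIGHER of the two
// end points of the measurement. A rise above only one of them is already
// visible from the start or end value, so it yields zero.
constexpr std::size_t temporary_peak_rise(std::size_t start, std::size_t end,
                                          std::size_t peak) {
  const std::size_t baseline = std::max(start, end);
  return peak > baseline ? peak - baseline : 0;
}

// Hypothetical predicate: the rise counts as a temporary peak only if it
// exceeds a caller-chosen threshold (the threshold itself is an assumption).
constexpr bool is_temporary_peak(std::size_t start, std::size_t end,
                                 std::size_t peak, std::size_t threshold) {
  return temporary_peak_rise(start, end, peak) > threshold;
}
```

With a 1 MB threshold, only the third quoted case (start = 0, end = 0, peak = 20MB) would be reported; the first two produce a rise of zero.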
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1976316773 From stuefe at openjdk.org Sat Mar 1 06:12:55 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 1 Mar 2025 06:12:55 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v6] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 12:33:14 GMT, Roberto Casta?eda Lozano wrote: >> Thomas Stuefe has updated the pull request incrementally with five additional commits since the last revision: >> >> - feedback ashu >> - feedback roberto >> - final-statistics-switch >> - performance fix >> - remove test code > > src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 306: > >> 304: if (_comp_type == compiler_c2) { >> 305: // Update C2 node count >> 306: // Careful, Compile::current() may be NULL in a short time window when Compile itself > > The recently added `sources/TestNoNULL.java` test fails due to this occurrence of `NULL`. > Suggestion: > > // Careful, Compile::current() may be null in a short time window when Compile itself Thanks. We should add those to GHAs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1976320472 From rrich at openjdk.org Sat Mar 1 22:23:56 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Sat, 1 Mar 2025 22:23:56 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v4] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Sat, 1 Mar 2025 02:22:32 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. 
>> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. >> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > use new Bytecode_invoke::has_memeber_arg Marked as reviewed by rrich (Reviewer). src/hotspot/share/runtime/vframeArray.cpp line 616: > 614: // invokedynamic instructions don't have a class but obviously don't have a MemberName appendix. > 615: // NOTE: Use machinery here that avoids resolving of any kind. > 616: const bool has_member_arg = inv.has_member_arg(); I reckon the comment about invokedynamic isn't needed anymore. It could be moved to has_member_arg if you want to keep it. 
-------------

PR Review: https://git.openjdk.org/jdk/pull/23557#pullrequestreview-2652589555
PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1976500470

From jsjolen at openjdk.org Sat Mar 1 22:27:10 2025
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Sat, 1 Mar 2025 22:27:10 GMT
Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32]
In-Reply-To:
References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <_NgwaL7X0Wail8MgHyql0JSLLkPBbHgrnCuuhdDEpzo=.8a272cee-1077-4c68-a018-5df5c867cc68@github.com>
Message-ID: <7GabYGWGOVewzZ1pVdsSHKBcZjTkMw9OIT3j-AAdEEY=.430b65f8-0f1d-4796-bee4-ecdc92f4a06b@github.com>

On Fri, 28 Feb 2025 19:51:20 GMT, Gerard Ziemski wrote:

>> The `HeapReserver` and `MemoryFileTracker` classes (in different parts of the code and different PRs) also use the same syntax for it. Here the same style is used to keep similarity in Hotspot code.
>
> Right, I didn't like it before, and spoke out against it, and now it is spreading :-)
>
> Why do we want to have more than one VMT? If we truly do, then I'm not sure there is anything that could be done here.

We want to have many because it makes unit testing easier and more reliable, there is no longer any need to access the global object which the rest of the VM is working with. There are more advantages: Each member does not have to be static and a pointer which is initialized separately, everything is under one call to malloc. That reduces the number of pointers, keeping the lifetimes of the objects trivial. I'll pay some verbosity for that any day :-).

If we find an alternative, can it wait with being implemented until this is integrated?
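
The testing argument above can be illustrated with a toy example. This is a rough sketch under stated assumptions, not the `VirtualMemoryTracker` from the PR: `ToyTracker` and `trackers_are_independent` are hypothetical names, and `constexpr` is used only so that the check can run at compile time.

```cpp
#include <cstddef>

// Hypothetical toy tracker: all state lives in ordinary (non-static)
// members, so every instance is self-contained.
class ToyTracker {
 public:
  constexpr void reserve(std::size_t bytes) { _reserved += bytes; }
  constexpr void release(std::size_t bytes) {
    // Clamp at zero instead of wrapping around on an over-release.
    _reserved = (bytes <= _reserved) ? _reserved - bytes : 0;
  }
  constexpr std::size_t reserved() const { return _reserved; }

 private:
  std::size_t _reserved = 0;
};

// Unit-test style check: two instances never see each other's updates,
// and no global object is touched.
constexpr bool trackers_are_independent() {
  ToyTracker a{};
  ToyTracker b{};
  a.reserve(4096);
  b.reserve(8192);
  b.release(8192);
  return a.reserved() == 4096 && b.reserved() == 0;
}
```

A test can create as many such instances as it needs and discard them afterwards, which is the property the reply describes; with static members, every test would share and have to reset the same state.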
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1976500876

From jsjolen at openjdk.org Sat Mar 1 22:30:05 2025
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Sat, 1 Mar 2025 22:30:05 GMT
Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32]
In-Reply-To:
References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com>
Message-ID:

On Fri, 28 Feb 2025 19:54:48 GMT, Gerard Ziemski wrote:

>> There is coding style that the last enum member counts the number of them, like what we have `mt_number_of_tags` in `MemTag` enum.
>> So, I renamed the last member to `st_number_of_states`. Would it be OK? Or preferred to use constant separately.
>
> Just because we do something already in Hotspot, doesn't necessarily mean that we should repeat the pattern going forward.
>
> I flagged it and I really don't like it, if you guys are OK with it, I will let it be.

I'm fine with either

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1976501256

From stuefe at openjdk.org Sun Mar 2 06:47:04 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sun, 2 Mar 2025 06:47:04 GMT
Subject: RFR: 8344009: Improve compiler memory statistics [v6]
In-Reply-To:
References:
Message-ID: <9BZo8JdBff5sPxUOb-TTgKsfkFAetyNF4lOi-h7Xnus=.cc03855c-8ea6-4ad9-89e0-f82a7996471f@github.com>

On Thu, 27 Feb 2025 10:04:04 GMT, Roberto Castañeda Lozano wrote:

>> Thomas Stuefe has updated the pull request incrementally with five additional commits since the last revision:
>>
>>  - feedback ashu
>>  - feedback roberto
>>  - final-statistics-switch
>>  - performance fix
>>  - remove test code
>
> src/hotspot/share/compiler/compilationMemStatInternals.hpp line 243:
>
>> 241:   int retrieve_live_node_count() const;
>> 242:
>> 243:   DEBUG_ONLY(void verify() const;)
>
> Unused. I suggest to either call this function from some appropriate point (in debug mode only) or just remove it.

I'll use it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1976553526

From stuefe at openjdk.org Sun Mar 2 06:56:05 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sun, 2 Mar 2025 06:56:05 GMT
Subject: RFR: 8344009: Improve compiler memory statistics [v6]
In-Reply-To:
References:
Message-ID:

On Thu, 27 Feb 2025 10:08:50 GMT, Roberto Castañeda Lozano wrote:

>> Thomas Stuefe has updated the pull request incrementally with five additional commits since the last revision:
>>
>>  - feedback ashu
>>  - feedback roberto
>>  - final-statistics-switch
>>  - performance fix
>>  - remove test code
>
> src/hotspot/share/utilities/ostream.cpp line 225:
>
>> 223:   while (count > 0) {
>> 224:     int nw = (count > 8) ? 8 : count;
>> 225:     this->write(tmp, nw);
>
> Are these changes essential for the rest of the changeset? If not, I would suggest to leave them to a separate RFE, for simplicity.

Removed

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1976554529

From stuefe at openjdk.org Sun Mar 2 07:04:55 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sun, 2 Mar 2025 07:04:55 GMT
Subject: RFR: 8344009: Improve compiler memory statistics [v6]
In-Reply-To:
References:
Message-ID: <3nlpgZQdqbh5M1_v_qFHpm_qzDm0anEGiGTJInDIEyw=.e13d4287-5822-4087-81d2-837a9ea8a5cc@github.com>

On Thu, 27 Feb 2025 10:11:37 GMT, Roberto Castañeda Lozano wrote:

>> Thomas Stuefe has updated the pull request incrementally with five additional commits since the last revision:
>>
>>  - feedback ashu
>>  - feedback roberto
>>  - final-statistics-switch
>>  - performance fix
>>  - remove test code
>
> src/hotspot/share/runtime/globals.hpp line 1402:
>
>> 1400:           "Print metaspace statistics upon VM exit.")                       \
>> 1401:                                                                             \
>> 1402:   product(bool, PrintCompilerMemoryStatisticsAtExit, false, DIAGNOSTIC,     \
>
> Would it be possible to add a test for this new flag, perhaps by extending the existing test logic in `CompileCommandPrintMemStat`?
The test is already there - we test the final print output in CompileCommandPrintMemStat.java. I will change the logic however and only print the final output if `PrintCompilerMemoryStatisticsAtExit` is given (before, it was also printed, and tested as part of, CompilerCommand memstat). Then I will explicitly pass this flag in the test. That should be good enough. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1976555741 From stuefe at openjdk.org Sun Mar 2 07:20:58 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 2 Mar 2025 07:20:58 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v7] In-Reply-To: References: Message-ID: <6SXwVSMnVb6f5MJ433OJnuircxxdfhyVOaYLtAblxNM=.6a629eb0-4949-499b-bb65-56b1ca514cba@github.com> > Greetings, > > This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. > > Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. > > I wanted to track that information correctly and display it clearly in a way that is easy to understand. > > The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). > > The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in there with a speaking name, and you can see the allocations inside that phase. 
> > The statistic gives us two new forms of output: > > 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: > > > Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816: > Phase Total ra node comp type index reglive regsplit cienv other > none 1205512 155104 982984 33712 0 0 0 0 0 33712 > parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680 > optimizer 916584 0 556416 0 0 0 0 0 0 360168 > escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0 > connectionGraph 720016 0 0 621832 0 0 0 0 98184 0 > macroEliminate 196448 0 196448 0 0 0 0 0 0 0 > iterGVN 327440 0 196368 131072 0 0 0 0 0 0 > incrementalInline 3992816 0 3043704 621832 0 0 0 0 261824... Thomas Stuefe has updated the pull request incrementally with four additional commits since the last revision: - Feedback Roberto cont. - Roberto Arenas - NULL in comment - feedback ashu ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23530/files - new: https://git.openjdk.org/jdk/pull/23530/files/3052ddf8..d1dbba60 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=05-06 Stats: 56 lines in 16 files changed: 9 ins; 32 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/23530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23530/head:pull/23530 PR: https://git.openjdk.org/jdk/pull/23530 From stuefe at openjdk.org Sun Mar 2 07:20:58 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 2 Mar 2025 07:20:58 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: References: <0wHGNSlwe7cWb7Plad2n8Swy8rayYTAf5IETuw9zl4U=.a4d6a129-aebc-4639-aaef-92ee6c4552c7@github.com> Message-ID: On Wed, 26 Feb 2025 13:00:51 GMT, Roberto Casta?eda Lozano wrote: >>> > @robcasloz I identified and hopefully fixed a small issue that hit the "disabled" path. 
Turns out we allocate arena chunks a lot more frequently than I thought, and the new unconditional call to Thread::current() in there was hurting a bit. I now avoid this unless I know the statistic is enabled. >>> > With this patch, on my machine the difference between unpatched and patched JVM with stats disabled is below one standard deviation for the benchmark in question. >>> >>> Great, thanks! Will re-run benchmarking and report results early next week. >> >> Functional test results (Oracle tier1-5) still look good for the latest commit (dd7a06ad). I can confirm that the C2 speed regression on our linux-x64 machines is almost fully mitigated. The 2-3% regression on our macosx-aarch64 machines does not seem to be addressed by the latest changes though, but as I mentioned before I think it is in the acceptable range (and only affects one benchmark). > >> @robcasloz, @ashu-mehra thanks a lot for your reviews. I incorporated most of them into the PR. > > Thanks, Thomas! I see that the changes suggested in https://github.com/openjdk/jdk/commit/d501bd8a674229904358fb168a9c347004efeea3 are not incorporated, is it because you find them out of the scope of this PR? I would argue that at least tagging `Compile::_Compile_types` with `tag_type` is relevant and in line with the other changes included in this PR, e.g. [this one](https://github.com/openjdk/jdk/pull/23530/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R443). @robcasloz I think I addressed all of your concerns. Thanks for reviewing! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2692596999 From sroy at openjdk.org Sun Mar 2 17:10:04 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Sun, 2 Mar 2025 17:10:04 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Fri, 28 Feb 2025 16:30:16 GMT, Martin Doerr wrote: >> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: >> >> use vsplitsb > > src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 574: > >> 572: masm->vsldoi(vLowProduct, vLowProduct, vLowProduct, 8); // Swap >> 573: masm->vxor(vLowProduct, vLowProduct, vReducedLow); // Reduction using constant >> 574: masm->vsldoi(vCombinedResult, vLowProduct, vLowProduct, 8); // Swap > > The part between the vpsumd instructions looks too complicated. Isn't it equivalent to the following? > > masm->vsldoi(vTmp8, vLowProduct, vHighProduct, 8); > masm->vsldoi(vTmp9, vReducedLow, vReducedLow, 8); > masm->vxor(vTmp8, vTmp8, vMidProduct); > masm->vxor(vCombinedResult, vTmp8, vTmp9); @TheRealMDoerr can you explain how it can be equivalent to these 4 instructions ? we are extracting the different parts of midProduct here ,64 bits each, for the cross product. I,e Xl * Hh +Hl*Xh , so the below 2 are required masm->vsldoi(vTmp8, vMidProduct, vZero, 8); masm->vsldoi(vTmp9, vZero, vMidProduct, 8); ? 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1976673259

From amitkumar at openjdk.org Mon Mar 3 03:15:58 2025
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 3 Mar 2025 03:15:58 GMT
Subject: RFR: 8350716: [s390] intrinsify Thread.currentThread() [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 28 Feb 2025 10:45:11 GMT, Amit Kumar wrote:

>> s390x port for [JDK-8278793](https://bugs.openjdk.org/browse/JDK-8278793)
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
>   comment from Lutz

Thanks for the approval Lutz, Martin.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23791#issuecomment-2693176844

From amitkumar at openjdk.org Mon Mar 3 03:15:59 2025
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 3 Mar 2025 03:15:59 GMT
Subject: Integrated: 8350716: [s390] intrinsify Thread.currentThread()
In-Reply-To:
References:
Message-ID:

On Wed, 26 Feb 2025 04:14:37 GMT, Amit Kumar wrote:

> s390x port for [JDK-8278793](https://bugs.openjdk.org/browse/JDK-8278793)

This pull request has now been integrated.

Changeset: 93c87845
Author: Amit Kumar
URL: https://git.openjdk.org/jdk/commit/93c878455bfffc07f115f9e20ee11b20186eb2be
Stats: 14 lines in 1 file changed: 13 ins; 1 del; 0 mod

8350716: [s390] intrinsify Thread.currentThread()

Reviewed-by: lucy, mdoerr

-------------

PR: https://git.openjdk.org/jdk/pull/23791

From tschatzl at openjdk.org Mon Mar 3 08:42:05 2025
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 3 Mar 2025 08:42:05 GMT
Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5]
In-Reply-To:
References:
Message-ID:

> Hi all,
>
> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier.
>
> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight, but we would like to have this ready by JDK 25.
>
> ### Current situation
>
> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to its larger barrier.
>
> The main reason for the current barrier is how G1 implements concurrent refinement:
> * G1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
> * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads.
> * Finally, there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done;     // null value check
> if (card(@x.a) == young_card) goto done;  // write to young gen check
> StoreLoad;                // synchronize
> if (card(@x.a) == dirty_card) goto done;
>
> *card(@x.a) = dirty
>
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
>
> call runtime to move thread-local-dcq into dcqs
>
> done:
>
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
> > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * fix comment (trailing whitespace) * another assert when snapshotting at a safepoint. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/d87935a0..810bf2d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=03-04 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From rcastanedalo at openjdk.org Mon Mar 3 09:19:55 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 3 Mar 2025 09:19:55 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v7] In-Reply-To: <6SXwVSMnVb6f5MJ433OJnuircxxdfhyVOaYLtAblxNM=.6a629eb0-4949-499b-bb65-56b1ca514cba@github.com> References: <6SXwVSMnVb6f5MJ433OJnuircxxdfhyVOaYLtAblxNM=.6a629eb0-4949-499b-bb65-56b1ca514cba@github.com> Message-ID: On Sun, 2 Mar 2025 07:20:58 GMT, Thomas Stuefe wrote: >> Greetings, >> >> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. 
>>
>> Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends.
>>
>> I wanted to track that information correctly and display it clearly in a way that is easy to understand.
>>
>> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea).
>>
>> The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in there with a speaking name, and you can see the allocations inside that phase.
>>
>> The statistic gives us two new forms of output:
>>
>> 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase:
>>
>>
>> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816:
>> Phase                  Total        ra      node      comp      type     index   reglive  regsplit     cienv     other
>> none                 1205512    155104    982984     33712         0         0         0         0         0     33712
>> parse               11685376    720016   6578728   1899064         0         0         0         0   1832888    654680
>> optimizer             916584         0    556416         0         0         0         0         0         0    360168
>> escapeAnalysis       1983400         0   1276392    707008         0         0         0         0         0         0
>> connectionGraph       720016         0         0    621832         0         0         0         0     98184         0
>> macroEliminate        196448         0    196448         0         0         0         0         0         0         0
>> iterGVN               327440         0    196368    131072         0         0         0         0         0         0
>> incrementalInline    3992816         0   3043704    62...
>
> Thomas Stuefe has updated the pull request incrementally with four additional commits since the last revision:
>
>  - Feedback Roberto cont.
>  - Roberto Arenas
>  - NULL in comment
>  - feedback ashu

Thanks for addressing my comments!
The latest changeset looks good (modulo unnecessary changes in `ostream.hpp`) and passes all tier1-5 tests in Oracle's test pipeline.

src/hotspot/share/utilities/ostream.hpp line 1:

> 1: /*

Please revert the unnecessary changes in this file as well.

-------------

Marked as reviewed by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23530#pullrequestreview-2653497625
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1977135658

From cnorrbin at openjdk.org Mon Mar 3 09:52:27 2025
From: cnorrbin at openjdk.org (Casper Norrbin)
Date: Mon, 3 Mar 2025 09:52:27 GMT
Subject: RFR: 8346916: [REDO] align_up has potential overflow [v4]
In-Reply-To:
References:
Message-ID:

> Hi everyone,
>
> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that `aligned_result >= original`. To address this, I've added an assertion to verify this condition.
>
> The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through the usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions.
>
> Additionally, I've created `align_up_or_min`, which behaves similarly to the original `align_up` but handles overflows predictably across all integer types. This new function is used in the locations where overflow checks already exist, providing a safer alternative.
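
The overflow concern in the description above can be shown with a small, self-contained sketch. It is not the implementation of `align_up`, `align_up_or_null`, or `align_up_or_min` from the PR; `AlignResult` and `try_align_up` are hypothetical names, and the sketch assumes an unsigned `std::size_t` value and a power-of-two alignment.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical result type: 'ok' is false when aligning up would overflow.
struct AlignResult {
  bool ok;
  std::size_t value;
};

// Hypothetical checked align-up. Assumes 'alignment' is a power of two.
// Rounding up adds (alignment - 1) before masking, so the addition wraps
// whenever 'value' is within (alignment - 1) of the maximum.
constexpr AlignResult try_align_up(std::size_t value, std::size_t alignment) {
  const std::size_t mask = alignment - 1;
  const std::size_t max = std::numeric_limits<std::size_t>::max();
  if (value > max - mask) {
    return AlignResult{false, 0};  // value + mask would wrap around
  }
  return AlignResult{true, (value + mask) & ~mask};
}
```

Whenever `ok` is true, the returned value is greater than or equal to the input, which is the condition the description says most callers rely on. What a caller should do in the overflow case (return null, return a minimum, or clamp the argument beforehand) is a separate design choice that this sketch leaves open.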
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: changed alignment arg in psoldgen ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23711/files - new: https://git.openjdk.org/jdk/pull/23711/files/86d91252..3068917b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23711.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23711/head:pull/23711 PR: https://git.openjdk.org/jdk/pull/23711 From cnorrbin at openjdk.org Mon Mar 3 09:52:28 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Mon, 3 Mar 2025 09:52:28 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v3] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 19:29:00 GMT, Albert Mingkun Yang wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> changed max size of MinHeapDeltaBytes > > src/hotspot/share/gc/parallel/psOldGen.cpp line 193: > >> 191: #endif >> 192: const size_t alignment = virtual_space()->alignment(); >> 193: size_t aligned_bytes = align_up_or_min(bytes, alignment); > > How about using `bytes = MIN2(bytes, virtual_space()->uncommitted_size())` to dodge the potential overflow? I find it more intuitive to provide a proper arg to `align_up` and expect the result to be >= arg. 

Changed it to this now

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1977194293

From jsjolen at openjdk.org Mon Mar 3 10:10:00 2025
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Mon, 3 Mar 2025 10:10:00 GMT
Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag
In-Reply-To:
References:
Message-ID:

On Tue, 25 Feb 2025 09:49:41 GMT, Afshin Zafari wrote:

> With the `size` parameter there will be no need to traverse/go through the nodes between the base and end of the region.
> Tests:
> linux-x64-debug, gtest:NMT* and runtime/NMT*

LGTM

-------------

Marked as reviewed by jsjolen (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23770#pullrequestreview-2653648133

From stuefe at openjdk.org Mon Mar 3 10:41:16 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 3 Mar 2025 10:41:16 GMT
Subject: RFR: 8344009: Improve compiler memory statistics [v8]
In-Reply-To:
References:
Message-ID:

> Greetings,
>
> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues.
>
> Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends.
>
> I wanted to track that information correctly and display it clearly in a way that is easy to understand.
>
> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea).
> > The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in there with a speaking name, and you can see the allocations inside that phase. > > The statistic gives us two new forms of output: > > 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: > > > Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816: > Phase Total ra node comp type index reglive regsplit cienv other > none 1205512 155104 982984 33712 0 0 0 0 0 33712 > parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680 > optimizer 916584 0 556416 0 0 0 0 0 0 360168 > escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0 > connectionGraph 720016 0 0 621832 0 0 0 0 98184 0 > macroEliminate 196448 0 196448 0 0 0 0 0 0 0 > iterGVN 327440 0 196368 131072 0 0 0 0 0 0 > incrementalInline 3992816 0 3043704 621832 0 0 0 0 261824... Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision: - Merge branch 'master' into JDK-8344009-Improve-Compiler-memstat - remove unnecessary changes in osream.hpp - Feedback Roberto cont. - Roberto Arenas - NULL in comment - feedback ashu - feedback ashu - feedback roberto - final-statistics-switch - performance fix - ... 
and 10 more: https://git.openjdk.org/jdk/compare/7d4fd3ef...f82d37cd ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23530/files - new: https://git.openjdk.org/jdk/pull/23530/files/d1dbba60..f82d37cd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=06-07 Stats: 20868 lines in 719 files changed: 10434 ins; 7437 del; 2997 mod Patch: https://git.openjdk.org/jdk/pull/23530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23530/head:pull/23530 PR: https://git.openjdk.org/jdk/pull/23530 From stuefe at openjdk.org Mon Mar 3 10:41:16 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 3 Mar 2025 10:41:16 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v7] In-Reply-To: References: <6SXwVSMnVb6f5MJ433OJnuircxxdfhyVOaYLtAblxNM=.6a629eb0-4949-499b-bb65-56b1ca514cba@github.com> Message-ID: On Mon, 3 Mar 2025 09:17:03 GMT, Roberto Castañeda Lozano wrote: > Thanks for addressing my comments! The latest changeset looks good (modulo unnecessary changes in `ostream.hpp`) and passes all tier1-5 tests in Oracle's test pipeline. Many thanks, @robcasloz! I also merged master and will wait for the last GHAs to finish. I may need a review update for the latest version. > src/hotspot/share/utilities/ostream.hpp line 1: > >> 1: /* > > Please revert the unnecessary changes in this file as well.
Arrgh forgot those ------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2693941082 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1977277368 From mdoerr at openjdk.org Mon Mar 3 10:50:56 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 3 Mar 2025 10:50:56 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Sun, 2 Mar 2025 17:07:11 GMT, Suchismith Roy wrote: >> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 574: >> >>> 572: masm->vsldoi(vLowProduct, vLowProduct, vLowProduct, 8); // Swap >>> 573: masm->vxor(vLowProduct, vLowProduct, vReducedLow); // Reduction using constant >>> 574: masm->vsldoi(vCombinedResult, vLowProduct, vLowProduct, 8); // Swap >> >> The part between the vpsumd instructions looks too complicated. Isn't it equivalent to the following? >> >> masm->vsldoi(vTmp8, vLowProduct, vHighProduct, 8); >> masm->vsldoi(vTmp9, vReducedLow, vReducedLow, 8); >> masm->vxor(vTmp8, vTmp8, vMidProduct); >> masm->vxor(vCombinedResult, vTmp8, vTmp9); > > @TheRealMDoerr can you explain how it can be equivalent to these 4 instructions ? > we are extracting the different parts of midProduct here ,64 bits each, for the cross product. > I,e Xl * Hh +Hl*Xh , so the below 2 are required > masm->vsldoi(vTmp8, vMidProduct, vZero, 8); > masm->vsldoi(vTmp9, vZero, vMidProduct, 8); > > > > > ? Your version extracts 2 8 Byte parts and feeds them into separate xor instructions. My proposal performs both 8 Byte xor operations with one vxor instruction by selecting the input bits accordingly. It furthermore avoids swapping halves forth and back (I swap the halves of vReducedLow instead). Have you tried? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1977294138 From duke at openjdk.org Mon Mar 3 11:18:32 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 3 Mar 2025 11:18:32 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA Message-ID: By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. ------------- Commit messages: - JDK-8351034 Add AVX-512 intrinsics for ML-DSA Changes: https://git.openjdk.org/jdk/pull/23860/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351034 Stats: 2530 lines in 18 files changed: 2445 ins; 9 del; 76 mod Patch: https://git.openjdk.org/jdk/pull/23860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23860/head:pull/23860 PR: https://git.openjdk.org/jdk/pull/23860 From tschatzl at openjdk.org Mon Mar 3 12:11:02 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 12:11:02 GMT Subject: RFR: 8350956: Fix repetitions of the word "the" in compiler component comments Message-ID: Hi all, please review this trivial change that fixes "the the" repetitions in the compiler related sources. If you think it's not worth fixing, I am okay with that and just retract the change. 
Testing: gha Thanks, Thomas ------------- Commit messages: - 8350956 Changes: https://git.openjdk.org/jdk/pull/23858/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23858&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350956 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23858.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23858/head:pull/23858 PR: https://git.openjdk.org/jdk/pull/23858 From azafari at openjdk.org Mon Mar 3 12:11:19 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Mon, 3 Mar 2025 12:11:19 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v33] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 13:29:01 GMT, Johan Sjölen wrote: >> 2 questions: >> >> 1st, I must be misunderstanding something here. Johan asked to change the API from: >> >> `visit_committed_regions(ReservedMemoryRegion& committed_rgn)` >> >> to >> >> `visit_committed_regions(position start, size size)` >> >> but I still see the old way. >> >> 2nd, why are we asking for this change? > > We want to remove `ReservedMemoryRegion` in a follow-up PR to this one. Another step is to remove the `CommittedMemoryRegion` class as well. To be more specific, the `ReservedMemoryRegion` structure/class will still remain in the code, but `find_reserved_region(address, size_t)` will be removed. At least, it would not be removed in either of my PRs. These structs are useful for encapsulating the info about the regions and have been used many times in the code. Do we have any reason to remove them? Anyway, it is better to remove them in the related PRs, if there are any.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1977368095 From rcastanedalo at openjdk.org Mon Mar 3 12:14:52 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 3 Mar 2025 12:14:52 GMT Subject: RFR: 8350956: Fix repetitions of the word "the" in compiler component comments In-Reply-To: References: Message-ID: <_qeU9w886YHSBLxN-IunDO6ted4cBnT54IIEtvxqXi8=.38195df3-0e5e-43fb-8bb4-bf3435cba607@github.com> On Mon, 3 Mar 2025 11:07:41 GMT, Thomas Schatzl wrote: > Hi all, > > please review this trivial change that fixes "the the" repetitions in the > compiler related sources. > > If you think it's not worth fixing, I am okay with that and just retract the change. > > Testing: gha > > Thanks, > Thomas Looks good and trivial! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23858#pullrequestreview-2653954556 From tschatzl at openjdk.org Mon Mar 3 12:33:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 12:33:56 GMT Subject: RFR: 8350956: Fix repetitions of the word "the" in compiler component comments In-Reply-To: <_qeU9w886YHSBLxN-IunDO6ted4cBnT54IIEtvxqXi8=.38195df3-0e5e-43fb-8bb4-bf3435cba607@github.com> References: <_qeU9w886YHSBLxN-IunDO6ted4cBnT54IIEtvxqXi8=.38195df3-0e5e-43fb-8bb4-bf3435cba607@github.com> Message-ID: On Mon, 3 Mar 2025 12:12:30 GMT, Roberto Castañeda Lozano wrote: >> Hi all, >> >> please review this trivial change that fixes "the the" repetitions in the >> compiler related sources. >> >> If you think it's not worth fixing, I am okay with that and just retract the change. >> >> Testing: gha >> >> Thanks, >> Thomas > > Looks good and trivial!
Thanks @robcasloz for your review ------------- PR Comment: https://git.openjdk.org/jdk/pull/23858#issuecomment-2694215730 From tschatzl at openjdk.org Mon Mar 3 12:33:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 12:33:57 GMT Subject: Integrated: 8350956: Fix repetitions of the word "the" in compiler component comments In-Reply-To: References: Message-ID: <9Xs26rvFpf6FkbQkFXXpwMJZpT4SdCUJBWzXmF1lPyE=.12f7076d-af81-4fd8-96f0-491c62b0d11e@github.com> On Mon, 3 Mar 2025 11:07:41 GMT, Thomas Schatzl wrote: > Hi all, > > please review this trivial change that fixes "the the" repetitions in the > compiler related sources. > > If you think it's not worth fixing, I am okay with that and just retract the change. > > Testing: gha > > Thanks, > Thomas This pull request has now been integrated. Changeset: 30b0c609 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/30b0c6098028cce63e65bd9d563973f2774fa74d Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod 8350956: Fix repetitions of the word "the" in compiler component comments Reviewed-by: rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/23858 From tschatzl at openjdk.org Mon Mar 3 12:37:06 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 12:37:06 GMT Subject: RFR: 8346194: Improve G1 pre-barrier C2 cost estimate Message-ID: Hi all, please review this change that modifies pre-barrier node costs for loop-unrolling to only consider the fast path. The reasoning is similar to zgc (and the new costs as well): only the part of the barrier inlined into the main code stream is counted, as the slow path is laid out separately and does/should not directly affect performance (particularly if there is no marking going on). There are no differences/impact in performance since the post barrier cost is still very large, which will be fixed elsewhere.
Testing: gha, perf testing standalone (neither micros nor actual benchmarks give any difference outside of variance), testing with JDK-8342382 Hth, Thomas ------------- Commit messages: - 8346194 Changes: https://git.openjdk.org/jdk/pull/23862/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23862&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346194 Stats: 11 lines in 1 file changed: 5 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23862/head:pull/23862 PR: https://git.openjdk.org/jdk/pull/23862 From dfenacci at openjdk.org Mon Mar 3 12:43:53 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 3 Mar 2025 12:43:53 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v4] In-Reply-To: <2jI87up85vKeQq7xy6WoI987MOuqTqA6I8G75VvC74g=.e8ef9f9c-b8b3-496d-9b48-28c83dc1fb64@github.com> References: <2jI87up85vKeQq7xy6WoI987MOuqTqA6I8G75VvC74g=.e8ef9f9c-b8b3-496d-9b48-28c83dc1fb64@github.com> Message-ID: On Fri, 28 Feb 2025 20:35:58 GMT, Dean Long wrote: > Refreshing my memory, isn't the real problem with trying to fix this with a minimum codecache size is that some of these stubs are not allocated during initial single-threaded JVM startup, but later when the first compiler threads start, and that allows other code blobs to fill up the codecache? Yes, exactly. This seems to be even more of an issue with 2 compiler threads (i.e. C1/C2) since the first can fill up the code cache first at the expense of the other. The result is that if one compiler thread tries to allocate more space in a full code cache during initialization with one of the 4 call paths above, the VM crashes (but could actually just turn off the compiler thread instead). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23630#issuecomment-2694260818 From dfenacci at openjdk.org Mon Mar 3 12:54:26 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 3 Mar 2025 12:54:26 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v5] In-Reply-To: References: Message-ID: > # Issue > The test `src/hotspot/share/opto/c2compiler.cpp` fails intermittently due to a crash that happens when trying to allocate code cache space for C1 and C2 in `RuntimeStub::new_runtime_stub` and `SingletonBlob::operator new`. > > # Causes > There are a few call paths during the initialization of C1 and C2 that can lead to the code cache allocations in `RuntimeStub::new_runtime_stub` (through `RuntimeStub::operator new`) and `SingletonBlob::operator new` triggering a fatal error if there is no more space. The paths in question are: > 1. `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_uncommon_trap_blob` -> `UncommonTrapBlob::create` -> `new UncommonTrapBlob` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_exception_blob` -> `ExceptionBlob::create` -> `new ExceptionBlob` > > # Solution > Instead of fatally crashing, we can use the `alloc_fail_is_fatal` flag of `RuntimeStub::new_runtime_stub` to avoid crashing in cases 1 and 2 and add a similar flag to `SingletonBlob::operator new` for cases 3 and 4.
In the latter case we need to adjust all calls accordingly. > > Note: In [JDK-8326615](https://bugs.openjdk.org/browse/JDK-8326615) it was argued that increasing the minimum code cache size would solve the issue but that wasn't entirely accurate: doing so possibly decreases the chances of a failed allocation in these 4 places but doesn't totally avoid it. > > # Testing > The original failing regression test in `test/hotspot/jtreg/compiler/startup/StartupOutput.java` has been modified to run multiple times with randomized values (within the original failing range) to increase the chances of hitting the fatal assertion. > > Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8347406: move assert into else clause ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23630/files - new: https://git.openjdk.org/jdk/pull/23630/files/906cd756..722ca508 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23630&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23630&range=03-04 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23630/head:pull/23630 PR: https://git.openjdk.org/jdk/pull/23630 From dfenacci at openjdk.org Mon Mar 3 12:58:53 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 3 Mar 2025 12:58:53 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v5] In-Reply-To: References: Message-ID: <7p-BfhPDiY8ImbAwlaBaN1Mre-HA0zpEz42NTQWYMoE=.38ad35e1-0e5f-43b7-9f1d-4c0461881f76@github.com> On Fri, 28 Feb 2025 20:43:03 GMT, Dean Long wrote: >> A slightly modified one surely is. Inserted it again. > > I was thinking it could be moved into the `else` clause and simplified further. 
Oh I see. Moved. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23630#discussion_r1977476672 From ayang at openjdk.org Mon Mar 3 14:06:01 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 3 Mar 2025 14:06:01 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v3] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 09:49:25 GMT, Casper Norrbin wrote: >> src/hotspot/share/gc/parallel/psOldGen.cpp line 193: >> >>> 191: #endif >>> 192: const size_t alignment = virtual_space()->alignment(); >>> 193: size_t aligned_bytes = align_up_or_min(bytes, alignment); >> >> How about using `bytes = MIN2(bytes, virtual_space()->uncommitted_size())` to dodge the potential overflow? I find it more intuitive to provide a proper arg to `align_up` and expect the result to be >= arg. > > Changed it to this now Thank you; my suggestion was insufficient... We need an early return when `uncommitted_size` is 0 at the beginning of `PSOldGen::expand`. After this, I wonder if `align_up_or_min` is truly warranted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1977551923 From rcastanedalo at openjdk.org Mon Mar 3 14:19:03 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 3 Mar 2025 14:19:03 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v8] In-Reply-To: References: Message-ID: <58iKQTQUKK7ChuZj6Pzq07VNuE1CFQKxHznKXJYFbM8=.3ce9fed9-62fd-48c4-bfac-75d41ffd06e8@github.com> On Mon, 3 Mar 2025 10:41:16 GMT, Thomas Stuefe wrote: >> Greetings, >> >> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. >> >> Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases.
A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. >> >> I wanted to track that information correctly and display it clearly in a way that is easy to understand. >> >> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). >> >> The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in there with a speaking name, and you can see the allocations inside that phase. >> >> The statistic gives us two new forms of output: >> >> 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: >> >> >> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816: >> Phase Total ra node comp type index reglive regsplit cienv other >> none 1205512 155104 982984 33712 0 0 0 0 0 33712 >> parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680 >> optimizer 916584 0 556416 0 0 0 0 0 0 360168 >> escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0 >> connectionGraph 720016 0 0 621832 0 0 0 0 98184 0 >> macroEliminate 196448 0 196448 0 0 0 0 0 0 0 >> iterGVN 327440 0 196368 131072 0 0 0 0 0 0 >> incrementalInline 3992816 0 3043704 62... > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision: > > - Merge branch 'master' into JDK-8344009-Improve-Compiler-memstat > - remove unnecessary changes in osream.hpp > - Feedback Roberto cont. 
> - Roberto Arenas > - NULL in comment > - feedback ashu > - feedback ashu > - feedback roberto > - final-statistics-switch > - performance fix > - ... and 10 more: https://git.openjdk.org/jdk/compare/06fe586a...f82d37cd Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23530#pullrequestreview-2654253184 From amitkumar at openjdk.org Mon Mar 3 14:25:54 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 3 Mar 2025 14:25:54 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 08:42:05 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
>> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * fix comment (trailing whitespace) > * another assert when snapshotting at a safepoint. I don't see any failure on s390x. Tier1 test looks good. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2694563382 From stuefe at openjdk.org Mon Mar 3 14:42:04 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 3 Mar 2025 14:42:04 GMT Subject: Integrated: 8344009: Improve compiler memory statistics In-Reply-To: References: Message-ID: On Sat, 8 Feb 2025 06:56:40 GMT, Thomas Stuefe wrote: > Greetings, > > This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. > > Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. > > I wanted to track that information correctly and display it clearly in a way that is easy to understand. > > The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). > > The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in there with a speaking name, and you can see the allocations inside that phase. 
> > The statistic gives us two new forms of output: > > 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: > > > Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816: > Phase Total ra node comp type index reglive regsplit cienv other > none 1205512 155104 982984 33712 0 0 0 0 0 33712 > parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680 > optimizer 916584 0 556416 0 0 0 0 0 0 360168 > escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0 > connectionGraph 720016 0 0 621832 0 0 0 0 98184 0 > macroEliminate 196448 0 196448 0 0 0 0 0 0 0 > iterGVN 327440 0 196368 131072 0 0 0 0 0 0 > incrementalInline 3992816 0 3043704 621832 0 0 0 0 261824... This pull request has now been integrated. Changeset: db69ec9e Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/db69ec9e583791d359c5c0acb504c7f01e963e3b Stats: 1732 lines in 32 files changed: 1147 ins; 248 del; 337 mod 8344009: Improve compiler memory statistics Reviewed-by: rcastanedalo, asmehra ------------- PR: https://git.openjdk.org/jdk/pull/23530 From ayang at openjdk.org Mon Mar 3 15:22:10 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 3 Mar 2025 15:22:10 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 08:42:05 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. 
The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). 
>> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * fix comment (trailing whitespace) > * another assert when snapshotting at a safepoint. src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 106: > 104: > 105: __ testptr(count, count); > 106: __ jcc(Assembler::equal, done); I wonder if we can use "zero" instead of "equal" here; they have the same underlying value, but the intent here is to check for "zero". src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 133: > 131: Label is_clean_card; > 132: __ cmpb(Address(addr, 0), G1CardTable::clean_card_val()); > 133: __ jcc(Assembler::equal, is_clean_card); Should this check be guarded by `if (UseCondCardMark)`? I see that aarch64 does that. src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 143: > 141: > 142: __ bind(is_clean_card); > 143: // Card was not clean. Dirty card and go to next.. Why "not clean"? I thought this path is for dirtying a clean card? src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 323: > 321: assert(thread == r15_thread, "must be"); > 322: #endif // _LP64 > 323: assert_different_registers(store_addr, new_val, thread, tmp1 /*, tmp2 unused */, noreg); It seems that `tmp2` is unused in this method. It is used in aarch64, but it's not obvious to me whether that is indeed necessary. If so, can you add a comment saying something like "this unused var is needed for other archs..."? src/hotspot/share/gc/g1/g1CardTable.inline.hpp line 54: > 52: // result = 0xBBAABBAA > 53: inline size_t blend(size_t a, size_t b, size_t mask) { > 54: return a ^ ((a ^ b) & mask); The example makes it much clearer; I wonder if `return (a & ~mask) | (b & mask);` is more readable.
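To make the comparison concrete, here is a small standalone sketch of the two formulations side by side. The names `blend_xor` and `blend_select` are made up for illustration, and the sample inputs mentioned below are an assumption chosen to reproduce the `0xBBAABBAA` result quoted from the source comment; the quote itself does not show the inputs.

```cpp
#include <cassert>
#include <cstddef>

// XOR formulation, as in the quoted patch: (a ^ b) marks the bits where a and b
// differ; masking keeps only the differing bits we want to flip in a.
inline size_t blend_xor(size_t a, size_t b, size_t mask) {
  return a ^ ((a ^ b) & mask);
}

// Select formulation, as suggested in the review: take the bits of b where the
// mask is set and the bits of a everywhere else.
inline size_t blend_select(size_t a, size_t b, size_t mask) {
  return (a & ~mask) | (b & mask);
}
```

With the assumed inputs `a = 0xAAAAAAAA`, `b = 0xBBBBBBBB` and `mask = 0xFF00FF00`, both functions return `0xBBAABBAA`. They agree for all inputs: for each bit, a set mask bit gives `a ^ (a ^ b) = b`, and a clear mask bit leaves `a` unchanged.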
src/hotspot/share/gc/g1/g1CardTableClaimTable.cpp line 59: > 57: > 58: void G1CardTableClaimTable::reset_all_claims_to_claimed() { > 59: for (size_t i = 0; i < _max_reserved_regions; i++) { `uint` for `i`? src/hotspot/share/gc/g1/g1CardTableClaimTable.hpp line 64: > 62: void reset_all_claims_to_unclaimed(); > 63: void reset_all_claims_to_claimed(); > 64: I wonder if these two APIs can be renamed to "reset_all_to_x", which is more aligned with their single-region counterpart, `reset_to_unclaimed`, IMO. src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 348: > 346: void G1ConcurrentRefineWorkState::snapshot_heap_into(G1CardTableClaimTable* sweep_table) { > 347: // G1CollectedHeap::heap_region_iterate() below will only visit committed regions. Initialize > 348: // all entries in the state table here to not require special handling when iterating over it. Can you elaborate on what the "special handling" would be, if we don't set "claimed" for non-committed regions? src/hotspot/share/gc/g1/g1RemSet.cpp line 837: > 835: for (; refinement_cur_card < refinement_end_card; ++refinement_cur_card, ++card_cur_word) { > 836: size_t value = *refinement_cur_card; > 837: *refinement_cur_card = G1CardTable::WordAllClean; Similarly, this is also a "word", not a "card". src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 857: > 855: // We do not expect too many non-Java threads compared to Java threads, so just > 856: // let one worker claim that work. > 857: if (!_non_java_threads_claim && !Atomic::cmpxchg(&_non_java_threads_claim, false, true, memory_order_relaxed)) { Do non-Java threads have a card-table-base? src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 862: > 860: > 861: class ResizeAndSwapCardTableClosure : public ThreadClosure { > 862: SwapCardTableClosure _cl; Field indentation.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977586579 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977594184 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977583002 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977601907 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977645576 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977571306 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977573354 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977704351 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977575441 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977701293 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977679688 From dnsimon at openjdk.org Mon Mar 3 15:32:18 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 3 Mar 2025 15:32:18 GMT Subject: RFR: 8350892: [JVMCI] Align ResolvedJavaType.getInstanceFields with Class.getDeclaredFields Message-ID: The current order of fields returned by `ResolvedJavaType.getInstanceFields` is a) not well specified and b) different than the order of fields used almost everywhere else in HotSpot. This PR aligns the order of `getInstanceFields` with `Class.getDeclaredFields()`. It also makes `ciInstanceKlass::_nonstatic_fields` use the same order which unifies how escape analysis and deoptimization treats fields across C2 and JVMCI. 
------------- Commit messages: - made order of ciInstanceKlass::_nonstatic_fields same as JavaFieldStream (and Class.getDeclaredFields) - made order of ResolvedJavaType.getInstanceFields match Class.getDeclaredFields Changes: https://git.openjdk.org/jdk/pull/23849/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23849&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350892 Stats: 89 lines in 6 files changed: 18 ins; 32 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/23849.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23849/head:pull/23849 PR: https://git.openjdk.org/jdk/pull/23849 From tschatzl at openjdk.org Mon Mar 3 15:40:04 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 15:40:04 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 14:11:09 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * fix comment (trailing whitespace) >> * another assert when snapshotting at a safepoint. > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 143: > >> 141: >> 142: __ bind(is_clean_card); >> 143: // Card was not clean. Dirty card and go to next.. > > Why "not clean"? I thought this path is for dirtying clean card? My interpretation is: in this path the card has been found clean ("is clean") earlier. So dirty it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977733993 From tschatzl at openjdk.org Mon Mar 3 15:42:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 15:42:57 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 14:47:00 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * fix comment (trailing whitespace) >> * another assert when snapshotting at a safepoint. > > src/hotspot/share/gc/g1/g1CardTable.inline.hpp line 54: > >> 52: // result = 0xBBAABBAA >> 53: inline size_t blend(size_t a, size_t b, size_t mask) { >> 54: return a ^ ((a ^ b) & mask); > > The example makes it much clearer; I wonder if `return (a & ~mask) | (b & mask);` is more readable. ... and hope that the optimizer knows this pattern? If you insist I can do that, brief examination of that code snippet by itself (not within this code) showed that it does. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977739888 From mdoerr at openjdk.org Mon Mar 3 16:31:58 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 3 Mar 2025 16:31:58 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: <_XnhdwtuB6AhiTL4TYmV4yqIy_WwQEeASn2b2zL9-V0=.05ec2994-8599-4f76-871d-a9e2bbe8afa2@github.com> Message-ID: <413JPgs-IIREKFfH05GHeskZzg5lpyBuNbW6jeGyQVk=.35277f99-0552-4e06-92a0-17d051979e1a@github.com> On Fri, 28 Feb 2025 10:47:39 GMT, Martin Doerr wrote: > > I've used QEMU to smoke test this PR on ppc64le, riscv64 and s390x, But it would be nice if @TheRealMDoerr, @RealFYang and @offamitkumar could check if it runs okay on real hardware as well. > > The PPC64 code looks correct and some quick tests have passed. I'll run larger test suites over the weekend. 
Test results look good (including tier 1-4 on many platforms). I didn't see any new issue related to this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23421#issuecomment-2694935731 From duke at openjdk.org Mon Mar 3 16:47:57 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Mon, 3 Mar 2025 16:47:57 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v2] In-Reply-To: References: Message-ID: > This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. > > I tested it with: > > > java -Xlog:os+container=trace -version > > on: > > `Red Hat Enterprise Linux 8 (cgroups v1 only)`: > _No change in behaviour_ > > `Fedora 41 (cgroups v2)`: > _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ > > --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 > +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 > @@ -1,7 +1,12 @@ > [trace][os,container] OSContainer::init: Initializing Container Support > -[debug][os,container] Detected optional pids controller entry in /proc/cgroups > -[debug][os,container] controller cpuset is not enabled > - ] > +[debug][os,container] v2 controller cpuset is enabled and relevant > +[debug][os,container] v2 controller cpu is enabled and required > +[debug][os,container] v2 controller io is enabled but not relevant > +[debug][os,container] v2 controller memory is enabled and required > +[debug][os,container] v2 controller hugetlb is enabled but not relevant > +[debug][os,container] v2 controller pids is enabled and relevant > +[debug][os,container] v2 controller rdma is enabled but not relevant > +[debug][os,container] v2 controller misc is enabled but not relevant > [debug][os,container] Detected cgroups v2 unified hierarchy > [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 
4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope > [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max > > > `Fedora 41 (custom kernel with cgroups v1 disabled)`: > _Fixes `cgroups v2` detection:_ > > --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 > +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 > @@ -1,7 +1,63 @@ > [trace][os,container] OSContainer::init: Initializing Container Support > -[debug][os,container] Detected optional pids controller entry in /proc/cgroups > -[debug][os,container] controller cpuset is not enabled > - ] > -[debug][os,container] controller memory is not enabled > - ] > -[debug][os,container] One or more required controllers disabled at kernel level. > +[debug][os,container] v2 controller cpuset is enabled and relevant > +[debug][os,container] v2 contro... Thomas Fitzsimmons has updated the pull request incrementally with 10 additional commits since the last revision: - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing Remove from cgroups v1 branch incorrect log messages about cpuset controller being optional. Add test case for cgroups v1, cpuset disabled. - Improve !cgroups_v2_enabled branch comment - Debug-log optional and disabled cgroups v2 controllers Do not log enabled controllers that are not relevant to the JDK. - Move index declaration to scope in which it is used - Remove empty string check during cgroup.controllers parsing - Define ISSPACE_CHARS macro, use it in strsep call - Pass fgets result to strsep - Replace is_cgroupsV2 with cgroups_v2_enabled Also fix the testCgroupv1SystemdOnly and testCgroupv1NoMounts test cases such that their /proc/cgroups and /proc/self/cgroup contents correspond. 
This prevents assertion failures these tests were producing when is_cgroupsV2 was replaced with cgroups_v2_enabled. - Comment statfs check - Add redefinition guard for CGROUP2_SUPER_MAGIC ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23811/files - new: https://git.openjdk.org/jdk/pull/23811/files/39a6463c..67107287 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23811&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23811&range=00-01 Stats: 127 lines in 2 files changed: 92 ins; 13 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/23811.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23811/head:pull/23811 PR: https://git.openjdk.org/jdk/pull/23811 From tschatzl at openjdk.org Mon Mar 3 16:55:55 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 16:55:55 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 15:17:27 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * fix comment (trailing whitespace) >> * another assert when snapshotting at a safepoint. > > src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 857: > >> 855: // We do not expect too many non-Java threads compared to Java threads, so just >> 856: // let one worker claim that work. >> 857: if (!_non_java_threads_claim && !Atomic::cmpxchg(&_non_java_threads_claim, false, true, memory_order_relaxed)) { > > Do non-java threads have card-table-base? This code should not be necessary (any more). Will remove. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977853483 From cnorrbin at openjdk.org Mon Mar 3 16:57:13 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Mon, 3 Mar 2025 16:57:13 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v5] In-Reply-To: References: Message-ID: > Hi everyone, > > The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. > > The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. Casper Norrbin has updated the pull request incrementally with two additional commits since the last revision: - removed align_up_or_min test from test_align - psoldgen check + removed align_up_or_min ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23711/files - new: https://git.openjdk.org/jdk/pull/23711/files/3068917b..dd319893 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=03-04 Stats: 22 lines in 3 files changed: 4 ins; 15 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23711.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23711/head:pull/23711 PR: https://git.openjdk.org/jdk/pull/23711 From cnorrbin at openjdk.org Mon Mar 3 16:57:13 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Mon, 3 Mar 2025 16:57:13 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v3] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 13:52:06 GMT, Albert Mingkun Yang wrote: >> Changed it to this now > > Thank you; my 
suggestion was insufficient... Need to have an early-return when `uncommitted_size` is 0 at the beginning of `PSOldGen::expand`. After this, I wonder if `align_up_or_min` is truly warranted. Added the extra check, tests should now work as expected! I agree with your thoughts on `align_up_or_min`. Since it's no longer used, I went ahead and removed it. Now we only have `align_up_or_null` left for pointer types. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1977853877 From tschatzl at openjdk.org Mon Mar 3 18:22:24 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 18:22:24 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v6] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
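To make the contrast in the description above concrete, here is a rough sketch of what a Parallel-style card-marking store looks like, as opposed to the filter/synchronize/enqueue sequence in the pseudo code. It is a toy model under stated assumptions, not HotSpot code: `ToyCardTable`, the 512-byte card size, and the clean/dirty byte values are all assumptions made for illustration, and any filters the actual change keeps are not modelled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed values for this toy model only.
constexpr unsigned kCardShift = 9;      // 2^9 = 512-byte cards (assumption)
constexpr uint8_t  kCleanCard = 0xFF;   // assumed "clean" marker
constexpr uint8_t  kDirtyCard = 0x00;   // assumed "dirty" marker

// Hypothetical, simplified card table covering [heap_base, heap_base + heap_bytes).
struct ToyCardTable {
  uintptr_t heap_base;
  std::vector<uint8_t> cards;

  ToyCardTable(uintptr_t base, size_t heap_bytes)
    : heap_base(base), cards((heap_bytes >> kCardShift) + 1, kCleanCard) {}

  // Map the address of the written field to its card index.
  size_t index_for(uintptr_t field_addr) const {
    return (field_addr - heap_base) >> kCardShift;
  }

  // Reduced post-write barrier: compute the card and store "dirty".
  // No StoreLoad fence, no per-thread queue, no runtime call.
  void post_write_barrier(uintptr_t field_addr) {
    cards[index_for(field_addr)] = kDirtyCard;
  }
};
```

The point of the sketch is only the shape of the fast path: a shift and a byte store, which is roughly what the "three or four instructions" figure for Parallel and Serial GC refers to.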
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review 2 * removal of useless code * renamings ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/810bf2d3..b3dd0084 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=04-05 Stats: 51 lines in 7 files changed: 16 ins; 10 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From duke at openjdk.org Mon Mar 3 19:00:59 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 3 Mar 2025 19:00:59 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v2] In-Reply-To: References: Message-ID: > By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. 
Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Added comments, removed debugging printfs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23860/files - new: https://git.openjdk.org/jdk/pull/23860/files/1ff58512..fe50e0d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=00-01 Stats: 12 lines in 2 files changed: 9 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23860/head:pull/23860 PR: https://git.openjdk.org/jdk/pull/23860 From ayang at openjdk.org Mon Mar 3 19:15:53 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 3 Mar 2025 19:15:53 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 16:57:13 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with two additional commits since the last revision: > > - removed align_up_or_min test from test_align > - psoldgen check + removed align_up_or_min Marked as reviewed by ayang (Reviewer). 
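As an illustration of the overflow concern in the quoted `align_up` description, the following is a minimal sketch using unsigned integers, where the addition wraps around instead of being undefined. The helper names (`align_up_would_overflow`, `align_up_unchecked`, `align_up_or_zero`) are hypothetical and are not the functions in HotSpot's `align.hpp`; the last one only mimics the idea of returning a null-like value on overflow.

```cpp
#include <cstdint>
#include <limits>

// All helpers assume `alignment` is a power of two.

// True if value + (alignment - 1) does not fit in uintptr_t.
constexpr bool align_up_would_overflow(uintptr_t value, uintptr_t alignment) {
  return value > std::numeric_limits<uintptr_t>::max() - (alignment - 1);
}

// Classic round-up; silently wraps for values close to the maximum.
constexpr uintptr_t align_up_unchecked(uintptr_t value, uintptr_t alignment) {
  return (value + (alignment - 1)) & ~(alignment - 1);
}

// Returns 0 (standing in for "null") when no aligned value >= `value` is representable.
constexpr uintptr_t align_up_or_zero(uintptr_t value, uintptr_t alignment) {
  return align_up_would_overflow(value, alignment)
           ? 0
           : align_up_unchecked(value, alignment);
}

static_assert(align_up_unchecked(13, 8) == 16, "normal case rounds up");
static_assert(align_up_unchecked(UINTPTR_MAX - 2, 8) < UINTPTR_MAX - 2,
              "near the maximum the result wraps below the input");
```

The second `static_assert` shows exactly the situation an `aligned_result >= original` assertion is meant to catch.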
------------- PR Review: https://git.openjdk.org/jdk/pull/23711#pullrequestreview-2655030880 From duke at openjdk.org Mon Mar 3 19:24:04 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Mon, 3 Mar 2025 19:24:04 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v3] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 12:46:48 GMT, Thomas Fitzsimmons wrote: >> Update: Please remove the log line, since this is the cg v1 branch and there cpuset isn't optional. > > OK, will do. This represents a change to debug logging on `RHEL-8`, at least in my default test configuration. Currently it is, with and without my patch: > > > $ jdk/bin/java -Xlog:os+container=trace -version > [0.001s][trace][os,container] OSContainer::init: Initializing Container Support > [0.001s][debug][os,container] Detected optional cpuset controller entry in /proc/cgroups > [0.001s][debug][os,container] Detected optional pids controller entry in /proc/cgroups > [0.001s][debug][os,container] Detected cgroups hybrid or legacy hierarchy, using cgroups v1 controllers > [...] > > However, I agree that the change is a good one, since the debug message was inaccurate when the system was ultimately determined to be in `cgroups v1` mode. Done. I also added similar log messages to the cg v2 branch for pids and cpuset. 
This corrects debug logging on `Fedora 41`: --- tt-old-f41.txt 2025-03-03 09:27:02.606397900 -0500 +++ tt-new-f41.txt 2025-03-03 09:27:03.287401780 -0500 @@ -1,7 +1,6 @@ [trace][os,container] OSContainer::init: Initializing Container Support -[debug][os,container] Detected optional pids controller entry in /proc/cgroups -[debug][os,container] controller cpuset is not enabled - ] +[debug][os,container] Detected optional cpuset controller entry in /sys/fs/cgroup/cgroup.controllers +[debug][os,container] Detected optional pids controller entry in /sys/fs/cgroup/cgroup.controllers [debug][os,container] Detected cgroups v2 unified hierarchy [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1978056483 From duke at openjdk.org Mon Mar 3 19:24:03 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Mon, 3 Mar 2025 19:24:03 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v3] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 14:29:13 GMT, Severin Gehwolf wrote: >> Thomas Fitzsimmons has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent > > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 42: > >> 40: // Inlined from for portability. >> 41: #define CGROUP2_SUPER_MAGIC 0x63677270 >> 42: > > We may want to surround this with: > > > #ifndef CGROUP2_SUPER_MAGIC > ... > #endif Done. 
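A minimal sketch of the redefinition guard discussed above, assuming the constant is compared against the `f_type` field returned by Linux `statfs(2)`. The helper names `is_cgroup2_fs_type` and `path_is_cgroup2` are hypothetical and exist only for this illustration; they are not the functions in `cgroupSubsystem_linux.cpp`.

```cpp
#include <sys/vfs.h>  // statfs(2), Linux-specific

// Only define the magic number if the system headers have not already done so.
#ifndef CGROUP2_SUPER_MAGIC
#define CGROUP2_SUPER_MAGIC 0x63677270
#endif

// Hypothetical helper: does this file system type value denote cgroup v2?
static bool is_cgroup2_fs_type(long f_type) {
  return f_type == CGROUP2_SUPER_MAGIC;
}

// Hypothetical helper: true only if `path` can be stat'ed and is a cgroup v2 mount.
static bool path_is_cgroup2(const char* path) {
  struct statfs fsstat;
  if (statfs(path, &fsstat) == -1) {
    return false;  // path missing or not accessible
  }
  return is_cgroup2_fs_type(static_cast<long>(fsstat.f_type));
}
```

Calling `path_is_cgroup2("/sys/fs/cgroup")` would then give a yes/no hint on a given machine; the result naturally depends on how that system mounts its cgroup hierarchy.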
> src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 81: > >> 79: bool cgroups_v2_enabled = false; >> 80: >> 81: if (statfs(sys_fs_cgroup, &fsstat) != -1) { > > This probably deserves a comment: > > // Assume cgroups v2 iff /sys/fs/cgroup has the cgroup v2 file system magic. Done. > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 263: > >> 261: char buf[MAXPATHLEN+1]; >> 262: char *p; >> 263: bool is_cgroupsV2 = true; > > For all intents and purposes we can remove `is_cgroupsV2` here and use `cgroups_v2_enabled` instead. OK, done. That was what I tried locally with my first attempt, but I backed off because the `testCgroupv1SystemdOnly` and `testCgroupv1NoMounts` cases failed with hierarchy_id-checking assertion failures. I assumed those tests represented incorrect in-the-wild system configurations that should not hit the debug-mode assertion failures. Now that you have pointed out that those tests were likely using copy-n-pasted/invalid /proc/self/cgroup data, I am happy to replace `is_cgroupsV2` throughout `determine_type`. > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 285: > >> 283: if ((p = fgets(buf, MAXPATHLEN, controllers)) != nullptr) { >> 284: char* controller = nullptr; >> 285: char* buf_ptr = buf; > > Suggestion: > > char* buf_ptr = p; Fixed. > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 287: > >> 285: char* buf_ptr = buf; >> 286: int i; >> 287: while ((controller = strsep(&buf_ptr, " \n\t\r\f\v")) != nullptr) { > > Consider defining the separators as `#define IS_SPACE_CHARS " \n\t\r\f\v"` or some such. Done. > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 288: > >> 286: int i; >> 287: while ((controller = strsep(&buf_ptr, " \n\t\r\f\v")) != nullptr) { >> 288: // Skip empty string due to line ending in delimiter, '\n'. > > Suggestion: > > // Skip empty controllers. Be lean about the cgroups.controllers file, > // though we probably don't have to be. 
I removed the `strcmp(controller, "") == 0` check since empty controllers will be rare and `cg_v2_controller_index` will return `-1` on an empty string anyway. > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 299: > >> 297: } else { >> 298: log_debug(os, container)("v2 controller %s is enabled but not relevant", controller); >> 299: } > > Do we really need this verbose logging? If you really think we need it, then please bump it to `trace` level. We'd be bailing out anyway if we are missing a required controller with a log. Agreed, we do not need to log the enabled-but-irrelevant ones, removed. I think it still makes sense to log JDK-relevant disabled controllers, and optional enabled controllers, at the debug level. This iteration of the patch fixes the cg v2 branch's logging. For example, here is `Fedora 41` before and after: --- tt-old-f41.txt 2025-03-03 09:44:35.919255848 -0500 +++ tt-new-f41.txt 2025-03-03 09:44:36.224257732 -0500 @@ -1,7 +1,6 @@ [trace][os,container] OSContainer::init: Initializing Container Support -[debug][os,container] Detected optional pids controller entry in /proc/cgroups -[debug][os,container] controller cpuset is not enabled - ] +[debug][os,container] Detected optional cpuset controller entry in /sys/fs/cgroup/cgroup.controllers +[debug][os,container] Detected optional pids controller entry in /sys/fs/cgroup/cgroup.controllers [debug][os,container] Detected cgroups v2 unified hierarchy [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 321: > >> 319: } else { >> 320: /* >> 321: * cgroups v2 is not 
enabled. Read /proc/cgroups; for cgroups v1 hierarchy (hybrid or > > Suggestion: > > * The /sys/fs/cgroup filesystem magic hint suggests we have cg v1. Read /proc/cgroups; for cgroups v1 hierarchy (hybrid or Done. > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 361: > >> 359: // pids and cpuset controllers are optional. All other controllers are required >> 360: if (i != PIDS_IDX && i != CPUSET_IDX) { >> 361: is_cgroupsV2 = is_cgroupsV2 && cg_infos[i]._hierarchy_id == 0; > > Fundamentally, we are changing the "hint" as to what constitutes cg v2. So this line needs to be removed. We've already determined at this point that we have cgroup v2 (via the magic check) and we need to use that here. https://github.com/jerboaa/jdk/commit/9958173dc03b66ae96227fb3579bc053cb911f06 would do that and fix a test-consistency issue. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1978036635 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1978042082 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1978045429 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1978039650 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1978039029 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1978047932 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1978053794 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1978040638 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1978059988 From duke at openjdk.org Mon Mar 3 19:34:00 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Mon, 3 Mar 2025 19:34:00 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v3] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 12:39:37 GMT, Thomas Fitzsimmons wrote: >> You are right, that if any of the required controllers aren't enabled at the kernel level 
we fail with `INVALID_CGROUPS_GENERIC`. However, the `if` condition is within the cgroups v1 branch while it used to be outside a version specific branch. >> >> Also note that `/proc/cgroups` containing (last digit `0`, indicating the enabled flag): >> >> >> cpuset 3 1 0\n >> >> >> ... is semantically equivalent to it being missing entirely from `/proc/cgroups`. But keeping `if (i != PIDS_IDX && i != CPUSET_IDX) {` above, would keep `all_required_controllers_enabled == true` which is not correct. Yes, we should keep/add a test like you suggest, but amend the patch to something like this: https://github.com/jerboaa/jdk/commit/26f765db9fef6f1d7be79452da701987274117c5 > > Makes sense, will do. I will also simplify the test case as you suggest. I configured my testing `RHEL 8` virtual machine to get a real example of how `cpuset` might be disabled. I created a full test case from it. Previously `determine_type` would return `INVALID_CGROUPS_V1` later in the function. Here is a version of the test case that passes without my patch: https://github.com/fitzsim/jdk/commit/a671c643f16c63c3091868197bee1fbcde81f57d With my patch, and the removal of ` && i != CPUSET_IDX`, `determine_type` returns `INVALID_CGROUPS_GENERIC` earlier. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1978075381 From iwalulya at openjdk.org Mon Mar 3 20:18:58 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 3 Mar 2025 20:18:58 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 08:42:05 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. 
>> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. 
>> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * fix comment (trailing whitespace) > * another assert when snapshotting at a safepoint. src/hotspot/share/gc/g1/g1CardTable.cpp line 44: > 42: if (!failures) { > 43: G1CollectedHeap* g1h = G1CollectedHeap::heap(); > 44: G1HeapRegion* r = g1h->heap_region_containing(mr.start()); Probably we can move this outside the loop, and assert that `mr` does not cross region boundaries src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 916: > 914: void safepoint_synchronize_end() override; > 915: > 916: jlong synchronized_duration() const { return _safepoint_duration; } safepoint_duration() seems easier to comprehend. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 310: > 308: verify_young_cset_indices(); > 309: > 310: size_t card_rs_length = _policy->analytics()->predict_card_rs_length(in_young_only_phase); Why are we using a prediction here? Additionally, won't this prediction also include cards from the old gen regions in case of mixed gcs? How do we reconcile that when we are adding old gen regions to c-set? src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 42: > 40: class G1HeapRegion; > 41: class G1Policy; > 42: class G1CardTableClaimTable; Nit: ordering of the declarations src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 84: > 82: // Tracks the current refinement state from idle to completion (and reset back > 83: // to idle). > 84: class G1ConcurrentRefineWorkState { G1ConcurrentRefinementState? 
I am not convinced the "Work" adds any clarity src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 113: > 111: // Current epoch the work has been started; used to determine if there has been > 112: // a forced card table swap due to a garbage collection while doing work. > 113: size_t _refine_work_epoch; same as previous comment, why `refine_work` instead of `refinement`? src/hotspot/share/gc/g1/g1ConcurrentRefineStats.hpp line 43: > 41: size_t _cards_clean; // Number of cards found clean. > 42: size_t _cards_not_parsable; // Number of cards we could not parse and left unrefined. > 43: size_t _cards_still_refer_to_cset; // Number of cards marked still young. `_cards_still_refer_to_cset` from the naming it is not clear what the difference is with `_cards_refer_to_cset`, the comment is not helping with that ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977688778 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977969470 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977982999 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977991124 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1978017843 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1978019093 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1978119476 From gziemski at openjdk.org Mon Mar 3 20:22:14 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 3 Mar 2025 20:22:14 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v33] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: <6HZ_GjpHmTXV-HiRJQ1GucpUNC9YifDkXIpUnAupjJ4=.a94bb403-b0d6-4d65-97c7-4644245aae55@github.com> On Fri, 28 Feb 2025 13:55:30 GMT, Afshin Zafari wrote: >> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTracker`. 
>> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. >> - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. >> - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > style, some cleanup, VMT and regionsTree circular dep resolved src/hotspot/share/nmt/regionsTree.cpp line 28: > 26: VMATree::SummaryDiff RegionsTree::commit_region(address addr, size_t size, const NativeCallStack& stack) { > 27: return commit_mapping((VMATree::position)addr, size, make_region_data(stack, mtNone), /*use tag inplace*/ true); > 28: } `RegionsTree::commit_region` is called by ``` static inline void record_virtual_memory_reserve_and_commit(void* addr, size_t size, const NativeCallStack& stack, MemTag mem_tag = mtNone) { which has mem_tag, so we could in theory use it and pass it down? Then we could avoid the complicated "use_tag_inplace" parameter handling? Not sure if this is possible in all cases. Is that why we have the need for "use_tag_inplace"? src/hotspot/share/nmt/regionsTree.cpp line 32: > 30: VMATree::SummaryDiff RegionsTree::uncommit_region(address addr, size_t size) { > 31: return uncommit_mapping((VMATree::position)addr, size, make_region_data(NativeCallStack::empty_stack(), mtNone)); > 32: } Would it be helpful here if we were to add a new tag, that would mark this uncommitted region somehow different than mtNone (to mark it that it used to be used, but now it's not, which is different from never used region)? test/hotspot/gtest/nmt/test_regions_tree.cpp line 72: > 70: EXPECT_EQ(rmr.base(), (address)1400); > 71: rmr = rt.find_reserved_region((address)1005); > 72: EXPECT_EQ(rmr.base(), (address)1000); When I do: rmr = rt.find_reserved_region((address)999); I get back ReservedMemoryRegion with base == 1, I am not 100% sure what I was expecting - probably 0, but not 1. 
test/hotspot/gtest/nmt/test_regions_tree.cpp line 98: > 96: rt.reserve_mapping(1400, 50, rd); > 97: > 98: rt.commit_region((address)1010, 5UL, ncs); What would, what should happen if we repeat the same reserve_mapping? rt.commit_region((address)1010, 5UL, ncs); rt.commit_region((address)1010, 5UL, ncs); Are/should we be allowed to do this? test/hotspot/gtest/nmt/test_regions_tree.cpp line 102: > 100: rt.commit_region((address)1030, 5UL, ncs); > 101: rt.commit_region((address)1040, 5UL, ncs); > 102: ReservedMemoryRegion rmr((address)1000, 50); I would add something like: rt.commit_region((address)1500, 5UL, ncs); // should not be counted that should not be counted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1977880730 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1977894025 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1978113833 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1978138735 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1978124450 From cslucas at openjdk.org Mon Mar 3 21:09:48 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 3 Mar 2025 21:09:48 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v4] In-Reply-To: References: Message-ID: > In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code. > > The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. 
This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase. > > The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a completely clean card table. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes, the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that just became the new _read_ table. > > Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address. > > The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged. > > Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code. > > Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them. 
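The alternating read/write scheme described above can be sketched as a small single-threaded C++ model. It is not the Shenandoah implementation: `SwappingCardTables` and its method names are hypothetical, and real code would have to deal with concurrency and with JIT-compiled code loading the table base dynamically.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of two card tables whose roles alternate per GC cycle.
class SwappingCardTables {
 public:
  static constexpr uint8_t kClean = 0;
  static constexpr uint8_t kDirty = 1;

  explicit SwappingCardTables(size_t num_cards)
      : _a(num_cards, kClean), _b(num_cards, kClean), _write(&_a), _read(&_b) {}

  // Mutator side: dirty a card in the current write table.
  void mark_dirty(size_t card) { (*_write)[card] = kDirty; }

  // `reset` phase (concurrent in the real collector): clean the read table.
  void reset_phase() { std::fill(_read->begin(), _read->end(), kClean); }

  // `init-mark` pause: only the two base pointers are exchanged.
  void init_mark_swap() { std::swap(_read, _write); }

  bool read_is_dirty(size_t card) const { return (*_read)[card] == kDirty; }
  bool write_is_dirty(size_t card) const { return (*_write)[card] == kDirty; }

 private:
  std::vector<uint8_t> _a;
  std::vector<uint8_t> _b;
  std::vector<uint8_t>* _write;  // table the mutators dirty
  std::vector<uint8_t>* _read;   // table the GC scans
};
```

After `reset_phase()` followed by `init_mark_swap()`, the cards the mutators dirtied are visible through the read table, and the mutators continue on a table that is already clean, so nothing has to be copied or zeroed inside the pause.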
Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Fix merge conflict - Address PR feedback: no changes to shared files. - Merge master - Addressing PR comments: some refactorings, ppc fix, off-by-one fix. - Relocation of Card Tables ------------- Changes: https://git.openjdk.org/jdk/pull/23170/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23170&range=03 Stats: 305 lines in 30 files changed: 151 ins; 95 del; 59 mod Patch: https://git.openjdk.org/jdk/pull/23170.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23170/head:pull/23170 PR: https://git.openjdk.org/jdk/pull/23170 From pchilanomate at openjdk.org Mon Mar 3 23:42:56 2025 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 3 Mar 2025 23:42:56 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> On Thu, 27 Feb 2025 15:54:28 GMT, Fredrik Bredberg wrote: >> I've combined two of `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list: the `entry_list`. >> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question of whether it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. >> >> In the current multi-queue design new threads were always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` would be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. 
>> >> The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. >> >> You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list` will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked through its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. >> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance-wise the new design seems to be equal to the old design, even though c2 generates two fewer instructions per monitor unlock operation. >> >> However the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update after review by David and Coleen. Changes look good to me. Just a few comments. 
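For readers following the list design quoted above, the push-at-head and find-tail steps can be sketched in standalone C++. This is a simplified model under stated assumptions, not the `ObjectMonitor` code: `Waiter`, `EntryListModel`, `push`, and `tail` are hypothetical names, removal of nodes is omitted, and `tail()` is assumed to be called only by the current lock owner.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a waiting thread's list node.
struct Waiter {
  int id;
  Waiter* next = nullptr;
  Waiter* prev = nullptr;
  explicit Waiter(int i) : id(i) {}
};

class EntryListModel {
 public:
  // Any contending thread: push onto the head with compare-and-exchange.
  // New nodes get a null prev, so pushes alone form a singly linked list.
  void push(Waiter* w) {
    w->prev = nullptr;
    Waiter* head = _head.load(std::memory_order_relaxed);
    do {
      w->next = head;
    } while (!_head.compare_exchange_weak(head, w,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  // Lock owner only: walk from the head, filling in prev pointers on the
  // way, and cache the tail (the FIFO successor).
  Waiter* tail() {
    if (_tail != nullptr) return _tail;
    Waiter* w = _head.load(std::memory_order_acquire);
    if (w == nullptr) return nullptr;
    while (w->next != nullptr) {
      w->next->prev = w;
      w = w->next;
    }
    _tail = w;
    return w;
  }

 private:
  std::atomic<Waiter*> _head{nullptr};
  Waiter* _tail = nullptr;  // cached tail; only touched by the lock owner
};
```

The first waiter pushed ends up at the tail, so taking successors from the tail gives FIFO order while pushes stay a single CAS on the head.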
Thanks, Patricio src/hotspot/share/runtime/objectMonitor.cpp line 204: > 202: // If the thread (F) that removes itself from the end of the list > 203: // hasn't got any prev pointer, we just set the tail pointer to > 204: // null, see 5) and 6) below. Setting the tail pointer to null would be for the case when this node is also the head, i.e single element. Otherwise we just rebuild the doubly link list, unlink F, and set entry_list_tail to G. In other words, the comment here and below seems to be missing that we have to build the doubly link list when F acquires the monitor, not when F needs to find a successor. src/hotspot/share/runtime/objectMonitor.cpp line 1265: > 1263: // that updated _entry_list, so we can access w->_next. > 1264: w = Atomic::load_acquire(&_entry_list); > 1265: assert(w != nullptr, "invariant"); Maybe add the same assert as below for the single element case: `assert(w->TState == ObjectWaiter::TS_ENTER, "invariant")`. src/hotspot/share/runtime/objectMonitor.cpp line 1359: > 1357: // Build the doubly linked list to get hold of currentNode->prev(). > 1358: _entry_list_tail = nullptr; > 1359: entry_list_tail(current); I think we should try to avoid having to rebuild the doubly link list from scratch, since only a few nodes in the front might be missing the previous links. For platform threads it might not matter that much, but for virtual threads this list could be much larger. Maybe we can leave it as a future enhancement. src/hotspot/share/runtime/objectMonitor.cpp line 1509: > 1507: // is no successor, so it appears that an heir-presumptive > 1508: // (successor) must be made ready. Only the current lock owner can > 1509: // detach threads from the entry_list, therefore we need to We don't detach threads here, so maybe manipulate would be better. src/hotspot/share/runtime/objectMonitor.cpp line 1532: > 1530: // Let's say T1 then stalls. T2 acquires O and calls O.notify(). 
The > 1531: // notify() operation moves T1 from O's waitset to O's entry_list. T2 then > 1532: // release the lock "O". T2 resumes immediately after the ST of null into Pre-existent, but this should be T1. Same in next sentence. ------------- PR Review: https://git.openjdk.org/jdk/pull/23421#pullrequestreview-2655551088 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1978372164 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1978368315 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1978374081 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1978369547 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1978370888 From sviswanathan at openjdk.org Tue Mar 4 00:02:08 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 4 Mar 2025 00:02:08 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 21:49:42 GMT, Volodymyr Paprotski wrote: > Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain) > > Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain.
>
> Before (no AVX512)
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ± 17.879 ops/s
> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ± 15.807 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ± 4.190 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ± 2.484 ops/s
> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ± 2.285 ops/s
>
> After (with AVX2)
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ± 39.923 ops/s
> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ± 34.838 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ± 12.179 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ± 8.992 ops/s
> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ± 6.238 ops/s
>
>
> Before (with AVX512):
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ± 27.260 ops/s
> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ± 26.707 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 102...
src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 2: > 1: /* > 2: * Copyright (c) 2025, Intel Corporation. All rights reserved. This should be: Copyright (c) 2024, 2025, Intel Corporation. All rights reserved. Also please check that the copyright year is appropriately updated in all the files. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 259: > 257: } > 258: __ vpaddq(Acc1, Acc1, Carry, Assembler::AVX_256bit); > 259: } A comment here on what this block is doing would help. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1978309062 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1978398909 From fyang at openjdk.org Tue Mar 4 01:57:22 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 4 Mar 2025 01:57:22 GMT Subject: RFR: 8351101: RISC-V: C2: Small improvement to MacroAssembler::revb Message-ID: Hi, please review this small improvement. 
After logic shift right 56 bits, there is no need to zero extend the remaining 8-bit value. The reason is that the upper bits will be all zeros as this is a logic shift right. Testing: `hotspot:tier1` is clean on linux-riscv64 platform with this change. ------------- Commit messages: - 8351101: RISC-V: C2: Small improvement to MacroAssembler::revb Changes: https://git.openjdk.org/jdk/pull/23879/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23879&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351101 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23879.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23879/head:pull/23879 PR: https://git.openjdk.org/jdk/pull/23879 From cslucas at openjdk.org Tue Mar 4 04:13:33 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 4 Mar 2025 04:13:33 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v5] In-Reply-To: References: Message-ID: > In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code. > > The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase. > > The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. 
In the `reset` phase we concurrently clean the cards in the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a completely clean card table. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes, the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that just became the new _read_ table. > > Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address. > > The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged. > > Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code. > > Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them. 
Cesar Soares Lucas has updated the pull request incrementally with two additional commits since the last revision: - Revert changes to shared cardTable.hpp - Revert changes to shared cardTable.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23170/files - new: https://git.openjdk.org/jdk/pull/23170/files/6210f026..717b8b44 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23170&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23170&range=03-04 Stats: 6 lines in 1 file changed: 0 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23170.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23170/head:pull/23170 PR: https://git.openjdk.org/jdk/pull/23170 From cslucas at openjdk.org Tue Mar 4 04:16:03 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 4 Mar 2025 04:16:03 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v3] In-Reply-To: References: <6_AoWQhldJttOIEOL1T7HSapPzE4Qn2j4WN7E-bI3rM=.2685d3d8-e47c-42a6-845b-b68f50cc568e@github.com> Message-ID: On Thu, 20 Feb 2025 15:33:35 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shared/cardTable.hpp line 205: >> >>> 203: virtual CardValue* byte_map_base() const { return _byte_map_base; } >>> 204: >>> 205: virtual CardValue* byte_map() const { return _byte_map; } >> >> @shipilev - can you please confirm that this is the part that you didn't like? > > Yes, I am not fond of extending `CardTable` with virtual members, especially if they can be used on high-performance paths. Not sure if the following idea is viable. > > ShenandoahBarrierSet knows where to get card table base: from Shenandoah thread local data. Now it looks like we need to deal with two problems: > 1. Protect ourselves from accidentally calling `CardTable` methods that may reference "incorrect" `_byte_map_(base)`. To do that, it looks it is enough to initialize `CardTable::_byte_map_(base)` to non-sensical values (`nullptr`-s?), and let the testing crash. 
> 2. Allow calls to `CardTable` utility methods with our base. For that, I think we can drill a few new (non-virtual) methods in `CardTable`, and enter from Shenandoah through them. So for example `byte_for_index(const size_t card_index)` becomes: > ``` > CardValue* byte_for_index(const CardValue* base, const size_t card_index) const { > return base + card_index; > } > CardValue* byte_for_index(const size_t card_index) const { > return byte_for_index(_byte_map, card_index); > } > ``` @shipilev - can you please take a look at the latest pushes? I realized that the logic implemented already keeps the fields of the base card table class always updated, therefore I don't really need to make the methods (`_byte_map_(base)` virtual at all. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1978578378 From dholmes at openjdk.org Tue Mar 4 04:52:57 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Mar 2025 04:52:57 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> References: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> Message-ID: On Mon, 3 Mar 2025 23:15:46 GMT, Patricio Chilano Mateo wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update after review by David and Coleen. > > src/hotspot/share/runtime/objectMonitor.cpp line 204: > >> 202: // If the thread (F) that removes itself from the end of the list >> 203: // hasn't got any prev pointer, we just set the tail pointer to >> 204: // null, see 5) and 6) below. > > Setting the tail pointer to null would be for the case when this node is also the head, i.e single element. Otherwise we just rebuild the doubly link list, unlink F, and set entry_list_tail to G. 
In other words, the comment here and below seems to be missing that we have to build the doubly link list when F acquires the monitor, not when F needs to find a successor. We don't rebuild at this point. The thread that is removing itself just sets tail to null if there is no prev. Later when F exits the monitor it will construct the DLL to find the next successor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1978608595 From dlong at openjdk.org Tue Mar 4 04:56:24 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 4 Mar 2025 04:56:24 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v5] In-Reply-To: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: > When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. > > In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. > > Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. 
Dean Long has updated the pull request incrementally with two additional commits since the last revision: - fix typo - moved and hopefully improved invokedynamic comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23557/files - new: https://git.openjdk.org/jdk/pull/23557/files/375f6cfe..80a3235a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23557&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23557&range=03-04 Stats: 7 lines in 2 files changed: 5 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23557.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23557/head:pull/23557 PR: https://git.openjdk.org/jdk/pull/23557 From dlong at openjdk.org Tue Mar 4 04:56:24 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 4 Mar 2025 04:56:24 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v4] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Sat, 1 Mar 2025 22:20:23 GMT, Richard Reingruber wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> use new Bytecode_invoke::has_memeber_arg > > src/hotspot/share/runtime/vframeArray.cpp line 616: > >> 614: // invokedynamic instructions don't have a class but obviously don't have a MemberName appendix. >> 615: // NOTE: Use machinery here that avoids resolving of any kind. >> 616: const bool has_member_arg = inv.has_member_arg(); > > I reckon the comment about invokedynamic isn't needed anymore. It could be moved to has_member_arg if you want to keep it. Good idea. Done! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1978611223 From fjiang at openjdk.org Tue Mar 4 06:07:52 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 4 Mar 2025 06:07:52 GMT Subject: RFR: 8351101: RISC-V: C2: Small improvement to MacroAssembler::revb In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 01:28:32 GMT, Fei Yang wrote: > Hi, please review this small improvement. > After logic shift right 56 bits, there is no need to zero extend the remaining 8-bit value. > The reason is that the upper bits will be all zeros as this is a logic shift right. > Testing: `hotspot:tier1` is clean on linux-riscv64 platform with this change. Looks fine. ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/23879#pullrequestreview-2656075025 From rrich at openjdk.org Tue Mar 4 08:20:57 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 4 Mar 2025 08:20:57 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v5] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Tue, 4 Mar 2025 04:56:24 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. >> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. >> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. 
I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. > > Dean Long has updated the pull request incrementally with two additional commits since the last revision: > > - fix typo > - moved and hopefully improved invokedynamic comment Marked as reviewed by rrich (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23557#pullrequestreview-2656399880 From tschatzl at openjdk.org Tue Mar 4 08:24:54 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 08:24:54 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 20:02:16 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * fix comment (trailing whitespace) >> * another assert when snapshotting at a safepoint. > > src/hotspot/share/gc/g1/g1ConcurrentRefineStats.hpp line 43: > >> 41: size_t _cards_clean; // Number of cards found clean. >> 42: size_t _cards_not_parsable; // Number of cards we could not parse and left unrefined. >> 43: size_t _cards_still_refer_to_cset; // Number of cards marked still young. > > `_cards_still_refer_to_cset` from the naming it is not clear what the difference is with `_cards_refer_to_cset`, the comment is not helping with that `cards_still_refer_to_cset` refers to cards that were found to have already been marked as `to-collection-set`. Renamed to `_cards_already_refer_to_cset`, would that be okay? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1978868225 From tschatzl at openjdk.org Tue Mar 4 08:28:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 08:28:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 18:28:48 GMT, Ivan Walulya wrote: > Why are we using a prediction here? Quickly checking again, do we have the actual count here from somewhere? > Additionally, won't this prediction also include cards from the old gen regions in case of mixed gcs? How do we reconcile that when we are adding old gen regions to c-set? The predictor contents changed to (supposedly) only contain cards containing young gen references. See g1Policy.cpp:934: _analytics->report_card_rs_length(total_cards_scanned - total_non_young_rs_cards, is_young_only_pause); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1978876199 From tschatzl at openjdk.org Tue Mar 4 08:36:55 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 08:36:55 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 15:19:20 GMT, Albert Mingkun Yang wrote: > Can you elaborate on what the "special handling" would be, if we don't set "claimed" for non-committed regions? The iteration code would, for every region, check whether the region is actually committed or not. The `heap_region_iterate()` API of `G1CollectedHeap` only iterates over committed regions. So only committed regions will be updated in the state table. Later when iterating over the state table, the code uses the array directly, i.e. the claim state of uncommitted regions would be read as uninitialized. 
Further, it would otherwise be hard to exclude regions committed after the snapshot (we do not need to iterate over them; their card table can't contain card marks), as we do not track newly committed regions in the snapshot. We could do that, but it would be a headache due to memory synchronization, because regions can be committed at any time. Imho it is much simpler to reset all the card claims to "already processed" and then make the regions we want to work on claimable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1978893134 From tschatzl at openjdk.org Tue Mar 4 08:39:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 08:39:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:22:03 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1ConcurrentRefineStats.hpp line 43: >> >>> 41: size_t _cards_clean; // Number of cards found clean. >>> 42: size_t _cards_not_parsable; // Number of cards we could not parse and left unrefined. >>> 43: size_t _cards_still_refer_to_cset; // Number of cards marked still young. >> >> `_cards_still_refer_to_cset` from the naming it is not clear what the difference is with `_cards_refer_to_cset`, the comment is not helping with that > > `cards_still_refer_to_cset` refers to cards that were found to have already been marked as `to-collection-set`. Renamed to `_cards_already_refer_to_cset`, would that be okay? Fwiw, this is just for statistics, so if you want I can remove these. I did some experiments with re-examining these cards too to see whether we could clear them later. For determining if/when to do that a rate of increase for the young cards has been interesting. As mentioned, if you want I can remove them.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1978896272 From tschatzl at openjdk.org Tue Mar 4 08:53:46 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 08:53:46 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v7] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * iwalulya initial comments * renaming * made blend() helper function more clear; at least gcc will optimize it to the same code as before ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/b3dd0084..8f46dc9a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=05-06 Stats: 27 lines in 9 files changed: 7 ins; 3 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Tue Mar 4 09:15:24 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 09:15:24 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v8] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. 
> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * do not change card table base for gc threads during swapping * not necessary because they do not use it * (recent assert that verifies that non-java threads do not have a card table found this) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/8f46dc9a..9e2ee543 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=06-07 Stats: 25 lines in 1 file changed: 9 ins; 14 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From jsjolen at openjdk.org Tue Mar 4 09:28:11 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 4 Mar 2025 09:28:11 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v33] In-Reply-To: <6HZ_GjpHmTXV-HiRJQ1GucpUNC9YifDkXIpUnAupjJ4=.a94bb403-b0d6-4d65-97c7-4644245aae55@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <6HZ_GjpHmTXV-HiRJQ1GucpUNC9YifDkXIpUnAupjJ4=.a94bb403-b0d6-4d65-97c7-4644245aae55@github.com> Message-ID: On Mon, 3 Mar 2025 20:17:15 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> style, some cleanup, VMT and regionsTree circular dep resolved > > test/hotspot/gtest/nmt/test_regions_tree.cpp line 98: > >> 96: rt.reserve_mapping(1400, 50, rd); >> 97: >> 98: rt.commit_region((address)1010, 5UL, ncs); > > What would, what should happen if we repeat the same reserve_mapping? > > rt.commit_region((address)1010, 5UL, ncs); > rt.commit_region((address)1010, 5UL, ncs); > > Are/should we be allowed to do this? 
Nothing would happen; the state of the tree would stay the same. We should allow it, at least on product builds! For me, the most important part of this rewrite is getting rid of NMT crashing because it reaches states that it can't do anything with. We can handle more usage patterns, and we try our darnedest to not fail. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1978986884 From azafari at openjdk.org Tue Mar 4 09:28:09 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 4 Mar 2025 09:28:09 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v33] In-Reply-To: <6HZ_GjpHmTXV-HiRJQ1GucpUNC9YifDkXIpUnAupjJ4=.a94bb403-b0d6-4d65-97c7-4644245aae55@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <6HZ_GjpHmTXV-HiRJQ1GucpUNC9YifDkXIpUnAupjJ4=.a94bb403-b0d6-4d65-97c7-4644245aae55@github.com> Message-ID: On Mon, 3 Mar 2025 17:10:40 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> style, some cleanup, VMT and regionsTree circular dep resolved > > src/hotspot/share/nmt/regionsTree.cpp line 28: > >> 26: VMATree::SummaryDiff RegionsTree::commit_region(address addr, size_t size, const NativeCallStack& stack) { >> 27: return commit_mapping((VMATree::position)addr, size, make_region_data(stack, mtNone), /*use tag inplace*/ true); >> 28: } > > `RegionsTree::commit_region` is called by > > ``` > static inline void record_virtual_memory_reserve_and_commit(void* addr, size_t size, > const NativeCallStack& stack, MemTag mem_tag = mtNone) { > > > which has mem_tag, so we could in theory use it and pass it down? Then we could avoid the complicated "use_tag_inplace" parameter handling? > > Not sure if this is possible in all cases. Is that why we have the need for "use_tag_inplace"? I have to separate the concerns as follows: 1.
In `record_..._reserve_and_commit` the `mem_tag` is available since it is needed for `reserve`. So, you are right, we can use the tag and pass it down. The change set would be: `VMT::Instance::add_committed_region(addr, size, stack, MemTag = mtNone)` `VMT::add_committed_region(addr, size, stack, MemTag = mtNone)` `RegionsTree::commit_region(addr, size, stack, MemTag = mtNone)` 3. Even if we do so, `use_tag_inplace` is needed in `commit` because the `os::commit_memory` family does not pass mem_tag down. There is already an abandoned PR which tried to add a MemTag param to this family. It was reviewed as not necessary. 4. In addition, `VMATree::register_mapping` has a mandatory MemTag param (in its MetaData param) which forces the caller to pass down a specific MemTag. If we don't use `use_tag_inplace`, for every commit of region $[ base, end)$ we have to find the enclosing reserved region $[ A, B)$ where $A \le base < end \le B$ and get its mem_tag and then pass it down to `VMATree::register_mapping`. Finding the enclosing region could be too expensive, so it is better avoided. I can implement case 1 above if it is preferred. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1978991855 From kbarrett at openjdk.org Tue Mar 4 09:32:56 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 4 Mar 2025 09:32:56 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 10:24:31 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks.
Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > reverted gcarguments and updated test src/hotspot/share/cds/metaspaceShared.cpp line 244: > 242: > 243: char* aligned_base = align_up_or_null(specified_base, alignment); > 244: assert(is_aligned(aligned_base, alignment), "sanity"); I don't think this assert adds anything. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1975266376 From kbarrett at openjdk.org Tue Mar 4 09:32:55 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 4 Mar 2025 09:32:55 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 16:57:13 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with two additional commits since the last revision: > > - removed align_up_or_min test from test_align > - psoldgen check + removed align_up_or_min Changes requested by kbarrett (Reviewer). 
src/hotspot/share/utilities/align.hpp line 96: > 94: > 95: template<typename T, typename A> > 96: inline T* align_up_or_null(T* ptr, A alignment) { Rather than specialized align_up variants, I think better and more generally usable would be a predicate for testing whether align_up is safe. Something like a `bool can_align_up(value, alignment)` function. I'm happy to see align_up_or_min removed from this PR for that reason, as well as because it has usability issues. Is a min-valued return indicating failure to align? Or was it just the argument is min-valued and already aligned? Consider passing an unsigned zero value as the value to be aligned. This seems to work: diff --git a/src/hotspot/share/utilities/align.hpp b/src/hotspot/share/utilities/align.hpp index b67e61036a0..d73e8e086ca 100644 --- a/src/hotspot/share/utilities/align.hpp +++ b/src/hotspot/share/utilities/align.hpp @@ -30,6 +30,8 @@ #include "utilities/debug.hpp" #include "utilities/globalDefinitions.hpp" #include "utilities/powerOfTwo.hpp" + +#include <limits> #include <type_traits> // Compute mask to use for aligning to or testing alignment.
@@ -70,6 +72,17 @@ constexpr T align_down(T size, A alignment) { return result; } +template<typename T, typename A, ENABLE_IF(std::is_integral<T>::value)> +constexpr bool can_align_up(T size, A alignment) { + return align_down(std::numeric_limits<T>::max(), alignment) >= size; +} + +template<typename A> +inline bool can_align_up(const void* p, A alignment) { + static_assert(sizeof(p) == sizeof(uintptr_t), "assumption"); + return can_align_up(reinterpret_cast<uintptr_t>(p), alignment); +} + template<typename T, typename A, ENABLE_IF(std::is_integral<T>::value)> constexpr T align_up(T size, A alignment) { T adjusted = checked_cast<T>(size + alignment_mask(alignment)); diff --git a/test/hotspot/gtest/utilities/test_align.cpp b/test/hotspot/gtest/utilities/test_align.cpp index 3c03fd5f24d..2112d43feab 100644 --- a/test/hotspot/gtest/utilities/test_align.cpp +++ b/test/hotspot/gtest/utilities/test_align.cpp @@ -27,6 +27,51 @@ #include "unittest.hpp" #include <limits> +#include <type_traits> + +template<typename T, typename A> +static constexpr void test_can_align_up() { + int alignment_value = 4; + int small_value = 63; + A alignment = static_cast<A>(alignment_value); + + EXPECT_TRUE(can_align_up(static_cast<T>(small_value), alignment)); + EXPECT_TRUE(can_align_up(static_cast<T>(-small_value), alignment)); + EXPECT_TRUE(can_align_up(std::numeric_limits<T>::min(), alignment)); + EXPECT_FALSE(can_align_up(std::numeric_limits<T>::max(), alignment)); + EXPECT_FALSE(can_align_up(std::numeric_limits<T>::max() - 1, alignment)); + EXPECT_TRUE(can_align_up(align_down(std::numeric_limits<T>::max(), alignment), alignment)); + EXPECT_FALSE(can_align_up(align_down(std::numeric_limits<T>::max(), alignment) + 1, alignment)); + if (std::is_signed<T>::value) { + EXPECT_TRUE(can_align_up(static_cast<T>(-1), alignment)); + EXPECT_TRUE(can_align_up(align_down(static_cast<T>(-1), alignment), alignment)); + EXPECT_TRUE(can_align_up(align_down(static_cast<T>(-1) + 1, alignment), alignment)); + } +} + +TEST(Align, test_can_align_up_int32_int32) { + test_can_align_up<int32_t, int32_t>(); +} + +TEST(Align, test_can_align_up_uint32_uint32) { + test_can_align_up<uint32_t, uint32_t>(); +} + +TEST(Align, test_can_align_up_int32_uint32) { +
test_can_align_up<int32_t, uint32_t>(); +} + +TEST(Align, test_can_align_up_uint32_int32) { + test_can_align_up<uint32_t, int32_t>(); +} + +TEST(Align, test_can_align_up_ptr) { + uint alignment = 4; + char buffer[8]; + + EXPECT_TRUE(can_align_up(buffer, alignment)); + EXPECT_FALSE(can_align_up(reinterpret_cast<void*>(UINTPTR_MAX), alignment)); +} // A few arbitrarily chosen values to test the align functions on. static constexpr uint64_t values[] = {1, 3, 10, 345, 1023, 1024, 1025, 23909034, INT_MAX, uint64_t(-1) / 2, uint64_t(-1) / 2 + 100, uint64_t(-1)}; `align_up` can assert `can_align_up(size, alignment)`. ------------- PR Review: https://git.openjdk.org/jdk/pull/23711#pullrequestreview-2650497091 PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1978995585 From kbarrett at openjdk.org Tue Mar 4 09:32:57 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 4 Mar 2025 09:32:57 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v5] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 09:27:22 GMT, Kim Barrett wrote: >> Casper Norrbin has updated the pull request incrementally with two additional commits since the last revision: >> >> - removed align_up_or_min test from test_align >> - psoldgen check + removed align_up_or_min > > src/hotspot/share/utilities/align.hpp line 96: > >> 94: >> 95: template<typename T, typename A> >> 96: inline T* align_up_or_null(T* ptr, A alignment) { > > Rather than specialized align_up variants, I think better and more generally > usable would be a predicate for testing whether align_up is safe. Something > like a `bool can_align_up(value, alignment)` function. > > I'm happy to see align_up_or_min removed from this PR for that reason, as well > as because it has usability issues. Is a min-valued return indicating failure > to align? Or was it just the argument is min-valued and already aligned? > Consider passing an unsigned zero value as the value to be aligned.
> > This seems to work: > > diff --git a/src/hotspot/share/utilities/align.hpp b/src/hotspot/share/utilities/align.hpp > index b67e61036a0..d73e8e086ca 100644 > --- a/src/hotspot/share/utilities/align.hpp > +++ b/src/hotspot/share/utilities/align.hpp > @@ -30,6 +30,8 @@ > #include "utilities/debug.hpp" > #include "utilities/globalDefinitions.hpp" > #include "utilities/powerOfTwo.hpp" > + > +#include <limits> > #include <type_traits> > > // Compute mask to use for aligning to or testing alignment. > @@ -70,6 +72,17 @@ constexpr T align_down(T size, A alignment) { > return result; > } > > +template<typename T, typename A, ENABLE_IF(std::is_integral<T>::value)> > +constexpr bool can_align_up(T size, A alignment) { > + return align_down(std::numeric_limits<T>::max(), alignment) >= size; > +} > + > +template<typename A> > +inline bool can_align_up(const void* p, A alignment) { > + static_assert(sizeof(p) == sizeof(uintptr_t), "assumption"); > + return can_align_up(reinterpret_cast<uintptr_t>(p), alignment); > +} > + > template<typename T, typename A, ENABLE_IF(std::is_integral<T>::value)> > constexpr T align_up(T size, A alignment) { > T adjusted = checked_cast<T>(size + alignment_mask(alignment)); > diff --git a/test/hotspot/gtest/utilities/test_align.cpp b/test/hotspot/gtest/utilities/test_align.cpp > index 3c03fd5f24d..2112d43feab 100644 > --- a/test/hotspot/gtest/utilities/test_align.cpp > +++ b/test/hotspot/gtest/utilities/test_align.cpp > @@ -27,6 +27,51 @@ > #include "unittest.hpp" > > #include <limits> > +#include <type_traits> > + > +template<typename T, typename A> > +static constexpr void test_can_align_up() { > + int alignment_value = 4; > + int small_value = 63; > + A alignment = static_cast<A>(alignment_value); > + > + EXPECT_TRUE(can_align_up(static_cast<T>(small_value), alignment)); > + EXPECT_TRUE(can_align_up(static_cast<T>(-small_value), alignment)); > + EXPECT_TRU... I forgot to add any description comments for can_align_up. And align_up description should say can_align_up is a precondition.
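To make the precondition idea concrete, here is a minimal standalone sketch. It is not the HotSpot code: the `sketch_` names are hypothetical, it handles unsigned integers only, and it assumes `alignment` is a power of two. The predicate asks whether the largest aligned value representable in `T` is still at or above `value`; if so, rounding up cannot wrap.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Standalone illustration; HotSpot's real helpers live in utilities/align.hpp
// and also cover signed types and pointers. All names here are hypothetical.

// Round `value` down to a multiple of `alignment` (assumed power of two).
template <typename T>
constexpr T sketch_align_down(T value, T alignment) {
  static_assert(std::is_unsigned<T>::value, "sketch handles unsigned types only");
  return value & ~(alignment - 1);
}

// True if rounding `value` up to `alignment` cannot overflow T.
template <typename T>
constexpr bool sketch_can_align_up(T value, T alignment) {
  return sketch_align_down(std::numeric_limits<T>::max(), alignment) >= value;
}

// Precondition: sketch_can_align_up(value, alignment) holds.
template <typename T>
constexpr T sketch_align_up(T value, T alignment) {
  return sketch_align_down(static_cast<T>(value + (alignment - 1)), alignment);
}
```

A caller that cannot guarantee the precondition would test `sketch_can_align_up` first and handle the failure itself, instead of relying on a special return value from the aligning function.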
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1978999388 From kbarrett at openjdk.org Tue Mar 4 09:36:56 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 4 Mar 2025 09:36:56 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v2] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 11:37:12 GMT, Kim Barrett wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> reverted gcarguments and updated test > > src/hotspot/share/cds/metaspaceShared.cpp line 244: > >> 242: >> 243: char* aligned_base = align_up_or_null(specified_base, alignment); >> 244: assert(is_aligned(aligned_base, alignment), "sanity"); > > I don't think this assert adds anything. Actually, it's worse than that, since is_aligned of a null pointer is problematic. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1979006881 From iwalulya at openjdk.org Tue Mar 4 09:38:58 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 4 Mar 2025 09:38:58 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:36:58 GMT, Thomas Schatzl wrote: >> `cards_still_refer_to_cset` refers to cards that were found to have already been marked as `to-collection-set`. Renamed to `_cards_already_refer_to_cset`, would that be okay? > > Fwiw, this particular counter is just for statistics, so if you want I can remove these. I did some experiments with re-examining these cards too to see whether we could clear them later. For determining if/when to do that a rate of increase for the young cards has been interesting. > > As mentioned, if you want I can remove them. 
`_cards_already_refer_to_cset` is fine by me; I don't like the option of removing them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979009507 From iwalulya at openjdk.org Tue Mar 4 09:43:54 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 4 Mar 2025 09:43:54 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:26:10 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1CollectionSet.cpp line 310: >> >>> 308: verify_young_cset_indices(); >>> 309: >>> 310: size_t card_rs_length = _policy->analytics()->predict_card_rs_length(in_young_only_phase); >> >> Why are we using a prediction here? Additionally, won't this prediction also include cards from the old gen regions in case of mixed gcs? How do we reconcile that when we are adding old gen regions to c-set? > >> Why are we using a prediction here? > > Quickly checking again, do we have the actual count here from somewhere? > >> Additionally, won't this prediction also include cards from the old gen regions in case of mixed gcs? How do we reconcile that when we are adding old gen regions to c-set? > > The predictor contents changed to (supposedly) only contain cards containing young gen references. See g1Policy.cpp:934: > > _analytics->report_card_rs_length(total_cards_scanned - total_non_young_rs_cards, is_young_only_pause); Fair, I missed that the details on young RS have been removed.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979022900 From azafari at openjdk.org Tue Mar 4 09:48:09 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 4 Mar 2025 09:48:09 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v33] In-Reply-To: <6HZ_GjpHmTXV-HiRJQ1GucpUNC9YifDkXIpUnAupjJ4=.a94bb403-b0d6-4d65-97c7-4644245aae55@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <6HZ_GjpHmTXV-HiRJQ1GucpUNC9YifDkXIpUnAupjJ4=.a94bb403-b0d6-4d65-97c7-4644245aae55@github.com> Message-ID: On Mon, 3 Mar 2025 17:20:01 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> style, some cleanup, VMT and regionsTree circular dep resolved > > src/hotspot/share/nmt/regionsTree.cpp line 32: > >> 30: VMATree::SummaryDiff RegionsTree::uncommit_region(address addr, size_t size) { >> 31: return uncommit_mapping((VMATree::position)addr, size, make_region_data(NativeCallStack::empty_stack(), mtNone)); >> 32: } > > Would it be helpful here if we were to add a new tag, that would mark this uncommitted region somehow different than mtNone (to mark it that it used to be used, but now it's not, which is different from never used region)? An uncommitted region has implicitly the same tag as its containing reserved region: <-------------Reserved Region, mtXXX----------> <----C1----><..U1...><---C2--><..U2..><---C3--> C1-C3 are committed regions with tag `mtXXX` U1 and U2 are uncommitted and enclosed by a `mtXXX` tag region. Do you have any specific use-case for this? --- If you were thinking about *release_mapping* instead, then I can say that the released region will be sooner or later reserved by another memtag. For example, an mtStack region is released and immediately reserved by mtGC. 
In other words, any memtag on a released region would be overwritten by other uses of the same region. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1979035962 From tschatzl at openjdk.org Tue Mar 4 09:57:58 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 09:57:58 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 18:50:37 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * fix comment (trailing whitespace) >> * another assert when snapshotting at a safepoint. > > src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 84: > >> 82: // Tracks the current refinement state from idle to completion (and reset back >> 83: // to idle). >> 84: class G1ConcurrentRefineWorkState { > > G1ConcurrentRefinementState? I am not convinced the "Work" adds any clarity We agreed on `G1ConcurrentRefineSweepState` for now, better suggestions welcome. Use `Refine` instead of `Refinement` since all pre-existing classes also use `Refine`. This could be renamed in an extra change. > src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 113: > >> 111: // Current epoch the work has been started; used to determine if there has been >> 112: // a forced card table swap due to a garbage collection while doing work. >> 113: size_t _refine_work_epoch; > > same as previous comment, why `refine_work` instead of `refinement`? Already renamed, same as previous comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979050867 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979051649 From tschatzl at openjdk.org Tue Mar 4 09:57:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 09:57:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v9] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * iwalulya review 2 * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState * some additional documentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/9e2ee543..442d9eae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=07-08 Stats: 93 lines in 7 files changed: 27 ins; 3 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Tue Mar 4 09:57:58 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 09:57:58 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: <3BAl6ELdTMEhWoovthkw7lq86mwuoUnyKxzCANFnwNc=.41077bf4-8073-4810-9d0d-078d7ad06240@github.com> On Tue, 4 Mar 2025 09:52:40 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 84: >> >>> 82: // Tracks the current refinement state from idle to completion (and reset back >>> 83: // to idle). >>> 84: class G1ConcurrentRefineWorkState { >> >> G1ConcurrentRefinementState? I am not convinced the "Work" adds any clarity > > We agreed on `G1ConcurrentRefineSweepState` for now, better suggestions welcome. > > Use `Refine` instead of `Refinement` since all pre-existing classes also use `Refine`. This could be renamed in an extra change. Add the `Sweep` in the name because this is not the state for entire refinement (which also includes information about when to start refinement/sweeping). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979053344 From azafari at openjdk.org Tue Mar 4 10:04:09 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 4 Mar 2025 10:04:09 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v33] In-Reply-To: <6HZ_GjpHmTXV-HiRJQ1GucpUNC9YifDkXIpUnAupjJ4=.a94bb403-b0d6-4d65-97c7-4644245aae55@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <6HZ_GjpHmTXV-HiRJQ1GucpUNC9YifDkXIpUnAupjJ4=.a94bb403-b0d6-4d65-97c7-4644245aae55@github.com> Message-ID: <8bygUpwKnmx05leCwTfcQsQkdlYljaDWbpr8lBL7deg=.0a098195-d497-4224-b1d2-1fb3be7071ed@github.com> On Mon, 3 Mar 2025 19:58:16 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> style, some cleanup, VMT and regionsTree circular dep resolved > > test/hotspot/gtest/nmt/test_regions_tree.cpp line 72: > >> 70: EXPECT_EQ(rmr.base(), (address)1400); >> 71: rmr = rt.find_reserved_region((address)1005); >> 72: EXPECT_EQ(rmr.base(), (address)1000); > > When I do: > > rmr = rt.find_reserved_region((address)999); > > I get back ReservedMemoryRegion with base == 1, I am not 100% sure what I was expecting - probably 0, but not 1. Here, the 999 address is not in any region. If no region is found by the `rt.find_reserved_region(addr)`, a region with base==1 and size==1 is returned. base == 0 triggers some assertions since it is a pointer. Because of this, I added the `ReservedMemoryRegion::is_valid()` which checks base and size. rmr = rt.find_reserved_region(999); if (!rmr.is_valid()) { // your code .... } P.S.: The `find_reserved_region` is expensive and hopefully it would be removed in future PRs. 
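
The sentinel convention described above can be sketched roughly as follows. This is a toy stand-in, not the code from the PR: `ToyRegion`, `ToyRegionsTree`, and the `std::map` storage are hypothetical, and the exact condition the real `ReservedMemoryRegion::is_valid()` checks is an assumption here (only "checks base and size" is stated).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

using address = std::uintptr_t;  // stand-in for HotSpot's address type

// Hypothetical region type: base == 1 and size == 1 marks "not found".
struct ToyRegion {
  address base;
  std::size_t size;

  // Assumption: a region is invalid exactly when it is the (1, 1) sentinel.
  bool is_valid() const { return !(base == 1 && size == 1); }
  bool contains(address a) const { return a >= base && a - base < size; }
};

// Hypothetical lookup structure keyed by region base.
class ToyRegionsTree {
  std::map<address, std::size_t> _reserved;

 public:
  void reserve(address base, std::size_t size) { _reserved[base] = size; }

  // Returns the reserved region containing addr, or the sentinel if none does.
  ToyRegion find_reserved_region(address addr) const {
    auto it = _reserved.upper_bound(addr);  // first region starting after addr
    if (it != _reserved.begin()) {
      --it;  // last region starting at or before addr
      ToyRegion r{it->first, it->second};
      if (r.contains(addr)) return r;
    }
    return ToyRegion{1, 1};  // sentinel; base == 0 is avoided on purpose
  }
};
```

With regions reserved at 1000 and 1400 (size 50 each), looking up 1005 yields the region based at 1000, while looking up 999 yields the sentinel, so callers are expected to test `is_valid()` before using the result.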
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1979067689 From azafari at openjdk.org Tue Mar 4 10:11:09 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 4 Mar 2025 10:11:09 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v33] In-Reply-To: <6HZ_GjpHmTXV-HiRJQ1GucpUNC9YifDkXIpUnAupjJ4=.a94bb403-b0d6-4d65-97c7-4644245aae55@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <6HZ_GjpHmTXV-HiRJQ1GucpUNC9YifDkXIpUnAupjJ4=.a94bb403-b0d6-4d65-97c7-4644245aae55@github.com> Message-ID: On Mon, 3 Mar 2025 20:06:20 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> style, some cleanup, VMT and regionsTree circular dep resolved > > test/hotspot/gtest/nmt/test_regions_tree.cpp line 102: > >> 100: rt.commit_region((address)1030, 5UL, ncs); >> 101: rt.commit_region((address)1040, 5UL, ncs); >> 102: ReservedMemoryRegion rmr((address)1000, 50); > > I would add something like: > > rt.commit_region((address)1500, 5UL, ncs); // should not be counted > > that should not be counted. adding that line crashes for me as follows: Internal Error (/src/hotspot/share/nmt/vmatree.cpp:77), pid=87926, tid=87926 # assert(leqA_n->val().out.type() != StateType::Released) failed: Should not use inplace the tag of a released region # It means that we cannot commit without reserving first. The region being committed is released but should be reserved. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1979080504 From azafari at openjdk.org Tue Mar 4 10:18:12 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 4 Mar 2025 10:18:12 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v33] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <6HZ_GjpHmTXV-HiRJQ1GucpUNC9YifDkXIpUnAupjJ4=.a94bb403-b0d6-4d65-97c7-4644245aae55@github.com> Message-ID: On Tue, 4 Mar 2025 09:23:08 GMT, Johan Sjölen wrote: >> test/hotspot/gtest/nmt/test_regions_tree.cpp line 98: >> >>> 96: rt.reserve_mapping(1400, 50, rd); >>> 97: >>> 98: rt.commit_region((address)1010, 5UL, ncs); >> >> What would, what should happen if we repeat the same reserve_mapping? >> >> rt.commit_region((address)1010, 5UL, ncs); >> rt.commit_region((address)1010, 5UL, ncs); >> >> Are/should we be allowed to do this? > > Nothing would happen, the state of the tree would stay the same. We should allow it, at least on product builds! For me, the most important part of this rewrite is getting rid of NMT crashing because it reaches states that it can't do anything with. We can handle more usage patterns, and we try our darndest to not fail. The test `TEST_VM_F(NMTRegionsTreeTest, ReserveCommitTwice)` has been added. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1979094540 From mli at openjdk.org Tue Mar 4 10:19:59 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 4 Mar 2025 10:19:59 GMT Subject: RFR: 8351101: RISC-V: C2: Small improvement to MacroAssembler::revb In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 01:28:32 GMT, Fei Yang wrote: > Hi, please review this small improvement. > After a logical shift right by 56 bits, there is no need to zero extend the remaining 8-bit value. > The reason is that the upper bits will be all zeros as this is a logical shift right.
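
The argument can be checked with a small C++ sketch. It is not the RISC-V assembler code from the PR; the two helper functions below are hypothetical and only model the arithmetic, using an unsigned 64-bit value so that `>>` is a logical shift.

```cpp
#include <cstdint>

// Hypothetical helpers modelling the arithmetic only (not MacroAssembler code).

// Shift, then explicitly zero-extend the low 8 bits.
inline std::uint64_t top_byte_with_zero_extend(std::uint64_t x) {
  return (x >> 56) & 0xFF;
}

// Shift only: on an unsigned value the shift fills the upper 56 bits with zeros,
// so the result already fits in 8 bits.
inline std::uint64_t top_byte_shift_only(std::uint64_t x) {
  return x >> 56;
}
```

Both helpers return the same value for every input, because after shifting right by 56 only bits 0 to 7 can be set. The mask would only matter for an arithmetic (sign-propagating) shift.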
> Testing: `hotspot:tier1` is clean on linux-riscv64 platform with this change. Looks good. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23879#pullrequestreview-2656812931 From mdoerr at openjdk.org Tue Mar 4 10:40:55 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 4 Mar 2025 10:40:55 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v9] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 09:57:56 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
>> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * iwalulya review 2 > * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState > * some additional documentation I got an error while testing java/foreign/TestUpcallStress.java on linuxaarch64 with this PR: # Internal Error (/openjdk-jdk-linux_aarch64-dbg/jdk/src/hotspot/share/gc/g1/g1CardTable.cpp:56), pid=19044, tid=19159 # guarantee(!failures) failed: there should not have been any failures ... 
V [libjvm.so+0xb6e988] G1CardTable::verify_region(MemRegion, unsigned char, bool)+0x3b8 (g1CardTable.cpp:56) V [libjvm.so+0xc3a10c] G1MergeHeapRootsTask::G1ClearBitmapClosure::do_heap_region(G1HeapRegion*)+0x13c (g1RemSet.cpp:1048) V [libjvm.so+0xb7a80c] G1CollectedHeap::par_iterate_regions_array(G1HeapRegionClosure*, G1HeapRegionClaimer*, unsigned int const*, unsigned long, unsigned int) const+0x9c (g1CollectedHeap.cpp:2059) V [libjvm.so+0xc49fe8] G1MergeHeapRootsTask::work(unsigned int)+0x708 (g1RemSet.cpp:1225) V [libjvm.so+0x19597bc] WorkerThread::run()+0x98 (workerThread.cpp:69) V [libjvm.so+0x1824510] Thread::call_run()+0xac (thread.cpp:231) V [libjvm.so+0x13b3994] thread_native_entry(Thread*)+0x130 (os_linux.cpp:877) C [libpthread.so.0+0x875c] start_thread+0x18c ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2697024679 From tschatzl at openjdk.org Tue Mar 4 10:48:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 10:48:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v9] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 10:37:47 GMT, Martin Doerr wrote: > I got an error while testing java/foreign/TestUpcallStress.java on linuxaarch64 with this PR: > > ``` > # Internal Error (/openjdk-jdk-linux_aarch64-dbg/jdk/src/hotspot/share/gc/g1/g1CardTable.cpp:56), pid=19044, tid=19159 > # guarantee(!failures) failed: there should not have been any failures > ... 
> V [libjvm.so+0xb6e988] G1CardTable::verify_region(MemRegion, unsigned char, bool)+0x3b8 (g1CardTable.cpp:56) > V [libjvm.so+0xc3a10c] G1MergeHeapRootsTask::G1ClearBitmapClosure::do_heap_region(G1HeapRegion*)+0x13c (g1RemSet.cpp:1048) > V [libjvm.so+0xb7a80c] G1CollectedHeap::par_iterate_regions_array(G1HeapRegionClosure*, G1HeapRegionClaimer*, unsigned int const*, unsigned long, unsigned int) const+0x9c (g1CollectedHeap.cpp:2059) > V [libjvm.so+0xc49fe8] G1MergeHeapRootsTask::work(unsigned int)+0x708 (g1RemSet.cpp:1225) > V [libjvm.so+0x19597bc] WorkerThread::run()+0x98 (workerThread.cpp:69) > V [libjvm.so+0x1824510] Thread::call_run()+0xac (thread.cpp:231) > V [libjvm.so+0x13b3994] thread_native_entry(Thread*)+0x130 (os_linux.cpp:877) > C [libpthread.so.0+0x875c] start_thread+0x18c > ``` I will try to reproduce. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2697052899 From tschatzl at openjdk.org Tue Mar 4 10:53:46 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 10:53:46 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v10] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. 
> > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * ayang review - fix comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/442d9eae..fc674f02 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From duke at openjdk.org Tue Mar 4 11:14:01 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Tue, 4 Mar 2025 11:14:01 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v7] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 09:53:21 GMT, Andrew Dinn wrote: > Oops. sorry - cut and paste error -- the new setting should be > > ``` > do_arch_blob(compiler, 55000 ZGC_ONLY(+5000)) > ``` @adinn, I have done this change, but that erased your approval. Could you reapprove? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2697145316 From iwalulya at openjdk.org Tue Mar 4 11:19:59 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 4 Mar 2025 11:19:59 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v9] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 09:57:56 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. 
>> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). 
>> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * iwalulya review 2 > * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState > * some additional documentation src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 108: > 106: > 107: void G1ConcurrentRefineThreadControl::control_thread_do(ThreadClosure* tc) { > 108: if (_control_thread != nullptr) { maybe maintain using `if (max_num_threads() > 0)` as used in `G1ConcurrentRefineThreadControl::initialize`, so that it is clear that setting `G1ConcRefinementThreads=0` effectively turns off concurrent refinement. src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 354: > 352: if (!r->is_free()) { > 353: // Need to scan all parts of non-free regions, so reset the claim. > 354: // No need for synchronization: we are only interested about regions s/about/in src/hotspot/share/gc/g1/g1OopClosures.hpp line 205: > 203: G1CollectedHeap* _g1h; > 204: uint _worker_id; > 205: bool _has_to_cset_ref; Similar to `_cards_refer_to_cset` , do you mind renaming `_has_to_cset_ref` and `_has_to_old_ref` to `_has_ref_to_cset` and `_has_ref_to_old` src/hotspot/share/gc/g1/g1Policy.hpp line 105: > 103: uint _free_regions_at_end_of_collection; > 104: > 105: size_t _pending_cards_from_gc; A comment on the variable would be nice, especially on how it is set/reset both at end of GC and by refinement. 
Also the `_to_collection_set_cards` below could use a comment ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979077904 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979102189 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979212854 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979155941 From azafari at openjdk.org Tue Mar 4 11:20:05 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 4 Mar 2025 11:20:05 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v34] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTracker`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. > - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: test cases for doing reserve or commit the same region twice. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/70209581..fc106d5f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=32-33 Stats: 24 lines in 1 file changed: 24 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From adinn at openjdk.org Tue Mar 4 11:21:00 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 4 Mar 2025 11:21:00 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v8] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 06:22:09 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Merged master. > - Added more comments, mainly as suggested by Andrew Dinn > - Changed aarch64-asmtest.py as suggested by Bhavana-Kilambi > - Accepting suggested change from Andrew Dinn > - Added comments suggested by Andrew Dinn > - Fixed copyright years > - renaming a couple of functions > - Adding comments + some code reorganization > - removed debugging code > - merging master > - ... and 3 more: https://git.openjdk.org/jdk/compare/ab4b0ef9...d82dfb2f Still good. ------------- Marked as reviewed by adinn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23300#pullrequestreview-2657047714 From qpzhang at openjdk.org Tue Mar 4 11:37:40 2025 From: qpzhang at openjdk.org (Patrick Zhang) Date: Tue, 4 Mar 2025 11:37:40 GMT Subject: RFR: 8350663: AArch64: Enable UseSignumIntrinsic by default Message-ID: According to tests on Arm CPUs Neoverse-N1/N2/V1/V2 and Ampere-Altra/AmpereOne, `-XX:+UseSignumIntrinsic` provides a consistent performance boost on the signum microbenchmarks (1-4,5,7 in the list below) and no obvious regression (ops/s change <0.1%) on other relevant tests (6,9-12). In addition, "_Apple M1 shows no regression with signum intrinsics_" (verified by @theRealAph). So it may be time to enable the UseSignumIntrinsic flag by default for the aarch64 port. By the way, the x86 and riscv ports already enable it by default. Tests: passed JTReg tier1 tests on Ampere-1A with no regression found; I specifically checked the results of the two signum tests (13,14), and both are in a good state. 1. org.openjdk.bench.java.lang.MathBench.signumDouble 2. org.openjdk.bench.java.lang.MathBench.signumFloat 3. org.openjdk.bench.java.lang.StrictMathBench.sigNumDouble 4. org.openjdk.bench.java.lang.StrictMathBench.signumFloat 5. org.openjdk.bench.vm.compiler.Signum._1_signumFloatTest 6. org.openjdk.bench.vm.compiler.Signum._2_overheadFloat 7. org.openjdk.bench.vm.compiler.Signum._3_signumDoubleTest 8. org.openjdk.bench.vm.compiler.Signum._4_overheadDouble 9. org.openjdk.bench.vm.compiler.Signum._5_copySignFloatTest 10. org.openjdk.bench.vm.compiler.Signum._6_overheadCopySignFloat 11. org.openjdk.bench.vm.compiler.Signum._7_copySignDoubleTest 12. org.openjdk.bench.vm.compiler.Signum._8_overheadCopySignDouble 13. JTReg: compiler/vectorization/TestSignumVector.java 14.
JTReg: compiler/intrinsics/math/TestSignumIntrinsic.java ------------- Commit messages: - 8350663: AArch64: Enable UseSignumIntrinsic by default Changes: https://git.openjdk.org/jdk/pull/23893/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23893&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350663 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23893.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23893/head:pull/23893 PR: https://git.openjdk.org/jdk/pull/23893 From tschatzl at openjdk.org Tue Mar 4 11:39:55 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 11:39:55 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v9] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 10:06:37 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * iwalulya review 2 >> * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState >> * some additional documentation > > src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 108: > >> 106: >> 107: void G1ConcurrentRefineThreadControl::control_thread_do(ThreadClosure* tc) { >> 108: if (_control_thread != nullptr) { > > maybe maintain using `if (max_num_threads() > 0)` as used in `G1ConcurrentRefineThreadControl::initialize`, so that it is clear that setting `G1ConcRefinementThreads=0` effectively turns off concurrent refinement. I added a new `is_refinement_enabled()` predicate instead (that uses `max_num_threads()`).
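
A rough sketch of what such a predicate could look like is given below. This is not the code from the PR: `ToyRefineThreadControl` and its members are hypothetical stand-ins, and the callable parameter replaces HotSpot's `ThreadClosure` to keep the example self-contained.

```cpp
#include <utility>

// Hypothetical stand-in for the thread control class discussed above.
class ToyRefineThreadControl {
  unsigned _max_num_threads;  // models the G1ConcRefinementThreads setting

 public:
  explicit ToyRefineThreadControl(unsigned max_num_threads)
      : _max_num_threads(max_num_threads) {}

  unsigned max_num_threads() const { return _max_num_threads; }

  // Named predicate: zero configured threads means refinement is turned off.
  bool is_refinement_enabled() const { return max_num_threads() > 0; }

  // Only visits the (modelled) control thread when refinement is enabled.
  template <typename Closure>
  void control_thread_do(Closure&& closure) const {
    if (is_refinement_enabled()) {
      std::forward<Closure>(closure)();
    }
  }
};
```

The point of the named predicate is that every call site states the intent ("is refinement on?") rather than repeating either the thread-count comparison or a null check on the control thread.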
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979252156 From shade at openjdk.org Tue Mar 4 11:51:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 4 Mar 2025 11:51:07 GMT Subject: RFR: 8345169: Implement JEP XXX: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 08:26:10 GMT, Aleksey Shipilev wrote: > **NOTE: This is a work-in-progress draft for interested parties. The JEP is not even submitted, let alone targeted.** > > My plan is to get this done in a quiet time in mainline to limit the ongoing conflicts with mainline. Feel free to comment in this PR, if you see something ahead of time. These comments might adjust the trajectory we take to implement this removal and/or allow us to submit and work out more RFEs ahead of this removal. I plan to re-open a clean PR after this preliminary PR is done, maybe after the round of preliminary reviews. > > This removes the 32-bit x86 port and does a deeper cleaning in Hotspot. The following paragraphs describe what was done and why. > > Easy stuff first: all files named `*_x86_32` are gone. Those are only built when the build system knows we are compiling for x86_32. There is therefore no impact on x86_64. > > The code under `!LP64`, `!AMD64` and `IA32` is removed in `x86`-specific files. There is quite a bit of the code, especially around `Assembler` and `MacroAssembler`. I think these removals make the whole thing cleaner. The downside is that some of the `MacroAssembler::*ptr` functions that were used to select the "machine pointer" instructions either from x86_64 or x86_32 are now exclusively for x86_64. I don't think we want to rewrite `*ptr` -> `*q` at this point. I think we gradually morph the code base to use `*q`-flavored methods in new code. > > x86_32 is the only platform that has special cases for x87 FPU.
> > C1 even implements the whole separate thing to deal with x87 FPU: the parts of regalloc treat it specially, there is `FpuStackSim`, there is `VerifyFPU` family of flags, etc. There are also peculiarities with FP conversions that use FPU, that's why x86_32 used to have template interpreter stubs for FP conversion methods. None of that is needed anymore without x86_32. This cleans up some arch-specific code as well. > > Both C1 and C2 implement the workarounds for non-IEEE compliant rounding of x87 FPU. After x86_32 is gone, these are not needed anymore. This removes some C2 nodes, removes the rounding instructions in C1. > > x86_64 is baselined on SSE2+, the VM would not even start if SSE2 is not supported. Most of the checks that we have for `UseSSE < 2` are for the benefit of x86_32. Because of this I folded redundant `UseSSE` checks around Hotspot. > > The one thing I _deliberately_ avoided doing is merging `x86.ad` and `x86_64.ad`. It would likely introduce uncomfortable amount of conflicts with pending work in mainli... Great, thanks for the feedback. I think we are going to go with the JEP implementation that removes the easy parts of x86_32 code, and then do the deeper cleanups under [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella. I added some subtasks there, based on the commits from this bulk PR. I am closing this PR in favor of about-to-be-created cleaner PR for JEP 503. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22567#issuecomment-2697266596 From shade at openjdk.org Tue Mar 4 11:51:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 4 Mar 2025 11:51:07 GMT Subject: Withdrawn: 8345169: Implement JEP XXX: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 08:26:10 GMT, Aleksey Shipilev wrote: > **NOTE: This is work-in-progress draft for interested parties. 
The JEP is not even submitted, let alone targeted.** > > My plan is to get this done in a quiet time in mainline to limit the ongoing conflicts with mainline. Feel free to comment in this PR, if you see something ahead of time. These comments might adjust the trajectory we take to implement this removal and/or allow us to submit and work out more RFEs ahead of this removal. I plan to re-open a clean PR after this preliminary PR is done, maybe after the round of preliminary reviews. > > This removes the 32-bit x86 port and does a deeper cleaning in Hotspot. The following paragraphs describe what was done and why. > > Easy stuff first: all files named `*_x86_32` are gone. Those are only built when the build system knows we are compiling for x86_32. There is therefore no impact on x86_64. > > The code under `!LP64`, `!AMD64` and `IA32` is removed in `x86`-specific files. There is quite a bit of the code, especially around `Assembler` and `MacroAssembler`. I think these removals make the whole thing cleaner. The downside is that some of the `MacroAssembler::*ptr` functions that were used to select the "machine pointer" instructions either from x86_64 or x86_32 are now exclusively for x86_64. I don't think we want to rewrite `*ptr` -> `*q` at this point. I think we gradually morph the code base to use `*q`-flavored methods in new code. > > x86_32 is the only platform that has special cases for x87 FPU. > > C1 even implements the whole separate thing to deal with x87 FPU: the parts of regalloc treat it specially, there is `FpuStackSim`, there is `VerifyFPU` family of flags, etc. There are also peculiarities with FP conversions that use FPU, that's why x86_32 used to have template interpreter stubs for FP conversion methods. None of that is needed anymore without x86_32. This cleans up some arch-specific code as well. > > Both C1 and C2 implement the workarounds for non-IEEE compliant rounding of x87 FPU. After x86_32 is gone, these are not needed anymore.
This removes some C2 nodes, removes the rounding instructions in C1. > > x86_64 is baselined on SSE2+, the VM would not even start if SSE2 is not supported. Most of the checks that we have for `UseSSE < 2` are for the benefit of x86_32. Because of this I folded redundant `UseSSE` checks around Hotspot. > > The one thing I _deliberately_ avoided doing is merging `x86.ad` and `x86_64.ad`. It would likely introduce uncomfortable amount of conflicts with pending work in mainli... This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/22567 From tschatzl at openjdk.org Tue Mar 4 11:56:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 11:56:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. 
> * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads.
> * Finally, there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done; // null value check
> if (card(@x.a) == young_card) goto done; // write to young gen check
> StoreLoad; // synchronize
> if (card(@x.a) == dirty_card) goto done;
>
> *card(@x.a) = dirty
>
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
>
> call runtime to move thread-local-dcq into dcqs
>
> done:
>
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>
> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
>
> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se...
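
[Editorial sketch, not part of the original message.] The pseudocode for the current barrier can be modeled in plain C++ to make the order of the filters and the queue handling easier to follow. Everything below is an assumption made for illustration: `ToyHeap`, `BarrierExit`, `post_write_barrier`, the region and card sizes, and the vector standing in for the dirty card queue set are hypothetical and are not HotSpot code.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of the quoted pseudocode. Every name and constant here is
// hypothetical; none of this is HotSpot code.
enum CardValue : std::uint8_t { kCleanCard, kDirtyCard, kYoungCard };

// Which "goto done" (or fall-through) the barrier took.
enum class BarrierExit { SameRegion, NullValue, YoungCard, AlreadyDirty, Enqueued, Flushed };

struct ToyHeap {
  static constexpr unsigned kRegionShift = 20;  // assumed 1 MiB regions
  static constexpr unsigned kCardShift = 9;     // assumed 512-byte cards

  std::vector<std::uint8_t> cards;     // one entry per card
  std::vector<std::size_t> local_dcq;  // thread-local dirty card queue
  std::vector<std::size_t> dcqs;       // stand-in for the global queue set
  std::size_t local_dcq_capacity;

  ToyHeap(std::size_t num_cards, std::size_t capacity)
      : cards(num_cards, kCleanCard), local_dcq_capacity(capacity) {}
};

// Post-write barrier for `x.a = y`: `field` is the address of x.a and
// `value` is the address being stored (0 models null).
BarrierExit post_write_barrier(ToyHeap& heap, std::uintptr_t field, std::uintptr_t value) {
  // Filtering
  if ((field >> ToyHeap::kRegionShift) == (value >> ToyHeap::kRegionShift)) {
    return BarrierExit::SameRegion;
  }
  if (value == 0) {
    return BarrierExit::NullValue;
  }
  const std::size_t card = field >> ToyHeap::kCardShift;
  if (heap.cards[card] == kYoungCard) {
    return BarrierExit::YoungCard;
  }
  std::atomic_thread_fence(std::memory_order_seq_cst);  // models StoreLoad
  if (heap.cards[card] == kDirtyCard) {
    return BarrierExit::AlreadyDirty;
  }
  heap.cards[card] = kDirtyCard;

  // Card tracking
  heap.local_dcq.push_back(card);
  if (heap.local_dcq.size() < heap.local_dcq_capacity) {
    return BarrierExit::Enqueued;
  }
  // Models "call runtime to move thread-local-dcq into dcqs".
  heap.dcqs.insert(heap.dcqs.end(), heap.local_dcq.begin(), heap.local_dcq.end());
  heap.local_dcq.clear();
  return BarrierExit::Flushed;
}
```

Even in this simplified form, a single store may execute four conditional exits, a full fence, and a queue operation, which is the cost the message contrasts with the three or four instructions quoted for Parallel and Serial GC.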
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: iwalulya review * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement * predicate for determining whether the refinement has been disabled * some other typos/comment improvements * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/fc674f02..b4d19d9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=09-10 Stats: 40 lines in 8 files changed: 14 ins; 0 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From coleenp at openjdk.org Tue Mar 4 13:30:04 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 4 Mar 2025 13:30:04 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> References: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> Message-ID: On Mon, 3 Mar 2025 23:18:13 GMT, Patricio Chilano Mateo wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update after review by David and Coleen. > > src/hotspot/share/runtime/objectMonitor.cpp line 1359: > >> 1357: // Build the doubly linked list to get hold of currentNode->prev(). >> 1358: _entry_list_tail = nullptr; >> 1359: entry_list_tail(current); > > I think we should try to avoid having to rebuild the doubly link list from scratch, since only a few nodes in the front might be missing the previous links. 
For platform threads it might not matter that much, but for virtual threads this list could be much larger. Maybe we can leave it as a future enhancement. We don't have a prev node, we don't know which node to set next to our next node to. The list will be broken. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1979432912 From adinn at openjdk.org Tue Mar 4 14:04:03 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 4 Mar 2025 14:04:03 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v7] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 11:11:44 GMT, Ferenc Rakoczi wrote: >> Oops. sorry - cut and paste error -- the new setting should be >> >> do_arch_blob(compiler, 55000 ZGC_ONLY(+5000)) > >> Oops. sorry - cut and paste error -- the new setting should be >> >> ``` >> do_arch_blob(compiler, 55000 ZGC_ONLY(+5000)) >> ``` > > @adinn, I have done this change, but that erased your approval. Could you reapprove? @ferakocz Feel free to integrate and I will sponsor ------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2697719261 From duke at openjdk.org Tue Mar 4 14:13:05 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Tue, 4 Mar 2025 14:13:05 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v7] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 11:11:44 GMT, Ferenc Rakoczi wrote: >> Oops. sorry - cut and paste error -- the new setting should be >> >> do_arch_blob(compiler, 55000 ZGC_ONLY(+5000)) > >> Oops. sorry - cut and paste error -- the new setting should be >> >> ``` >> do_arch_blob(compiler, 55000 ZGC_ONLY(+5000)) >> ``` > > @adinn, I have done this change, but that erased your approval. Could you reapprove? > @ferakocz Feel free to integrate and I will sponsor @adinn thanks a lot for the review and the sponsoring, too! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2697761033 From duke at openjdk.org Tue Mar 4 14:13:05 2025 From: duke at openjdk.org (duke) Date: Tue, 4 Mar 2025 14:13:05 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v8] In-Reply-To: References: Message-ID: <4goExO2NlWn1wVnu0eYddpXAN4h_t9F7VG4b-MHI_sE=.74de8ba0-eec5-401e-9aa5-6bda6a4e74a5@github.com> On Fri, 28 Feb 2025 06:22:09 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Merged master. > - Added more comments, mainly as suggested by Andrew Dinn > - Changed aarch64-asmtest.py as suggested by Bhavana-Kilambi > - Accepting suggested change from Andrew Dinn > - Added comments suggested by Andrew Dinn > - Fixed copyright years > - renaming a couple of functions > - Adding comments + some code reorganization > - removed debugging code > - merging master > - ... and 3 more: https://git.openjdk.org/jdk/compare/ab4b0ef9...d82dfb2f @ferakocz Your change (at version d82dfb2f6d329f4caa0949bfbcd5dd5e5d52d6e9) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2697751091 From mullan at openjdk.org Tue Mar 4 14:36:04 2025 From: mullan at openjdk.org (Sean Mullan) Date: Tue, 4 Mar 2025 14:36:04 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v8] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 06:22:09 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. 
> > Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Merged master. > - Added more comments, mainly as suggested by Andrew Dinn > - Changed aarch64-asmtest.py as suggested by Bhavana-Kilambi > - Accepting suggested change from Andrew Dinn > - Added comments suggested by Andrew Dinn > - Fixed copyright years > - renaming a couple of functions > - Adding comments + some code reorganization > - removed debugging code > - merging master > - ... and 3 more: https://git.openjdk.org/jdk/compare/ab4b0ef9...d82dfb2f I think it would be nice to add a release note for this describing the approximate performance improvement. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2697841749 From duke at openjdk.org Tue Mar 4 14:44:00 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Tue, 4 Mar 2025 14:44:00 GMT Subject: Integrated: 8348561: Add aarch64 intrinsics for ML-DSA In-Reply-To: References: Message-ID: On Fri, 24 Jan 2025 14:24:23 GMT, Ferenc Rakoczi wrote: > By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. This pull request has now been integrated. 
Changeset: 3230894b Author: Ferenc Rakoczi Committer: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/3230894bdd8ab4183b83ad4c942eb6acad4acce6 Stats: 2611 lines in 22 files changed: 2030 ins; 92 del; 489 mod 8348561: Add aarch64 intrinsics for ML-DSA Reviewed-by: adinn ------------- PR: https://git.openjdk.org/jdk/pull/23300 From shade at openjdk.org Tue Mar 4 14:56:40 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 4 Mar 2025 14:56:40 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events Message-ID: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. We still only enable the statistics event by default to be on a safer side. 
Additional testing:
 - [x] Linux x86_64 server fastdebug, `jdk_jfr`

-------------

Commit messages:
 - Separate statistics event as well
 - Fix

Changes: https://git.openjdk.org/jdk/pull/23900/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23900&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8351142
  Stats: 267 lines in 8 files changed: 267 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/23900.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23900/head:pull/23900

PR: https://git.openjdk.org/jdk/pull/23900

From cnorrbin at openjdk.org Tue Mar 4 15:40:46 2025
From: cnorrbin at openjdk.org (Casper Norrbin)
Date: Tue, 4 Mar 2025 15:40:46 GMT
Subject: RFR: 8346916: [REDO] align_up has potential overflow [v6]
In-Reply-To:
References:
Message-ID:

> Hi everyone,
>
> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition.
>
> The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions.
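
[Editorial sketch, not part of the original message.] The overflow described above is easy to reproduce with unsigned integers, where the wrap-around is well defined and the rounded-up result ends up below the original value. The following is a minimal standalone illustration under stated assumptions: the `*_sketch` helpers are hypothetical, take both arguments as the same unsigned type, and assume a nonzero power-of-two alignment. HotSpot's real helpers live in `utilities/align.hpp` and have different signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for illustration only. All helpers assume that
// T is an unsigned integer type and `alignment` is a nonzero power of two.

template <typename T>
constexpr T align_down_sketch(T value, T alignment) {
  return value & ~(alignment - 1);
}

// True if rounding `value` up to `alignment` cannot wrap around: the value
// must not exceed the largest aligned value representable in T.
template <typename T>
constexpr bool can_align_up_sketch(T value, T alignment) {
  return align_down_sketch(std::numeric_limits<T>::max(), alignment) >= value;
}

// Unchecked round-up: wraps for values above the largest aligned T.
template <typename T>
constexpr T align_up_unchecked_sketch(T value, T alignment) {
  return align_down_sketch(static_cast<T>(value + (alignment - 1)), alignment);
}

// Checked round-up: the predicate is treated as a precondition.
template <typename T>
inline T align_up_sketch(T value, T alignment) {
  assert(can_align_up_sketch(value, alignment) && "align_up would overflow");
  return align_up_unchecked_sketch(value, alignment);
}

static_assert(align_up_unchecked_sketch<std::uint32_t>(13u, 8u) == 16u,
              "ordinary values round up");
static_assert(can_align_up_sketch<std::uint32_t>(0xFFFFFFF8u, 8u),
              "the largest aligned value is still safe");
static_assert(!can_align_up_sketch<std::uint32_t>(0xFFFFFFF9u, 8u),
              "anything above it would wrap");
static_assert(align_up_unchecked_sketch<std::uint32_t>(0xFFFFFFF9u, 8u) == 0u,
              "the wrapped result is smaller than the original");
```

A caller that cannot guarantee the precondition can test `can_align_up_sketch` first and return early or pick a fallback value, instead of relying on the `aligned_result >= original` assertion to catch the problem after the fact.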
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: changed to can_align_up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23711/files - new: https://git.openjdk.org/jdk/pull/23711/files/dd319893..1c9c9d1a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=04-05 Stats: 79 lines in 5 files changed: 57 ins; 14 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23711.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23711/head:pull/23711 PR: https://git.openjdk.org/jdk/pull/23711 From ayang at openjdk.org Tue Mar 4 15:47:00 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 4 Mar 2025 15:47:00 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 11:56:56 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. 
>> * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads.
>> * Finally, there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>>
>> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>>
>>
>> // Filtering
>> if (region(@x.a) == region(y)) goto done; // same region check
>> if (y == null) goto done; // null value check
>> if (card(@x.a) == young_card) goto done; // write to young gen check
>> StoreLoad; // synchronize
>> if (card(@x.a) == dirty_card) goto done;
>>
>> *card(@x.a) = dirty
>>
>> // Card tracking
>> enqueue(card-address(@x.a)) into thread-local-dcq;
>> if (thread-local-dcq is not full) goto done;
>>
>> call runtime to move thread-local-dcq into dcqs
>>
>> done:
>>
>>
>> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>>
>> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>>
>> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
>>
>> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c...
> > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > iwalulya review > * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement > * predicate for determining whether the refinement has been disabled > * some other typos/comment improvements > * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 356: > 354: bool do_heap_region(G1HeapRegion* r) override { > 355: if (!r->is_free()) { > 356: // Need to scan all parts of non-free regions, so reset the claim. Why is the condition "is_free"? I thought we scan only old-or-humongous regions? src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 116: > 114: SwapGlobalCT, // Swap global card table. > 115: SwapJavaThreadsCT, // Swap java thread's card tables. > 116: SwapGCThreadsCT, // Swap GC thread's card tables. Do GC threads have card-table? src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 219: > 217: // The young gen revising mechanism reads the predictor and the values set > 218: // here. Avoid inconsistencies by locking. > 219: MutexLocker x(G1RareEvent_lock, Mutex::_no_safepoint_check_flag); Who else can be in this critical-section? I don't get what this lock is protecting us from. src/hotspot/share/gc/g1/g1ConcurrentRefineThread.hpp line 83: > 81: > 82: public: > 83: static G1ConcurrentRefineThread* create(G1ConcurrentRefine* cr); I wonder if the comment for this class "One or more G1 Concurrent Refinement Threads..." has become obsolete. (AFAICS, this class is a singleton.) src/hotspot/share/gc/g1/g1ConcurrentRefineWorkTask.cpp line 69: > 67: } else if (res == G1RemSet::NoInteresting) { > 68: _refine_stats.inc_cards_clean_again(); > 69: } A `switch` is probably cleaner. 
src/hotspot/share/gc/g1/g1ConcurrentRefineWorkTask.cpp line 78:

> 76: do_dirty_card(source, dest_card);
> 77: }
> 78: return pointer_delta(dirty_r, dirty_l, sizeof(CardValue));

I feel the `pointer_delta` line belongs to the caller. After that, even the entire method can be inlined to the caller. YMMV.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979666477
PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979678325
PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979699376
PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979695999
PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979705019
PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979709682

From cnorrbin at openjdk.org Tue Mar 4 15:48:17 2025
From: cnorrbin at openjdk.org (Casper Norrbin)
Date: Tue, 4 Mar 2025 15:48:17 GMT
Subject: RFR: 8346916: [REDO] align_up has potential overflow [v7]
In-Reply-To:
References:
Message-ID: <00nHUxFecTrb5xshjIqDo40zuqdHNiANMqSNCUH2jGY=.7bd0ea79-d6a9-46f9-86d1-4e1d75a27d69@github.com>

> Hi everyone,
>
> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition.
>
> The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions.
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: align comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23711/files - new: https://git.openjdk.org/jdk/pull/23711/files/1c9c9d1a..f52de010 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=05-06 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23711.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23711/head:pull/23711 PR: https://git.openjdk.org/jdk/pull/23711 From cnorrbin at openjdk.org Tue Mar 4 15:48:18 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 4 Mar 2025 15:48:18 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v2] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 09:34:20 GMT, Kim Barrett wrote: >> src/hotspot/share/cds/metaspaceShared.cpp line 244: >> >>> 242: >>> 243: char* aligned_base = align_up_or_null(specified_base, alignment); >>> 244: assert(is_aligned(aligned_base, alignment), "sanity"); >> >> I don't think this assert adds anything. > > Actually, it's worse than that, since is_aligned of a null pointer is problematic. Removed it. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1979721861

From gziemski at openjdk.org Tue Mar 4 16:03:09 2025
From: gziemski at openjdk.org (Gerard Ziemski)
Date: Tue, 4 Mar 2025 16:03:09 GMT
Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v33]
In-Reply-To:
References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <6HZ_GjpHmTXV-HiRJQ1GucpUNC9YifDkXIpUnAupjJ4=.a94bb403-b0d6-4d65-97c7-4644245aae55@github.com>
Message-ID: <8226VeBvQn8tORIXH1zg_9fJ1A8db9zYEtP6mM8ldOU=.144cb69c-8b75-441f-9ee0-5dc9333b9943@github.com>

On Tue, 4 Mar 2025 09:25:26 GMT, Afshin Zafari wrote:

>> src/hotspot/share/nmt/regionsTree.cpp line 28:
>>
>>> 26: VMATree::SummaryDiff RegionsTree::commit_region(address addr, size_t size, const NativeCallStack& stack) {
>>> 27: return commit_mapping((VMATree::position)addr, size, make_region_data(stack, mtNone), /*use tag inplace*/ true);
>>> 28: }
>>
>> `RegionsTree::commit_region` is called by
>>
>> ```
>> static inline void record_virtual_memory_reserve_and_commit(void* addr, size_t size,
>> const NativeCallStack& stack, MemTag mem_tag = mtNone) {
>> ```
>>
>> which has mem_tag, so we could in theory use it and pass it down? Then we could avoid the complicated "use_tag_inplace" parameter handling?
>>
>> Not sure if this is possible in all cases. Is that why we have the need for "use_tag_inplace"?
>
> I have to separate the concerns as follows:
> 1. In `record_..._reserve_and_commit` the `mem_tag` is available since it is needed for `reserve`. So, you are right, we can use the tag and pass it down. The change set would be:
> `VMT::Instance::add_committed_region(addr, size, stack, MemTag = mtNone)`
> `VMT::add_committed_region(addr, size, stack, MemTag = mtNone)`
> `RegionsTree::commit_region(addr, size, stack, MemTag = mtNone)`
> 3. Even if we do so, `use_tag_in_place` is needed in `commit` because the `os::commit_memory` family does not pass mem_tag down.
There is already an abandoned PR which tried to add MemTag param to this family. It was reviewed as not necessary. > 4. In addition, the `VMATree::register_mapping` has a mandatory MemTag param (in its MetaData param) which enforces the caller to pass down an specific MemTag. If we don't use `use_tag_inplace`, for every commit of region $[ base, end)$ we have to find the enclosing reserved region $[ A, B)$ where $A \le base < end \le B$ and get its mem_tag and then pass it down to `VMATree::register_mapping`. Finding the enclosing region could be too expensive and is preferred to be avoided. > > I can implement the case 1 above if it is preferred. It would help with cleaning mem_tags issue, where we are trying to clean up "mtNone" tags. >> src/hotspot/share/nmt/regionsTree.cpp line 32: >> >>> 30: VMATree::SummaryDiff RegionsTree::uncommit_region(address addr, size_t size) { >>> 31: return uncommit_mapping((VMATree::position)addr, size, make_region_data(NativeCallStack::empty_stack(), mtNone)); >>> 32: } >> >> Would it be helpful here if we were to add a new tag, that would mark this uncommitted region somehow different than mtNone (to mark it that it used to be used, but now it's not, which is different from never used region)? > > An uncommitted region has implicitly the same tag as its containing reserved region: > > <-------------Reserved Region, mtXXX----------> > <----C1----><..U1...><---C2--><..U2..><---C3--> > > C1-C3 are committed regions with tag `mtXXX` > U1 and U2 are uncommitted and enclosed by a `mtXXX` tag region. > > Do you have any specific use-case for this? > > --- > If you were thinking about *release_mapping* instead, then I can say that the released region will be sooner or later reserved by another memtag. For example, an mtStack region is released and immediately reserved by mtGC. In other words, any memtag on a released region would be overwritten by other uses of the same region. I was thinking in terms of cleaning up all the mtNone tags. 
Here we could set it to something more meaningful than mtNone? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1979749939 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1979752152 From tschatzl at openjdk.org Tue Mar 4 16:03:55 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 16:03:55 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 15:16:17 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> iwalulya review >> * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement >> * predicate for determining whether the refinement has been disabled >> * some other typos/comment improvements >> * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming > > src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 356: > >> 354: bool do_heap_region(G1HeapRegion* r) override { >> 355: if (!r->is_free()) { >> 356: // Need to scan all parts of non-free regions, so reset the claim. > > Why is the condition "is_free"? I thought we scan only old-or-humongous regions? We also need to clear young gen region marks because we want them to be all clean in the card table for the garbage collection (evacuation failure handling, use in next cycle). This is maybe a bit of a waste if there are multiple refinement rounds between two gcs, but it's less expensive than in the pause wrt to latency. It's fast anyway. > src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 116: > >> 114: SwapGlobalCT, // Swap global card table. >> 115: SwapJavaThreadsCT, // Swap java thread's card tables. >> 116: SwapGCThreadsCT, // Swap GC thread's card tables. > > Do GC threads have card-table? 
Hmm, I thought I changed that already just recently with Ivan's latest requests. Will fix.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979742662
PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979752692

From tschatzl at openjdk.org Tue Mar 4 16:07:58 2025
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Tue, 4 Mar 2025 16:07:58 GMT
Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11]
In-Reply-To:
References:
Message-ID:

On Tue, 4 Mar 2025 15:33:29 GMT, Albert Mingkun Yang wrote:

>> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:
>>
>> iwalulya review
>> * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement
>> * predicate for determining whether the refinement has been disabled
>> * some other typos/comment improvements
>> * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming
>
> src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 219:
>
>> 217: // The young gen revising mechanism reads the predictor and the values set
>> 218: // here. Avoid inconsistencies by locking.
>> 219: MutexLocker x(G1RareEvent_lock, Mutex::_no_safepoint_check_flag);
>
> Who else can be in this critical-section? I don't get what this lock is protecting us from.

The concurrent refine control thread in `G1ConcurrentRefineThread::do_refinement`, when calling `G1Policy::record_dirtying_stats`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979759329 From tschatzl at openjdk.org Tue Mar 4 16:07:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 16:07:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 16:00:46 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 116: >> >>> 114: SwapGlobalCT, // Swap global card table. >>> 115: SwapJavaThreadsCT, // Swap java thread's card tables. >>> 116: SwapGCThreadsCT, // Swap GC thread's card tables. >> >> Do GC threads have card-table? > > Hmm, I thought I changed tat already just recently with Ivan's latest requests. Will fix. Oh, I only fixed the string. Apologies. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979761737 From cnorrbin at openjdk.org Tue Mar 4 16:08:53 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 4 Mar 2025 16:08:53 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v5] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 09:29:48 GMT, Kim Barrett wrote: >> src/hotspot/share/utilities/align.hpp line 96: >> >>> 94: >>> 95: template >>> 96: inline T* align_up_or_null(T* ptr, A alignment) { >> >> Rather than specialized align_up variants, I think better and more generally >> usable would be a predicate for testing whether align_up is safe. Something >> like a `bool can_align_up(value, alignment)` function. >> >> I'm happy to see align_up_or_min removed from this PR for that reason, as well >> as because it has usability issues. Is a min-valued return indicating failure >> to align? Or was it just the argument is min-valued and already aligned? >> Consider passing an unsigned zero value as the value to be aligned. 
>>
>> This seems to work:
>>
>> diff --git a/src/hotspot/share/utilities/align.hpp b/src/hotspot/share/utilities/align.hpp
>> index b67e61036a0..d73e8e086ca 100644
>> --- a/src/hotspot/share/utilities/align.hpp
>> +++ b/src/hotspot/share/utilities/align.hpp
>> @@ -30,6 +30,8 @@
>> #include "utilities/debug.hpp"
>> #include "utilities/globalDefinitions.hpp"
>> #include "utilities/powerOfTwo.hpp"
>> +
>> +#include
>> #include
>>
>> // Compute mask to use for aligning to or testing alignment.
>> @@ -70,6 +72,17 @@ constexpr T align_down(T size, A alignment) {
>>   return result;
>> }
>>
>> +template::value)>
>> +constexpr bool can_align_up(T size, A alignment) {
>> +  return align_down(std::numeric_limits::max(), alignment) >= size;
>> +}
>> +
>> +template
>> +inline bool can_align_up(const void* p, A alignment) {
>> +  static_assert(sizeof(p) == sizeof(uintptr_t), "assumption");
>> +  return can_align_up(reinterpret_cast(p), alignment);
>> +}
>> +
>> template::value)>
>> constexpr T align_up(T size, A alignment) {
>>   T adjusted = checked_cast(size + alignment_mask(alignment));
>> diff --git a/test/hotspot/gtest/utilities/test_align.cpp b/test/hotspot/gtest/utilities/test_align.cpp
>> index 3c03fd5f24d..2112d43feab 100644
>> --- a/test/hotspot/gtest/utilities/test_align.cpp
>> +++ b/test/hotspot/gtest/utilities/test_align.cpp
>> @@ -27,6 +27,51 @@
>> #include "unittest.hpp"
>>
>> #include
>> +#include
>> +
>> +template
>> +static constexpr void test_can_align_up() {
>> +  int alignment_value = 4;
>> +  int small_value = 63;
>> +  A alignment = static_cast(alignment_value);
>> +
>> +  EXPECT_TRUE(can_align_up...
>
> I forgot to add any description comments for can_align_up. And align_up description should say can_align_up
> is a precondition.

I removed `align_up_or_null` in favour of `can_align_up`, and I agree that it feels more usable. Instead of `align_up_or_null`, we can now either return early or set a different value if `can_align_up` fails.
I updated the files here and to me it feels like an improvement.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1979762563

From shade at openjdk.org Tue Mar 4 16:16:33 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 4 Mar 2025 16:16:33 GMT
Subject: RFR: 8351187: Add JFR monitor notification event
Message-ID:

We have a `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. Notifications are important and interesting to track as well, for example to correlate the delay between notification and eventual wake-up. Providing this event would also replace one of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829).

This event is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later.

Additional testing:
 - [x] Linux x86_64 server fastdebug, `jdk_jfr`

-------------

Commit messages:
 - Disable by default
 - Fix

Changes: https://git.openjdk.org/jdk/pull/23901/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23901&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8351187
  Stats: 161 lines in 7 files changed: 155 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/23901.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23901/head:pull/23901

PR: https://git.openjdk.org/jdk/pull/23901

From mli at openjdk.org Tue Mar 4 16:18:27 2025
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 4 Mar 2025 16:18:27 GMT
Subject: RFR: 8351145: RISC-V: only enable some crypto intrinsic when AvoidUnalignedAccess == false
Message-ID: <8-3lLYr9jtNOQJhRLRyAA2xxfxG2aVm27HIGcbNsCfY=.8e43a66a-5602-4345-b5a0-cfdaab7e0d8f@github.com>

Hi,
Can you help to review the patch?

Depending on whether a CPU supports fast misaligned access, misaligned accesses can have a large impact on performance.
Some crypto intrinsic implementations on RISC-V do not consider data alignment and just use `ld` to load the input byte array, and there seems to be no way to align the data. The main reason is that, at the Java API level, the input byte array passed to these JVM intrinsics can be part of a real Java array, so the input can start 1, 2, ... or 7 bytes past an 8-byte boundary. With the introduction of COH (compact object headers), aligning the input data would be even more complicated.

So, for consistent performance, it seems better to disable these intrinsics when AvoidUnalignedAccess == true. Users can still enable the intrinsics explicitly on a CPU with AvoidUnalignedAccess == true if they want to.

Thanks!

-------------

Commit messages:
 - initial commit

Changes: https://git.openjdk.org/jdk/pull/23903/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23903&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8351145
  Stats: 10 lines in 1 file changed: 4 ins; 4 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/23903.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23903/head:pull/23903

PR: https://git.openjdk.org/jdk/pull/23903

From tschatzl at openjdk.org Tue Mar 4 16:20:58 2025
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Tue, 4 Mar 2025 16:20:58 GMT
Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11]
In-Reply-To:
References:
Message-ID:

On Tue, 4 Mar 2025 15:56:05 GMT, Thomas Schatzl wrote:

> It's fast anyway.

To clarify: if you have multiple refinement rounds between two garbage collections, the time to clear the young gen cards is almost noise compared to the actual refinement effort. Like two orders of magnitude faster.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979785011 From tschatzl at openjdk.org Tue Mar 4 16:34:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 16:34:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11] In-Reply-To: References: Message-ID: <3LR5VKMhSuXWmMlphpe8SLHm8vQQt6j343qaO61S_mQ=.dc1d2e4a-c858-44bd-9da0-f3f98340d939@github.com> On Tue, 4 Mar 2025 16:04:00 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 219: >> >>> 217: // The young gen revising mechanism reads the predictor and the values set >>> 218: // here. Avoid inconsistencies by locking. >>> 219: MutexLocker x(G1RareEvent_lock, Mutex::_no_safepoint_check_flag); >> >> Who else can be in this critical-section? I don't get what this lock is protecting us from. > > The concurrent refine control thread in `G1ConcurrentRefineThread::do_refinement`, when calling `G1Policy::record_dirtying_stats`. I could create an extra mutex for that if you want to make it clear which two parties access the same data. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979810144 From tschatzl at openjdk.org Tue Mar 4 17:20:28 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 17:20:28 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v12] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. 
>
> ### Current situation
>
> With this change, G1 will reduce the post write barrier to more closely resemble Parallel GC's, as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to its larger barrier.
>
> The main reason for the current barrier is how G1 implements concurrent refinement:
> * G1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
> * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads,
> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done; // null value check
> if (card(@x.a) == young_card) goto done; // write to young gen check
> StoreLoad; // synchronize
> if (card(@x.a) == dirty_card) goto done;
>
> *card(@x.a) = dirty
>
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
>
> call runtime to move thread-local-dcq into dcqs
>
> done:
>
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>
> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
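To make the cost difference concrete, the pseudo code above can be modelled as a small C++ toy. Everything here is an assumption for illustration: `ToyHeap`, the 512-byte card and 1 MiB region sizes, the card values, and the use of plain offsets as addresses (offset 0 doubles as null) are not HotSpot's definitions, and the runtime call for a full queue is left out. `simple_barrier` stands for an unconditional card mark of the kind Parallel GC uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: names and constants are illustrative, not HotSpot's.
struct ToyHeap {
  static constexpr std::size_t kCardShift   = 9;   // 512-byte cards (assumed)
  static constexpr std::size_t kRegionShift = 20;  // 1 MiB regions (assumed)
  enum Card : std::uint8_t { kClean, kDirty, kYoung };

  std::vector<std::uint8_t> cards;            // one byte per card
  std::vector<std::size_t> thread_local_dcq;  // indices of dirtied cards
  int store_load_fences = 0;                  // times the StoreLoad step is reached

  explicit ToyHeap(std::size_t heap_bytes)
      : cards(heap_bytes >> kCardShift, kClean) {}

  // Filtered post-write barrier for "*field = value", following the
  // quoted pseudo code step by step.
  void filtered_barrier(std::uintptr_t field, std::uintptr_t value) {
    if ((field >> kRegionShift) == (value >> kRegionShift)) return;  // same region
    if (value == 0) return;                                          // null value
    const std::size_t card = field >> kCardShift;
    if (cards[card] == kYoung) return;                               // young gen write
    ++store_load_fences;                                             // stands in for StoreLoad
    if (cards[card] == kDirty) return;                               // already dirty
    cards[card] = kDirty;
    thread_local_dcq.push_back(card);                                // card tracking
  }

  // Unconditional card mark: one shift and one byte store.
  void simple_barrier(std::uintptr_t field) {
    cards[field >> kCardShift] = kDirty;
  }
};
```

In this model only a cross-region, non-null store to a clean old-generation card reaches the enqueue step; every other store leaves early through one of the filters.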
> > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review * renamings * refactorings ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/b4d19d9b..4a978118 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=10-11 Stats: 34 lines in 4 files changed: 13 ins; 1 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From pchilanomate at openjdk.org Tue Mar 4 17:38:58 2025 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 4 Mar 2025 17:38:58 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> Message-ID: <09Lu69Do9amzXyGok3KDuP2whACShrPwRM7BOel5wgg=.ceed3ba0-9f91-4e95-9cf5-0e85362e29df@github.com> On Tue, 4 Mar 2025 04:50:34 GMT, David Holmes wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 204: >> >>> 202: // If the thread (F) that removes itself from the end of the list >>> 203: // hasn't got any prev pointer, we just set the tail pointer to >>> 204: // null, see 5) and 6) below. >> >> Setting the tail pointer to null would be for the case when this node is also the head, i.e single element. Otherwise we just rebuild the doubly link list, unlink F, and set entry_list_tail to G. 
In other words, the comment here and below seems to be missing that we have to build the doubly link list when F acquires the monitor, not when F needs to find a successor. > > We don't rebuild at this point. The thread that is removing itself just sets tail to null if there is no prev. Later when F exits the monitor it will construct the DLL to find the next successor. But if there is a previous node (just no previous pointer set) we have to rebuild the list, otherwise G would still be pointing to F. It would be this case: https://github.com/fbredber/jdk/blob/283c2431ec64b0865d4e678913c636732d01658f/src/hotspot/share/runtime/objectMonitor.cpp#L1313 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1979921706 From egahlin at openjdk.org Tue Mar 4 17:40:07 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 4 Mar 2025 17:40:07 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events In-Reply-To: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: On Tue, 4 Mar 2025 14:47:09 GMT, Aleksey Shipilev wrote: > We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. > > This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). 
>
> Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. We still only enable the statistics event by default to be on a safer side.
>
> Additional testing:
> - [x] Linux x86_64 server fastdebug, `jdk_jfr`

src/hotspot/share/jfr/metadata/metadata.xml line 124:

> 122:
> 123:
> 124:

For consistency with other statistical events, the category for JavaMonitorStatistics should be "Java Application, Statistics"

The event should probably be periodic, so users can set an interval to reduce the number of events, with a default period of "everyChunk", so it is emitted at least at the beginning and end of a recording.

src/hotspot/share/jfr/metadata/metadata.xml line 125:

> 123:
> 124:
> 125:

The label should be 'Monitor in Use' (lowercase 'i'). Here is the style guideline if you're wondering. https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/Label.html

test/jdk/jdk/jfr/event/runtime/TestJavaMonitorDeflateEvent.java line 82:

> 80: waitThread.join();
> 81: // Let deflater thread run.
> 82: Thread.sleep(3000);

I see that you took code from the MonitorInflate test. It's a really old test. A RecordingStream would be more suitable, as you can avoid using Thread.sleep() and the TestThread. I don't think a file needs to be dumped if the events are printed to standard out. Something like this:

    String lockClassName = lock.getClass().getName();
    List<RecordedEvent> events = new CopyOnWriteArrayList<>();
    try (RecordingStream rs = new RecordingStream()) {
        rs.enable(EVENT_NAME).withoutThreshold();
        rs.onEvent(EVENT_NAME, e -> {
            RecordedClass clazz = e.getClass(FIELD_KLASS_NAME);
            if (clazz.getName().equals(lockClassName)) {
                events.add(e);
                rs.close();
            }
        });
        rs.startAsync();
        ...
        synchronized (lock) {
            ...
        }
        ...
        rs.awaitTermination();
        System.out.println(events);
        RecordedEvent event = events.get(0);
        Events.assertField(event, FIELD_ADDRESS).notEqual(0L);
    }

test/jdk/jdk/jfr/event/runtime/TestJavaMonitorStatisticsEvent.java line 60:

> 58: Recording recording = new Recording();
> 59: recording.enable(EVENT_NAME).withThreshold(Duration.ofMillis(0));
> 60: final Lock lock = new Lock();

If the event is periodic, you can set:

`recording.enable(EVENT_NAME).with("period", "everyChunk");`

and use the following instead of isAnyFound:

    List<RecordedEvent> events = Events.fromRecording(recording);
    Events.hasEvents(events);

There's no need to dump to failed.jfr. Events.fromRecording will create a file that can be inspected in case the test fails.

try-with-resources would be nice to have.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23900#discussion_r1979685743
PR Review Comment: https://git.openjdk.org/jdk/pull/23900#discussion_r1979647552
PR Review Comment: https://git.openjdk.org/jdk/pull/23900#discussion_r1979758837
PR Review Comment: https://git.openjdk.org/jdk/pull/23900#discussion_r1979896549

From fbredberg at openjdk.org Tue Mar 4 18:12:59 2025
From: fbredberg at openjdk.org (Fredrik Bredberg)
Date: Tue, 4 Mar 2025 18:12:59 GMT
Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2]
In-Reply-To: <09Lu69Do9amzXyGok3KDuP2whACShrPwRM7BOel5wgg=.ceed3ba0-9f91-4e95-9cf5-0e85362e29df@github.com>
References: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> <09Lu69Do9amzXyGok3KDuP2whACShrPwRM7BOel5wgg=.ceed3ba0-9f91-4e95-9cf5-0e85362e29df@github.com>
Message-ID:

On Tue, 4 Mar 2025 17:36:43 GMT, Patricio Chilano Mateo wrote:

>> We don't rebuild at this point. The thread that is removing itself just sets tail to null if there is no prev. Later when F exits the monitor it will construct the DLL to find the next successor.
> > But if there is a previous node (just no previous pointer set) we have to rebuild the list, otherwise G would still be pointing to F. It would be this case: https://github.com/fbredber/jdk/blob/283c2431ec64b0865d4e678913c636732d01658f/src/hotspot/share/runtime/objectMonitor.cpp#L1313 You're quite right. I'll rewrite that section of the comment. Thank you for spotting this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1979966484 From pchilanomate at openjdk.org Tue Mar 4 18:13:00 2025 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 4 Mar 2025 18:13:00 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> Message-ID: On Tue, 4 Mar 2025 13:27:17 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 1359: >> >>> 1357: // Build the doubly linked list to get hold of currentNode->prev(). >>> 1358: _entry_list_tail = nullptr; >>> 1359: entry_list_tail(current); >> >> I think we should try to avoid having to rebuild the doubly link list from scratch, since only a few nodes in the front might be missing the previous links. For platform threads it might not matter that much, but for virtual threads this list could be much larger. Maybe we can leave it as a future enhancement. > > We don't have a prev node, we don't know which node to set next to our next node to. The list will be broken. Right, we still have to set the previous links for those nodes. I'm just suggesting we don't have to walk the whole list, just until the last node we set the previous pointer. 
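The suggestion above (fill in only the missing previous links instead of walking the whole list) can be sketched in single-threaded C++. This is a hypothetical model, not the ObjectMonitor code: `WaitNode`, `push_front` and `link_missing_prev` are invented names, and all atomics and races are ignored. New nodes are pushed at the head with only `next` set, so any nodes lacking a `prev` link form a prefix of the list.

```cpp
#include <cassert>

// Hypothetical list node; the real entry list is concurrent.
struct WaitNode {
  WaitNode* next = nullptr;
  WaitNode* prev = nullptr;
};

// New waiters are pushed at the head; only `next` is set at this point.
inline WaitNode* push_front(WaitNode* head, WaitNode* node) {
  node->next = head;
  node->prev = nullptr;
  return node;
}

// Walk from the head and set the missing `prev` pointers. Stop at the
// first node whose successor already points back to it: everything
// behind that node was linked by an earlier pass.
// Returns the number of links written.
inline int link_missing_prev(WaitNode* head) {
  int written = 0;
  for (WaitNode* n = head; n != nullptr && n->next != nullptr; n = n->next) {
    if (n->next->prev == n) {
      break;  // the rest of the list is already doubly linked
    }
    n->next->prev = n;
    ++written;
  }
  return written;
}
```

With this scheme the cost of a pass is proportional to the number of nodes pushed since the previous pass, which matters most when the list is long, as it can be with virtual threads.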
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1979963352 From pchilanomate at openjdk.org Tue Mar 4 18:26:10 2025 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 4 Mar 2025 18:26:10 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v5] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Tue, 4 Mar 2025 04:56:24 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. >> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. >> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. > > Dean Long has updated the pull request incrementally with two additional commits since the last revision: > > - fix typo > - moved and hopefully improved invokedynamic comment Marked as reviewed by pchilanomate (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23557#pullrequestreview-2658565622 From mpowers at openjdk.org Tue Mar 4 19:28:02 2025 From: mpowers at openjdk.org (Mark Powers) Date: Tue, 4 Mar 2025 19:28:02 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v2] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 19:00:59 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Added comments, removed debugging printfs ML-DSA benchmark results for this PR keygen ML-DSA-44 96 us/op keygen ML-DSA-65 200 us/op keygen ML-DSA-87 272 us/op siggen ML-DSA-44 297 us/op siggen ML-DSA-65 452 us/op siggen ML-DSA-87 728 us/op sigver ML-DSA-44 115 us/op sigver ML-DSA-65 176 us/op sigver ML-DSA-87 290 us/op ML-DSA no intrinsics keygen ML-DSA-44 169 us/op keygen ML-DSA-65 302 us/op keygen ML-DSA-87 444 us/op siggen ML-DSA-44 696 us/op siggen ML-DSA-65 1114 us/op siggen ML-DSA-87 1828 us/op sigver ML-DSA-44 187 us/op sigver ML-DSA-65 295 us/op sigver ML-DSA-87 473 us/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/23860#issuecomment-2698691038 From gziemski at openjdk.org Tue Mar 4 19:48:16 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 4 Mar 2025 19:48:16 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v34] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Tue, 4 Mar 2025 11:20:05 GMT, Afshin Zafari wrote: >> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTracker`. >> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. 
>> - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. >> - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > test cases for doing reserve or commit the same region twice. LGTM, thanks! ------------- Marked as reviewed by gziemski (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2658817432 From gziemski at openjdk.org Tue Mar 4 19:48:17 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 4 Mar 2025 19:48:17 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v33] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <6HZ_GjpHmTXV-HiRJQ1GucpUNC9YifDkXIpUnAupjJ4=.a94bb403-b0d6-4d65-97c7-4644245aae55@github.com> Message-ID: On Tue, 4 Mar 2025 10:08:18 GMT, Afshin Zafari wrote: >> test/hotspot/gtest/nmt/test_regions_tree.cpp line 102: >> >>> 100: rt.commit_region((address)1030, 5UL, ncs); >>> 101: rt.commit_region((address)1040, 5UL, ncs); >>> 102: ReservedMemoryRegion rmr((address)1000, 50); >> >> I would add something like: >> >> rt.commit_region((address)1500, 5UL, ncs); // should not be counted >> >> that should not be counted. > > adding that line crashes for me as follows: > > Internal Error (/src/hotspot/share/nmt/vmatree.cpp:77), pid=87926, tid=87926 > # assert(leqA_n->val().out.type() != StateType::Released) failed: Should not use inplace the tag of a released region > # > > It means that we cannot commit without reserving first. The region being committed is released but should be reserved. I was running it without asserts and didn't see it, but it's good that assert catches it. 
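The rule that surfaced in this exchange (a commit only counts when it falls inside a reserved region) can be shown with a toy tracker. This is an assumption-laden sketch and not the VMATree or RegionsTree API: `ToyRegionTracker` and its methods are invented, addresses are plain integers, and an invalid commit is rejected with `false` where the real code asserts.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Toy stand-in for a region tracker; names are hypothetical.
class ToyRegionTracker {
 public:
  void reserve(std::size_t base, std::size_t size) { reserved_[base] = size; }

  // Returns false when [base, base + size) is not inside a reserved range.
  bool commit(std::size_t base, std::size_t size) {
    auto it = reserved_.upper_bound(base);
    if (it == reserved_.begin()) return false;       // nothing reserved at or below base
    --it;                                            // closest reservation starting <= base
    if (base + size > it->first + it->second) return false;
    committed_[base] = size;
    return true;
  }

  // Sum of committed bytes that lie fully inside [base, base + size).
  std::size_t committed_in(std::size_t base, std::size_t size) const {
    std::size_t total = 0;
    for (const auto& c : committed_) {
      if (c.first >= base && c.first + c.second <= base + size) {
        total += c.second;
      }
    }
    return total;
  }

 private:
  std::map<std::size_t, std::size_t> reserved_;   // base -> size
  std::map<std::size_t, std::size_t> committed_;  // base -> size
};
```

Using the addresses from the quoted test, commits at 1030 and 1040 inside a reservation of 50 bytes at 1000 are accepted, while a commit at 1500 is refused and therefore never counted.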
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1980126456

From shade at openjdk.org Tue Mar 4 20:09:58 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 4 Mar 2025 20:09:58 GMT
Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v5]
In-Reply-To:
References:
Message-ID: <2ZFtKLn2EcbzjKQ_USb3yiOWEWQJYocFwj_rk-5h0Jg=.f4eec566-3e0c-4a75-8c27-2cb785b0081a@github.com>

On Tue, 4 Mar 2025 04:13:33 GMT, Cesar Soares Lucas wrote:

>> In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code.
>>
>> The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase.
>>
>> The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a completely clean copy of the card table. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that has just become the new _read_ table.
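The alternation described above can be modelled with two arrays and two pointers. This is a single-threaded toy under stated assumptions, not Shenandoah code: `ToyCardTables`, its method names and the clean/dirty values are invented, and the concurrency of the `reset` phase is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy model of alternating read/write card tables; names are hypothetical.
class ToyCardTables {
 public:
  enum : std::uint8_t { kClean = 0, kDirty = 1 };  // values chosen for the sketch

  explicit ToyCardTables(std::size_t cards)
      : a_(cards, kClean), b_(cards, kClean), write_(&a_), read_(&b_) {}

  // Mutator write barrier: always marks the current write table.
  void mutator_mark(std::size_t card) { (*write_)[card] = kDirty; }

  // "reset" phase: clean the current read table ahead of the swap.
  void reset_clean_read_table() {
    std::fill(read_->begin(), read_->end(), kClean);
  }

  // "init-mark": only the two base pointers are exchanged; no copying or
  // zeroing happens here.
  void init_mark_swap() { std::swap(read_, write_); }

  bool read_is_dirty(std::size_t card) const { return (*read_)[card] == kDirty; }
  bool write_is_dirty(std::size_t card) const { return (*write_)[card] == kDirty; }

 private:
  std::vector<std::uint8_t> a_;
  std::vector<std::uint8_t> b_;
  std::vector<std::uint8_t>* write_;
  std::vector<std::uint8_t>* read_;
};
```

After a swap the mutator sees the table cleaned during `reset`, and the collector sees the marks accumulated since the previous swap. Because the write table's base changes per cycle, compiled code in such a design has to load the base rather than embed it as a constant.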
>>
>> Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address.
>>
>> The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged.
>>
>> Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code.
>>
>> Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them.
>
> Cesar Soares Lucas has updated the pull request incrementally with two additional commits since the last revision:
>
> - Revert changes to shared cardTable.hpp
> - Revert changes to shared cardTable.hpp

Much cleaner, thanks! I'll take another look later, but meanwhile, some comments:

src/hotspot/cpu/arm/gc/shared/cardTableBarrierSetAssembler_arm.cpp line 100:

> 98: assert(bs->kind() == BarrierSet::CardTableBarrierSet,
> 99: "Wrong barrier set kind");
> 100:

Unnecessary deletion of blank line?

src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp line 655:

> 653:
> 654: #ifndef _LP64
> 655: __ pop(tmp1);

Sounds like `tmp1` is undefined here. Should be `tmp`?

src/hotspot/os_cpu/linux_arm/javaThread_linux_arm.cpp line 46:

> 44: if (UseShenandoahGC) {
> 45: _card_table_base = nullptr;
> 46: return ;

Suggestion:

return;

src/hotspot/os_cpu/linux_arm/javaThread_linux_arm.cpp line 50:

> 48: _card_table_base = nullptr;
> 49: }
> 50:

Unnecessary removals of blank lines?
src/hotspot/share/ci/ciUtilities.cpp line 49: > 47: CardTableBarrierSet* ctbs = barrier_set_cast(bs); > 48: CardTable* ct = ctbs->card_table(); > 49: SHENANDOAHGC_ONLY(assert(!UseShenandoahGC, "Shenandoah byte_map_base is not constant.");) Here is a bit of a trick about the `Use${X}GC` flags: you don't need to guard them with `${X}GC_ONLY` macros. They are specifically designed that way: they reside in `gc_globals.hpp` without any feature flags. src/hotspot/share/gc/shenandoah/shenandoahCardTable.cpp line 25: > 23: */ > 24: > 25: #include "gc/shenandoah/shenandoahThreadLocalData.hpp" Includes should be sorted alphabetically. src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 268: > 266: > 267: void ShenandoahGeneration::prepare_gc() { > 268: Unnecessary removal. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 258: > 256: if (ShenandoahCardBarrier) { > 257: ShenandoahThreadLocalData::set_card_table(Thread::current(), bs->card_table()->write_byte_map_base()); > 258: } Er. This sets up card table for VMThread, right? I am surprised we do not need this for other fields in `ShenandoahThreadLocalData`. src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 407: > 405: ShenandoahCardCluster(ShenandoahDirectCardMarkRememberedSet* rs) { > 406: _rs = rs; > 407: _object_starts = NEW_C_HEAP_ARRAY(crossing_info, rs->total_cards()+1, mtGC); What is this `+1`? This is #23882, right? 
------------- PR Review: https://git.openjdk.org/jdk/pull/23170#pullrequestreview-2656931853 PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1980148491 PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1979192037 PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1980147454 PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1980121669 PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1980118417 PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1980116049 PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1979940218 PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1979944657 PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1979158102 From cslucas at openjdk.org Tue Mar 4 21:08:08 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 4 Mar 2025 21:08:08 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v5] In-Reply-To: <2ZFtKLn2EcbzjKQ_USb3yiOWEWQJYocFwj_rk-5h0Jg=.f4eec566-3e0c-4a75-8c27-2cb785b0081a@github.com> References: <2ZFtKLn2EcbzjKQ_USb3yiOWEWQJYocFwj_rk-5h0Jg=.f4eec566-3e0c-4a75-8c27-2cb785b0081a@github.com> Message-ID: On Tue, 4 Mar 2025 10:50:30 GMT, Aleksey Shipilev wrote: >> Cesar Soares Lucas has updated the pull request incrementally with two additional commits since the last revision: >> >> - Revert changes to shared cardTable.hpp >> - Revert changes to shared cardTable.hpp > > src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 407: > >> 405: ShenandoahCardCluster(ShenandoahDirectCardMarkRememberedSet* rs) { >> 406: _rs = rs; >> 407: _object_starts = NEW_C_HEAP_ARRAY(crossing_info, rs->total_cards()+1, mtGC); > > What is this `+1`? This is #23882, right? Yes, correct. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1980229122

From vpaprotski at openjdk.org Tue Mar 4 21:16:14 2025
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Tue, 4 Mar 2025 21:16:14 GMT
Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v2]
In-Reply-To:
References:
Message-ID:

> Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain)
>
> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain.
>
> Before (no AVX512)
>
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40  3720.589 ± 17.879  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40  3605.940 ± 15.807  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40  1076.502 ±  4.190  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40  1069.624 ±  2.484  ops/s
> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
> KeyAgreementBench.EC.generateSecret  ECDH                 256  EC                          thrpt   40   830.448 ±  2.285  ops/s
>
> After (with AVX2)
>
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40  6000.496 ± 39.923  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40  5739.878 ± 34.838  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40  1942.437 ± 12.179  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40  1921.770 ±  8.992  ops/s
> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
> KeyAgreementBench.EC.generateSecret  ECDH                 256  EC                          thrpt   40  1399.761 ±  6.238  ops/s
>
>
> Before (with AVX512):
>
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40  9621.950 ± 27.260  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40  8975.654 ± 26.707  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  102...

Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:

  comments from Sandhya

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23719/files
  - new: https://git.openjdk.org/jdk/pull/23719/files/e0803952..1bc0b8c8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=00-01

  Stats: 100 lines in 5 files changed: 61 ins; 7 del; 32 mod
  Patch: https://git.openjdk.org/jdk/pull/23719.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23719/head:pull/23719

PR: https://git.openjdk.org/jdk/pull/23719

From vpaprotski at openjdk.org Tue Mar 4 21:16:15 2025
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Tue, 4 Mar 2025 21:16:15 GMT
Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v2]
In-Reply-To: <7wEGLF0MOmtHAl_cwEOOXNPy_Ckz8j0WmabDR_asitM=.7e772dad-8e67-402f-bdc4-9dad0925f20c@github.com>
References: <7wEGLF0MOmtHAl_cwEOOXNPy_Ckz8j0WmabDR_asitM=.7e772dad-8e67-402f-bdc4-9dad0925f20c@github.com>
Message-ID:

On Thu, 27 Feb 2025 19:05:50 GMT, Sandhya Viswanathan wrote:

>> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>>
>> comments from Sandhya
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 397:
>
>> 395: __ xorq(acc2, acc2);
>> 396: __ addq(acc1, tmp_rax);
>> 397: __ adcq(acc2, tmp_rdx);
>
> Why adcq here instead of addq? The vector code doesn't do that.

It's a difference in multiply instruction, how the 'high' and 'low' parts are handled.
i.e. Given that inputs a and b are 52 bits:
- mulq a * b = 40:64 bits in high/low
- vpmadd52{l,h}uq = 52:52

vpmadd52 (by design) leaves the upper 12 bits for carry propagation, whereas with mulq we have to do the propagation 'immediately'.

> src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 424:
>
>> 422: __ shrq(acc1, 52); // low 52 of acc1 ignored, is zero, because Montgomery
>> 423:
>> 424: // Acc2[0] += carry
>
> This is more like shift in carry into lower bits of Acc2[0] so comment could be updated.

Hmm. Shift implies (to me?) overriding Acc2, which isn't the case entirely (there is a slight overlap). But.. what is happening here is also not 100% obvious. acc2 (i.e. Acc2[0]) is the upper 40+ bits of the multiply result and acc1 is here the 12+ bit carry (leftover); it is not correctly lined up (yet) because of how vpmaddq vs mulq produce high/low parts. (It's actually overlapping 13 and 41 bits, to get 54 bits, but we have 12 bits to spare in a 64-bit reg so no need to be exact.) I tried adjusting the 'heading' comment to the function to explain this flow better; hope it's better?

> src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 441:
>
>> 439: __ subq(acc2, modulus);
>> 440: __ vpsubq(Acc2, Acc1, Modulus, Assembler::AVX_256bit);
>> 441: __ vmovdqu(Address(rsp, -32), Acc2); //Assembler::AVX_256bit
>
> Need to first create space on stack and then store temp.

Done. Also added stack alignment so I can use aligned spill. Also using just one spill slot instead of two. Re-ran fuzzing tests.

> src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 465:
>
>> 463:
>> 464: // Now carry propagate the multiply result and (constant-time) select correct
>> 465: // output digit
>
> Carry propagate multiply result is done before subtracting modulus in the Java code.

That was intentional.. I believe this is faster.. While in vector 'domain', subtract is essentially one vector operation and one scalar operation..
first carry propagation is expensive chain, but second 'hides' within the first propagation. Conversely, with Java ordering: (1) carry propagate Acc1, (2) scalar subtract Acc2, (3) carry propagate Acc2; the critical path is longer (and subtract isnt vectorized) PS: "why didn't you do java the same way" :) I chose clarity for the java code; also java jitted code is in 'scalar domain', so there are plenty of partial sums for out-of-order execution to keep the pipeline fed. Its the vector-to-scalar crossover here that requires the special handling > src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 467: > >> 465: // output digit >> 466: Register digit = acc1; >> 467: __ vmovdqu(Address(rsp, -64), Acc1); //Assembler::AVX_256bit > > Need to first create space on stack and then store. done > src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 475: > >> 473: } >> 474: __ movq(carry, digit); >> 475: __ sarq(carry, 52); > > This was unsigned or logical shift in Java code. For Acc1, the limbs are all positive, around 54-bits. sarq is important for the Acc2 propagation (I added a comment there). I used sarq here mostly for symmetry since it really doesnt matter mathematically > src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 556: > >> 554: // - constant time (i.e. no branches) >> 555: // - no-side channel (i.e. 
all memory must always be accessed, and in same order) >> 556: void assign_avx(Register aBase, Register bBase, int offset, XMMRegister select, XMMRegister tmp, XMMRegister aTmp, int vector_len, MacroAssembler* _masm) { > > Good to add the comment from assign_scalar here as well: > // Original java: > // long dummyLimbs = maskValue & (a[i] ^ b[i]); > // a[i] = dummyLimbs ^ a[i]; done > src/java.base/share/classes/sun/security/util/math/intpoly/MontgomeryIntegerPolynomialP256.java line 423: > >> 421: r[2] = ((c7 & mask) | (c2 & ~mask)); >> 422: r[3] = ((c8 & mask) | (c3 & ~mask)); >> 423: r[4] = ((c9 & mask) | (c4 & ~mask)); > > It would be good to add a comment here indicating that if the result (c9 - c5) had overflown by one modulus, result - modulus (c4-c0) would be positive else it would be negative. i.e. Upper bits of c4 would be all zeroes on overflow otherwise upper bits of c4 would be all ones. Thus on overflow, return value "r" should be set to result - modulus (c4 - c0) else it should be set to result (c9-c5). done (I think.. please double-check if my version of the comment is helpful) > test/jdk/com/sun/security/util/math/intpoly/MontgomeryPolynomialFuzzTest.java line 2: > >> 1: /* >> 2: * Copyright (c) 2025, Intel Corporation. All rights reserved. > > This should be Copyright (c) 2024, 2025, Intel Corporation. All rights reserved. 
done, thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1978375998 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1980186231 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1980187534 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1980204172 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1980194853 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1980209434 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1980226208 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1978307783 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1978307045 From vpaprotski at openjdk.org Tue Mar 4 22:03:14 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 4 Mar 2025 22:03:14 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v3] In-Reply-To: References: Message-ID: <51kTbZfxDMQ9XAHRt23O8mYkUjCeg3Wbrp9WtYkiYYU=.6eb73f3c-c78d-43e0-a51b-17f64c8ad669@github.com> > Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain) > > Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain. > > Before (no AVX512) > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ? 17.879 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ? 15.807 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ? 4.190 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ? 
2.484 ops/s
> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ± 2.285 ops/s
>
> After (with AVX2)
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ± 39.923 ops/s
> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ± 34.838 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ± 12.179 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ± 8.992 ops/s
> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ± 6.238 ops/s
>
>
> Before (with AVX512):
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ± 27.260 ops/s
> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ± 26.707 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 102...
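The 40:64 versus 52:52 split of a limb product that came up in the review of this PR can be illustrated in plain Java. The sketch below is not code from the PR: the class and method names (`LimbProductSplit`, `split64`, `split52`, `recombine`) are hypothetical, and it only models the arithmetic, assuming both inputs are non-negative and fit in 52 bits.

```java
import java.math.BigInteger;

/** Illustrative model only; names are hypothetical and not taken from the PR. */
final class LimbProductSplit {
    static final long MASK52 = (1L << 52) - 1;

    /**
     * mulq-style split of a 104-bit product: returns {high, low}, where low holds
     * the low 64 bits and high the remaining (at most 40) bits.
     * Assumes 0 <= a, b < 2^52, so the signed multiplyHigh equals the unsigned one.
     */
    static long[] split64(long a, long b) {
        long low = a * b;                     // low 64 bits (wraps modulo 2^64)
        long high = Math.multiplyHigh(a, b);  // high 64 bits of the 128-bit product
        return new long[] {high, low};
    }

    /**
     * vpmadd52-style split of the same product: returns {high52, low52},
     * each part fitting in 52 bits, leaving 12 spare bits per 64-bit lane.
     */
    static long[] split52(long a, long b) {
        long[] hl = split64(a, b);
        long low52 = hl[1] & MASK52;
        // Realign: the top 12 bits of the 64-bit low word belong to the high half.
        long high52 = (hl[0] << 12) | (hl[1] >>> 52);
        return new long[] {high52, low52};
    }

    /** Rebuilds high * 2^lowBits + low, treating both words as unsigned. */
    static BigInteger recombine(long high, long low, int lowBits) {
        BigInteger h = new BigInteger(Long.toUnsignedString(high));
        BigInteger l = new BigInteger(Long.toUnsignedString(low));
        return h.shiftLeft(lowBits).add(l);
    }
}
```

Both splits describe the same 104-bit value; they differ only in where the boundary between the halves sits, which is why a scalar `mulq` result needs realigning before it can be combined with `vpmadd52` results.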
Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: whitespace, correct vpxor, missed comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23719/files - new: https://git.openjdk.org/jdk/pull/23719/files/1bc0b8c8..bb450137 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=01-02 Stats: 12 lines in 1 file changed: 3 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23719/head:pull/23719 PR: https://git.openjdk.org/jdk/pull/23719 From duke at openjdk.org Tue Mar 4 22:04:26 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Tue, 4 Mar 2025 22:04:26 GMT Subject: RFR: 8349721: Add aarch64 intrinsics for ML-KEM [v4] In-Reply-To: References: Message-ID: > By using the aarch64 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Fixed mismerge. - Merged master. 
- A little cleanup - Merged master - removing trailing spaces - kyber aarch64 intrinsics ------------- Changes: https://git.openjdk.org/jdk/pull/23663/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23663&range=03 Stats: 2508 lines in 18 files changed: 2464 ins; 16 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/23663.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23663/head:pull/23663 PR: https://git.openjdk.org/jdk/pull/23663 From dlong at openjdk.org Tue Mar 4 23:14:20 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 4 Mar 2025 23:14:20 GMT Subject: Integrated: 8336042: Caller/callee param size mismatch in deoptimization causes crash In-Reply-To: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: <-DOch4HtWdaPuapC0aBkemPls96miNahQpfaqEwSyog=.4e7dddaf-6517-4b84-88f2-5833fb19054c@github.com> On Tue, 11 Feb 2025 07:59:01 GMT, Dean Long wrote: > When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. > > In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. > > Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. This pull request has now been integrated. 
Changeset: 20ea218c Author: Dean Long URL: https://git.openjdk.org/jdk/commit/20ea218ce52f79704445acfe2d4a3dc9d04e86d2 Stats: 161 lines in 11 files changed: 147 ins; 3 del; 11 mod 8336042: Caller/callee param size mismatch in deoptimization causes crash Co-authored-by: Richard Reingruber Reviewed-by: pchilanomate, rrich, vlivanov, never ------------- PR: https://git.openjdk.org/jdk/pull/23557 From fyang at openjdk.org Wed Mar 5 00:13:14 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 5 Mar 2025 00:13:14 GMT Subject: RFR: 8351101: RISC-V: C2: Small improvement to MacroAssembler::revb [v2] In-Reply-To: References: Message-ID: > Hi, please review this small improvement. > After logic shift right 56 bits, there is no need to zero extend the remaining 8-bit value. > The reason is that the upper bits will be all zeros as this is a logic shift right. > Testing: `hotspot:tier1` is clean on linux-riscv64 platform with this change. Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8351101 - 8351101: RISC-V: C2: Small improvement to MacroAssembler::revb ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23879/files - new: https://git.openjdk.org/jdk/pull/23879/files/eb7a9402..8f211b9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23879&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23879&range=00-01 Stats: 5997 lines in 110 files changed: 4079 ins; 983 del; 935 mod Patch: https://git.openjdk.org/jdk/pull/23879.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23879/head:pull/23879 PR: https://git.openjdk.org/jdk/pull/23879 From cslucas at openjdk.org Wed Mar 5 00:57:54 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 5 Mar 2025 00:57:54 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v5] In-Reply-To: <2ZFtKLn2EcbzjKQ_USb3yiOWEWQJYocFwj_rk-5h0Jg=.f4eec566-3e0c-4a75-8c27-2cb785b0081a@github.com> References: <2ZFtKLn2EcbzjKQ_USb3yiOWEWQJYocFwj_rk-5h0Jg=.f4eec566-3e0c-4a75-8c27-2cb785b0081a@github.com> Message-ID: On Tue, 4 Mar 2025 17:53:57 GMT, Aleksey Shipilev wrote: >> Cesar Soares Lucas has updated the pull request incrementally with two additional commits since the last revision: >> >> - Revert changes to shared cardTable.hpp >> - Revert changes to shared cardTable.hpp > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 258: > >> 256: if (ShenandoahCardBarrier) { >> 257: ShenandoahThreadLocalData::set_card_table(Thread::current(), bs->card_table()->write_byte_map_base()); >> 258: } > > Er. This sets up card table for VMThread, right? I am surprised we do not need this for other fields in `ShenandoahThreadLocalData`. Yes, that's for the VMThread. That seems like a good question. 
I

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1980492593

From cslucas at openjdk.org Wed Mar 5 01:10:50 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Wed, 5 Mar 2025 01:10:50 GMT
Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v6]
In-Reply-To:
References:
Message-ID:

> In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code.
>
> The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase.
>
> The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a version of the card table totally clear. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that has just become the new _read_ table.
>
> Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address.
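The alternation of read and write tables described above can be sketched as a toy model. This is not HotSpot code: the class `SwappingCardTables`, its methods, and the `CLEAN`/`DIRTY` encodings are all hypothetical, and the real change works on card table base pointers held per thread rather than on Java arrays.

```java
import java.util.Arrays;

/** Toy model of alternating card tables; all names and encodings are hypothetical. */
final class SwappingCardTables {
    static final byte CLEAN = 0;
    static final byte DIRTY = 1;

    private byte[] readTable;   // scanned by the GC during the cycle
    private byte[] writeTable;  // dirtied by mutator write barriers

    SwappingCardTables(int cards) {
        readTable = new byte[cards];
        writeTable = new byte[cards];
    }

    /** Mutator write barrier: always dirties whichever table is currently the write table. */
    void markDirty(int card) {
        writeTable[card] = DIRTY;
    }

    /** Models the concurrent reset phase: clean the current read table outside any pause. */
    void concurrentReset() {
        Arrays.fill(readTable, CLEAN);
    }

    /** Models the init-mark pause: only the two references are swapped, nothing is copied. */
    void swapAtInitMark() {
        byte[] tmp = readTable;
        readTable = writeTable;
        writeTable = tmp;
    }

    boolean isDirtyForGc(int card) {
        return readTable[card] == DIRTY;
    }

    boolean isDirtyForMutator(int card) {
        return writeTable[card] == DIRTY;
    }
}
```

In this model the pause does a constant amount of work (one swap), while the cost that scales with table size sits in `concurrentReset`, which mirrors the stated goal of moving card cleaning out of `init-mark`.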
>
> The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged.
>
> Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code.
>
> Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them.

Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:

  Address PR feedback: formatting.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23170/files
  - new: https://git.openjdk.org/jdk/pull/23170/files/717b8b44..cbf5aab0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23170&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23170&range=04-05

  Stats: 8 lines in 6 files changed: 4 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/23170.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23170/head:pull/23170

PR: https://git.openjdk.org/jdk/pull/23170

From cslucas at openjdk.org Wed Mar 5 01:14:44 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Wed, 5 Mar 2025 01:14:44 GMT
Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v7]
In-Reply-To:
References:
Message-ID:

> In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution.
Because they are considered constants, they are embedded as such in JIT-compiled code.
>
> The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase.
>
> The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a version of the card table totally clear. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that has just become the new _read_ table.
>
> Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address.
>
> The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged.
>
> Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code.
>
> Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions.
I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them.

Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits:

 - Fix merge conflict
 - Address PR feedback: formatting.
 - Revert changes to shared cardTable.hpp
 - Revert changes to shared cardTable.hpp
 - Fix merge conflict
 - Address PR feedback: no changes to shared files.
 - Merge master
 - Addressing PR comments: some refactorings, ppc fix, off-by-one fix.
 - Relocation of Card Tables

-------------

Changes: https://git.openjdk.org/jdk/pull/23170/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23170&range=06
  Stats: 295 lines in 28 files changed: 150 ins; 92 del; 53 mod
  Patch: https://git.openjdk.org/jdk/pull/23170.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23170/head:pull/23170

PR: https://git.openjdk.org/jdk/pull/23170

From fyang at openjdk.org Wed Mar 5 02:07:58 2025
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 5 Mar 2025 02:07:58 GMT
Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar
In-Reply-To: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com>
References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com>
Message-ID:

On Fri, 28 Feb 2025 14:34:47 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this patch?
> It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv.
> > ## Performance > > data > > Benchmark | (vectorDim) | Mode | Cnt | Score - master | Error - master | Score - patch | Error - patch | Units | Improvement (master/patch) > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.613 | 0.039 | 219.549 | 0.135 | ns/op | 1 > Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 354.956 | 0.109 | 354.987 | 0.178 | ns/op | 1 > Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.314 | 1.596 | 581.873 | 0.084 | ns/op | 1.001 > Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1034.657 | 0.259 | 1035.217 | 0.184 | ns/op | 0.999 > Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 11068.314 | 1.819 | 2594 | 0.207 | ns/op | 4.267 > Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 23197.406 | 9.984 | 5167.862 | 0.111 | ns/op | 4.489 > Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 46773.589 | 7.181 | 10015.53 | 0.412 | ns/op | 4.67 > Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 93579.771 | 17.124 | 19986.546 | 0.331 | ns/op | 4.682 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 5853.504 | 2.676 | 2628.591 | 0.793 | ns/op | 2.227 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 11657.465 | 0.735 | 5201.419 | 1.149 | ns/op | 2.241 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 18156.051 | 6.518 | 10377.444 | 0.206 | ns/op | 1.75 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 36463.97 | 31.391 | 20008.274 | 3.274 | ns/op | 1.822 > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 52423.99 | 153.302 | 1345.107 | 0.055 | ns/op | 38.974 > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 95697.45 | 423.436 | 2671.999 | 0.456 | ns/op | 35.815 > 
Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 188838.13 | 1248.92 | 5044.997 | 1.468 | ns... Hi, Thanks for working on this part. Some initial comments after a cursory look. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2141: > 2139: if (ft == FLOAT_TYPE::half_precision) { > 2140: assert_cond(UseZfh); > 2141: } Suggestion: `assert_cond((ft != FLOAT_TYPE::half_precision) || UseZfh);` src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 6392: > 6390: fmv_h_x(dst, src); > 6391: fcvt_s_h(dst, dst); > 6392: j(DONE); It looks to me confusing to have pairs like `float16_to_float` and `float16_to_float_c2`. As there is only one use for `float16_to_float` in file `src/hotspot/cpu/riscv/stubGenerator_riscv.cpp`, I would suggest we inline the code in the callsite. Then we could remove this assembler routine and rename `float16_to_float_c2` to `float16_to_float`. Also when inlining the code in the callsite, we could replace this `j(DONE)` with a direct return, thus saving one jump instruction. ------------- PR Review: https://git.openjdk.org/jdk/pull/23844#pullrequestreview-2659617194 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1980550524 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1980537352 From fyang at openjdk.org Wed Mar 5 02:19:57 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 5 Mar 2025 02:19:57 GMT Subject: RFR: 8351101: RISC-V: C2: Small improvement to MacroAssembler::revb [v2] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 06:05:27 GMT, Feilong Jiang wrote: >> Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8351101 >> - 8351101: RISC-V: C2: Small improvement to MacroAssembler::revb > > Looks fine. 
@feilongjiang @Hamlin-Li : Thanks! GHA is clean now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23879#issuecomment-2699574791 From fyang at openjdk.org Wed Mar 5 02:19:58 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 5 Mar 2025 02:19:58 GMT Subject: Integrated: 8351101: RISC-V: C2: Small improvement to MacroAssembler::revb In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 01:28:32 GMT, Fei Yang wrote: > Hi, please review this small improvement. > After logic shift right 56 bits, there is no need to zero extend the remaining 8-bit value. > The reason is that the upper bits will be all zeros as this is a logic shift right. > Testing: `hotspot:tier1` is clean on linux-riscv64 platform with this change. This pull request has now been integrated. Changeset: b1a21b56 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/b1a21b563e3ae13fa5c409a4f0c04686c3f5b34a Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8351101: RISC-V: C2: Small improvement to MacroAssembler::revb Reviewed-by: fjiang, mli ------------- PR: https://git.openjdk.org/jdk/pull/23879 From dholmes at openjdk.org Wed Mar 5 05:17:55 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 5 Mar 2025 05:17:55 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> <09Lu69Do9amzXyGok3KDuP2whACShrPwRM7BOel5wgg=.ceed3ba0-9f91-4e95-9cf5-0e85362e29df@github.com> Message-ID: On Tue, 4 Mar 2025 18:09:56 GMT, Fredrik Bredberg wrote: >> But if there is a previous node (just no previous pointer set) we have to rebuild the list, otherwise G would still be pointing to F. It would be this case: https://github.com/fbredber/jdk/blob/283c2431ec64b0865d4e678913c636732d01658f/src/hotspot/share/runtime/objectMonitor.cpp#L1313 > > You're quite right. I'll rewrite that section of the comment. Thank you for spotting this. 
Yep my bad - you can't delete yourself without a prev node pointer when you are being pointed to. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1980709561 From dholmes at openjdk.org Wed Mar 5 06:22:53 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 5 Mar 2025 06:22:53 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events In-Reply-To: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: On Tue, 4 Mar 2025 14:47:09 GMT, Aleksey Shipilev wrote: > We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. > > This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). > > Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. We still only enable the statistics event by default to be on a safer side. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `jdk_jfr` Seems like a good idea. I wonder why the deflation event was not added earlier? 
------------- PR Review: https://git.openjdk.org/jdk/pull/23900#pullrequestreview-2660079423 From dholmes at openjdk.org Wed Mar 5 06:53:55 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 5 Mar 2025 06:53:55 GMT Subject: RFR: 8351187: Add JFR monitor notification event In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 16:05:36 GMT, Aleksey Shipilev wrote: > We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up. > > Providing this event would also replace one of of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). > > This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `jdk_jfr` Again a reasonable idea but note that notifications can occur much more frequently than waits. A wait only happens when a thread actually has to wait, whereas notifications tend to happen whenever a data structure is updated in a key way ------------- PR Review: https://git.openjdk.org/jdk/pull/23901#pullrequestreview-2660126462 From dholmes at openjdk.org Wed Mar 5 06:56:53 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 5 Mar 2025 06:56:53 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events In-Reply-To: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: On Tue, 4 Mar 2025 14:47:09 GMT, Aleksey Shipilev wrote: > We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. 
This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well.
>
> This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829).
>
> Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. We still only enable the statistics event by default to be on a safer side.
>
> Additional testing:
> - [x] Linux x86_64 server fastdebug, `jdk_jfr`

src/hotspot/share/runtime/objectMonitor.cpp line 663:

> 661: const oop obj) {
> 662: assert(event != nullptr, "invariant");
> 663: event->set_monitorClass(obj->klass());

Now that I have seen the "hidden wait" logic in the other PR, should inflation/deflation events not also check `is_excluded`?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23900#discussion_r1980805176

From alanb at openjdk.org Wed Mar 5 07:29:06 2025
From: alanb at openjdk.org (Alan Bateman)
Date: Wed, 5 Mar 2025 07:29:06 GMT
Subject: RFR: 8351187: Add JFR monitor notification event
In-Reply-To:
References:
Message-ID: <2b6qM-t3jmRAx9XW1SdpKVeJBuKRqzFD_FFXQ4Hka5A=.af985e1e-ad56-44cc-b077-05a45ed2bd75@github.com>

On Tue, 4 Mar 2025 16:05:36 GMT, Aleksey Shipilev wrote:

> We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up.
>
> Providing this event would also replace one of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829).
> > This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `jdk_jfr` Do I read this correctly that this is a duration event and will only be recorded if the notify takes >= 10ms? When I read David's comment then I assumed it was an instant event but it seems not. Given the "address" field then I assume this is intended to be part of a troubleshooting recipe, maybe with threshold set to 0ms, is that right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23901#issuecomment-2700083342 From mli at openjdk.org Wed Mar 5 09:18:54 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 5 Mar 2025 09:18:54 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: <3KllzIfPgFYyz5GeIPmN7uKNyd7RdwlP0LyOe84b5Co=.4e58f5f5-b651-44c2-8d4c-981730a760ff@github.com> On Wed, 5 Mar 2025 02:03:18 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this patch? >> It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv. 
>> >> ## Performance >> >> data >> >> Benchmark | (vectorDim) | Mode | Cnt | Score -master | Error | Score - patch | Error | Units | Improvement (master/patch) >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1 >> Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011 >> Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001 >> Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001 >> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909 >> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91 >> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129 >> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928 >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51 >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898 >> 
Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 91565.81... > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2141: > >> 2139: if (ft == FLOAT_TYPE::half_precision) { >> 2140: assert_cond(UseZfh); >> 2141: } > > Suggestion: `assert_cond((ft != FLOAT_TYPE::half_precision) || UseZfh);` OK, will fix it. > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 6392: > >> 6390: fmv_h_x(dst, src); >> 6391: fcvt_s_h(dst, dst); >> 6392: j(DONE); > > It looks to me confusing to have pairs like `float16_to_float` and `float16_to_float_c2`. As there is only one use for `float16_to_float` in file `src/hotspot/cpu/riscv/stubGenerator_riscv.cpp`, I would suggest we inline the code in the callsite. Then we could remove this assembler routine and rename `float16_to_float_c2` to `float16_to_float`. Also when inlining the code in the callsite, we could replace this `j(DONE)` with a direct return, thus saving one jump instruction. Good suggestion! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1980998175 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1980998037 From tschatzl at openjdk.org Wed Mar 5 09:45:00 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 5 Mar 2025 09:45:00 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v13] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. 
The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. 
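For readers following the thread, the filtering and card-tracking steps of the current barrier quoted above can be tried out as a small Java model. This is only a sketch under stated assumptions: the class `G1PostBarrierModel`, the card and region sizes, the card encodings and the tiny queue capacity are all hypothetical and chosen for illustration; addresses are plain `long` values with 0 standing in for `null`, and `VarHandle.fullFence()` merely stands in for the `StoreLoad`. It is not HotSpot code.

```java
import java.lang.invoke.VarHandle;
import java.util.ArrayList;
import java.util.List;

// Toy model of the *current* G1 post-write barrier pseudo code.
// Every constant and name below is an illustrative assumption.
class G1PostBarrierModel {
    static final int CARD_SHIFT = 9;     // assumed 512-byte cards
    static final int REGION_SHIFT = 20;  // assumed 1 MiB regions
    static final byte CLEAN = 0, DIRTY = 1, YOUNG = 2; // arbitrary encodings
    static final int DCQ_CAPACITY = 2;   // tiny, so the flush path is easy to reach

    final byte[] cardTable;
    final List<Integer> threadLocalDcq = new ArrayList<>();  // thread-local dirty card queue
    final List<List<Integer>> dcqs = new ArrayList<>();      // global set of filled queues

    G1PostBarrierModel(int heapBytes) {
        cardTable = new byte[heapBytes >>> CARD_SHIFT];       // all cards start CLEAN
    }

    static int card(long addr) { return (int) (addr >>> CARD_SHIFT); }
    static long region(long addr) { return addr >>> REGION_SHIFT; }

    // Models the barrier after `x.a = y`. Returns true only if the card
    // covering the field was newly dirtied and enqueued.
    boolean postWriteBarrier(long fieldAddr, long newValueAddr) {
        // Filtering
        if (region(fieldAddr) == region(newValueAddr)) return false; // same region check
        if (newValueAddr == 0) return false;                         // null value check
        int c = card(fieldAddr);
        if (cardTable[c] == YOUNG) return false;                     // write to young gen check
        VarHandle.fullFence();                                       // stands in for StoreLoad
        if (cardTable[c] == DIRTY) return false;                     // already dirty

        cardTable[c] = DIRTY;

        // Card tracking
        threadLocalDcq.add(c);
        if (threadLocalDcq.size() < DCQ_CAPACITY) return true;       // queue not full: done
        dcqs.add(new ArrayList<>(threadLocalDcq));                   // "call runtime": hand over queue
        threadLocalDcq.clear();
        return true;
    }

    public static void main(String[] args) {
        G1PostBarrierModel heap = new G1PostBarrierModel(4 << 20);
        System.out.println("cross-region store enqueued: " + heap.postWriteBarrier(0x100, 0x100000));
        System.out.println("repeated store enqueued:     " + heap.postWriteBarrier(0x100, 0x100000));
    }
}
```

Even in this simplified form the barrier has four conditional exits, a fence and a queue-management tail, which gives a feel for why the inlined version is described as being in the range of 40-50 instructions.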
Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * fix whitespace * additional whitespace between log tags * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/4a978118..a457e6e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=11-12 Stats: 116 lines in 6 files changed: 50 ins; 50 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From mli at openjdk.org Wed Mar 5 09:57:45 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 5 Mar 2025 09:57:45 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v2] In-Reply-To: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: > Hi, > Can you help to review this patch? > It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv. 
> > ## Performance > > data > > Benchmark | (vectorDim) | Mode | Cnt | Score -master | Error | Score - patch | Error | Units | Improvement (master/patch) > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1 > Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011 > Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001 > Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001 > Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909 > Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91 > Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129 > Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928 > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51 > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898 > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | 
avgt | 10 | 91565.818 | 250.761 | 5191.12 | 64.598 | ns/op | 17.639 > Fl... Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - clean - merge master - merge master - clean 2 - clean - initial commit ------------- Changes: https://git.openjdk.org/jdk/pull/23844/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23844&range=01 Stats: 426 lines in 10 files changed: 384 ins; 0 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/23844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23844/head:pull/23844 PR: https://git.openjdk.org/jdk/pull/23844 From iwalulya at openjdk.org Wed Mar 5 11:12:58 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 5 Mar 2025 11:12:58 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v12] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 17:20:28 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. 
>> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > ayang review > * renamings > * refactorings src/hotspot/share/gc/g1/g1HeapRegion.hpp line 475: > 473: void hr_clear(bool clear_space); > 474: // Clear the card table corresponding to this region. > 475: void clear_cardtable(); in some places `cardtable()` has been refactored to `card_table` e.g. 
in G1HeapRegionManager. src/hotspot/share/gc/g1/g1ParScanThreadState.hpp line 67: > 65: > 66: size_t _num_marked_as_dirty_cards; > 67: size_t _num_marked_as_into_cset_cards; Suggestion: size_t _num_cards_marked_dirty; size_t _num_cards_marked_to_cset; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1980117641 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1980145229 From iwalulya at openjdk.org Wed Mar 5 11:12:56 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 5 Mar 2025 11:12:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v13] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 09:45:00 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
>> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * fix whitespace > * additional whitespace between log tags > * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename src/hotspot/share/gc/g1/c1/g1BarrierSetC1.cpp line 32: > 30: #include "gc/g1/g1HeapRegion.hpp" > 31: #include "gc/g1/g1ThreadLocalData.hpp" > 32: #include "utilities/macros.hpp" Suggestion: #include "utilities/formatBuffer.hpp" #include "utilities/macros.hpp" to use `err_msg` src/hotspot/share/gc/g1/g1RemSet.cpp line 90: > 88: // contiguous ranges of dirty cards to be scanned. 
These blocks are converted to actual > 89: // memory ranges and then passed on to actual scanning. > 90: class G1RemSetScanState : public CHeapObj { Need to update the comment above to remove reference to "log buffers" (L:67). src/hotspot/share/gc/g1/g1RemSet.hpp line 44: > 42: class CardTableBarrierSet; > 43: class G1AbstractSubTask; > 44: class G1RemSetScanState; Already declared on line 48 below src/hotspot/share/gc/g1/g1ThreadLocalData.hpp line 29: > 27: #include "gc/g1/g1BarrierSet.hpp" > 28: #include "gc/g1/g1CardTable.hpp" > 29: #include "gc/g1/g1CollectedHeap.hpp" probably does not need to be included ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1981138746 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1981162792 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1981118865 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1981142943 From duke at openjdk.org Wed Mar 5 11:33:06 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 5 Mar 2025 11:33:06 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v3] In-Reply-To: References: Message-ID: > By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merged master. 
- Added comments, removed debugging printfs - JDK-8351034 Add AVX-512 intrinsics for ML-DSA ------------- Changes: https://git.openjdk.org/jdk/pull/23860/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=02 Stats: 1642 lines in 8 files changed: 1636 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23860/head:pull/23860 PR: https://git.openjdk.org/jdk/pull/23860 From jbhateja at openjdk.org Wed Mar 5 11:38:52 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 5 Mar 2025 11:38:52 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v2] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 19:00:59 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Added comments, removed debugging printfs src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 420: > 418: __ movptr(constant2use, round_consts); > 419: > 420: __ BIND(rounds24_loop); For Icache alignment, please use __ align64() before the loop entry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1978822704 From jbhateja at openjdk.org Wed Mar 5 11:42:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 5 Mar 2025 11:42:01 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v3] In-Reply-To: References: Message-ID: <0E2AqFpNPjDjP6jqCXn8toePBcW2SIHw1kFXlZX4W_U=.8d692bfa-0598-4969-b480-4a285366e0bb@github.com> On Wed, 5 Mar 2025 11:33:06 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. 
> > Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merged master. > - Added comments, removed debugging printfs > - JDK-8351034 Add AVX-512 intrinsics for ML-DSA src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 292: > 290: __ movl(iterations, 2); > 291: > 292: __ BIND(L_loop); Please align loop entry address using __align64(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1981242267 From shade at openjdk.org Wed Mar 5 12:03:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Mar 2025 12:03:52 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v2] In-Reply-To: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: <22O5bysM7g9bWIQOpHWaXQSf3feld_GlkvMTPrCFlUA=.4dc2cd48-b4fc-4611-805e-a16f2ece812d@github.com> > We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. > > This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). > > Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. 
We still only enable the statistics event by default to be on a safer side. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `jdk_jfr` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'master' into JDK-8351142-jfr-deflate-event - Test updates - Rework statistics event to be actually statistics - Filter JFR HiddenWait consistently - Event metadata touchups - Separate statistics event as well - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23900/files - new: https://git.openjdk.org/jdk/pull/23900/files/0cbb9f53..8102aed8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23900&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23900&range=00-01 Stats: 5427 lines in 140 files changed: 3043 ins; 1350 del; 1034 mod Patch: https://git.openjdk.org/jdk/pull/23900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23900/head:pull/23900 PR: https://git.openjdk.org/jdk/pull/23900 From shade at openjdk.org Wed Mar 5 12:03:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Mar 2025 12:03:55 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v2] In-Reply-To: References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: <3WqyBm5tJUCUMCxRmggSQ8gercTV3E4JxaqiplYxNiI=.61eb2a46-ac3a-409b-9820-2210360e7560@github.com> On Tue, 4 Mar 2025 15:26:45 GMT, Erik Gahlin wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8351142-jfr-deflate-event >> - Test updates >> - Rework statistics event to be actually statistics >> - Filter JFR HiddenWait consistently >> - Event metadata touchups >> - Separate statistics event as well >> - Fix > > src/hotspot/share/jfr/metadata/metadata.xml line 124: > >> 122: >> 123: >> 124: > > For consistency with other statistical events, the category for JavaMonitorStatistics should be "Java Application, Statistics" > > The event should probably be periodic, so users can set an interval to reduce the number of events, with a default period of "everyChunk", so it is emitted at least at the beginning and end of a recording. "Java Application, Statistics" is done. Yeah, I think periodic task for this statistics event would be better. It would lose the direct information about the number of deflated monitors, but that could be somewhat inferred from the current monitor counts. See new version. > src/hotspot/share/jfr/metadata/metadata.xml line 125: > >> 123: >> 124: >> 125: > > The label should be 'Monitor in Use' (lowercase 'i'). > > Here is the style guideline if you're wondering. > https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/Label.html Noted, thanks! Fixed in new version. > test/jdk/jdk/jfr/event/runtime/TestJavaMonitorDeflateEvent.java line 82: > >> 80: waitThread.join(); >> 81: // Let deflater thread run. >> 82: Thread.sleep(3000); > > I see that you took code from the MonitorInflate test. It's a really old test. A RecordingStream would be a more suitable as you can avoid using Thread.sleep() and the TestThread. I don't think a file needs to be dumped if the events are printed to standard out. 
Something like this: > > > String lockClassName = lock.getClass().getName(); > List<RecordedEvent> events = new CopyOnWriteArrayList<>(); > try (RecordingStream rs = new RecordingStream()) { > rs.enable(EVENT_NAME).withoutThreshold(); > rs.onEvent(EVENT_NAME, e -> { > RecordedClass clazz = e.getType(FIELD_KLASS_NAME); > if (clazz.getName().equals(lockClassName)) { > rs.close(); > } > }); > rs.startAsync(); > ... > synchronized (lock) { > ... > } > ... > rs.awaitTermination(); > System.out.println(events); > RecordedEvent event = events.get(0); > Events.assertField(event, FIELD_ADDRESS).notEqual(0L); > } Thanks. I have rewritten the test using this suggestion as the base! > test/jdk/jdk/jfr/event/runtime/TestJavaMonitorStatisticsEvent.java line 60: > >> 58: Recording recording = new Recording(); >> 59: recording.enable(EVENT_NAME).withThreshold(Duration.ofMillis(0)); >> 60: final Lock lock = new Lock(); > > If the event is periodic, you can set: > > `recording.enable(EVENT_NAME).with("period", "everyChunk");` > > and use the following instead of isAnyFound: > > List<RecordedEvent> events = Events.fromRecording(recording); > Events.hasEvents(events); > > There's no need to dump to failed.jfr. Events.fromRecording will create a file that can be inspected in case the test fails. try-with-resources would be nice to have. Rewritten with `RecordingStream` as well, see new version.
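The `RecordingStream` pattern suggested above can be tried outside the JDK test library with a self-contained sketch like the following. Some assumptions to be aware of: it listens for the long-standing `jdk.JavaMonitorWait` event rather than the `JavaMonitorDeflate` event proposed in this PR (which a given JDK may not have), the class name `MonitorWaitStreamSketch`, the nested `Lock` class and the 30 second safety timeout are made up for illustration, and it collects the matching event explicitly before closing the stream. It is not the test that was committed.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import jdk.jfr.consumer.RecordedClass;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingStream;

// Sketch: stream JFR events and stop as soon as the one for our lock arrives.
class MonitorWaitStreamSketch {
    static final String EVENT_NAME = "jdk.JavaMonitorWait"; // existing event, used as a stand-in
    static final String FIELD_MONITOR_CLASS = "monitorClass";

    static final class Lock {}  // dedicated class so the event is easy to recognise

    static List<RecordedEvent> collectWaitEvents() {
        Lock lock = new Lock();
        String lockClassName = lock.getClass().getName();
        List<RecordedEvent> events = new CopyOnWriteArrayList<>();
        try (RecordingStream rs = new RecordingStream()) {
            rs.enable(EVENT_NAME).withoutThreshold();  // record even very short waits
            rs.onEvent(EVENT_NAME, e -> {
                RecordedClass clazz = e.getClass(FIELD_MONITOR_CLASS);
                if (clazz != null && clazz.getName().equals(lockClassName)) {
                    events.add(e);
                    rs.close();                        // ends awaitTermination below
                }
            });
            rs.startAsync();
            synchronized (lock) {
                lock.wait(50);                         // timed wait, produces the event
            }
            rs.awaitTermination(Duration.ofSeconds(30)); // bounded, in case nothing arrives
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for JFR events", ie);
        }
        return events;
    }

    public static void main(String[] args) {
        for (RecordedEvent event : collectWaitEvents()) {
            System.out.println(event);
        }
    }
}
```

Closing the stream from inside the handler is what removes the need for `Thread.sleep()`: the main thread blocks in `awaitTermination` only until the event of interest has been delivered.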
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23900#discussion_r1981270071 PR Review Comment: https://git.openjdk.org/jdk/pull/23900#discussion_r1981270158 PR Review Comment: https://git.openjdk.org/jdk/pull/23900#discussion_r1981271007 PR Review Comment: https://git.openjdk.org/jdk/pull/23900#discussion_r1981271507 From shade at openjdk.org Wed Mar 5 12:03:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Mar 2025 12:03:56 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v2] In-Reply-To: References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: On Wed, 5 Mar 2025 06:53:48 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8351142-jfr-deflate-event >> - Test updates >> - Rework statistics event to be actually statistics >> - Filter JFR HiddenWait consistently >> - Event metadata touchups >> - Separate statistics event as well >> - Fix > > src/hotspot/share/runtime/objectMonitor.cpp line 663: > >> 661: const oop obj) { >> 662: assert(event != nullptr, "invariant"); >> 663: event->set_monitorClass(obj->klass()); > > Now that I have seen the "hidden wait" logic in the other PR, should inflation/deflation events not also check `is_excluded`? Good question. I think if we filter waits on JFR "hidden waits", the inflate and deflate should filter them as well. This requires a bit of reshuffling, but I think the thing I have in new version works. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23900#discussion_r1981270242 From fyang at openjdk.org Wed Mar 5 12:17:56 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 5 Mar 2025 12:17:56 GMT Subject: RFR: 8351145: RISC-V: only enable some crypto intrinsic when AvoidUnalignedAccess == false In-Reply-To: <8-3lLYr9jtNOQJhRLRyAA2xxfxG2aVm27HIGcbNsCfY=.8e43a66a-5602-4345-b5a0-cfdaab7e0d8f@github.com> References: <8-3lLYr9jtNOQJhRLRyAA2xxfxG2aVm27HIGcbNsCfY=.8e43a66a-5602-4345-b5a0-cfdaab7e0d8f@github.com> Message-ID: On Tue, 4 Mar 2025 16:11:41 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? > > Depending whether a cpu supports fast misaligned access or not, the misaligned access can impact the performance a lot. > Some crypto intrinsic implementation on riscv do not consider data alignment and just use `ld` to load input byte array, and seems there is no way to do it, the main reason is that at java API level, the input byte array to these JVM intrinsic could be part of a real java array, so the input byte array could be 1/2...7 byte aligned. > And with the introduction of COH, it would be even complicated to do the input data alignment. > > So, for the consistency of performance, seems it's better to disable these intrinsics when AvoidUnalignedAccess == true. > And the user can still enable the intrinsics explicitly on a CPU with AvoidUnalignedAccess == true if they want so. > > Thanks! Thanks for finding this! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23903#pullrequestreview-2660934134 From shade at openjdk.org Wed Mar 5 12:30:33 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Mar 2025 12:30:33 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v2] In-Reply-To: References: Message-ID: > We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. 
Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up. > > Providing this event would also replace one of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). > > This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `jdk_jfr` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Drop threshold to 0ms - Merge branch 'master' into JDK-8351187-jfr-monitor-notify - Disable by default - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23901/files - new: https://git.openjdk.org/jdk/pull/23901/files/792934ff..bae0c391 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23901&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23901&range=00-01 Stats: 5245 lines in 131 files changed: 2978 ins; 1296 del; 971 mod Patch: https://git.openjdk.org/jdk/pull/23901.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23901/head:pull/23901 PR: https://git.openjdk.org/jdk/pull/23901 From shade at openjdk.org Wed Mar 5 12:30:33 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Mar 2025 12:30:33 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v2] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 06:51:36 GMT, David Holmes wrote: > Again a reasonable idea but note that notifications can occur much more frequently than waits. A wait only happens when a thread actually has to wait, whereas notifications tend to happen whenever a data structure is updated in a key way On one hand, this looks like another reason to keep the event disabled by default.
On the other hand, the way the event is currently implemented, it only fires when wait-set is not empty, plus or minus race conditions that would be resolved later in `INotify` after taking the internal lock. So there _should_ be a `wait` waiting for the overwhelming majority of eventful `notify`-es. Meaning, the number of `notify` events should be more or less tracking the number of `wait` events. It is likely the number of `notify` events are less than number of `wait` events even, if we assume most of the robust concurrent code uses `notifyAll()`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23901#issuecomment-2700789087 From fbredberg at openjdk.org Wed Mar 5 12:31:23 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 5 Mar 2025 12:31:23 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v3] In-Reply-To: References: Message-ID: > I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list. The `entry_list`. > > This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about if it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. > > In the current multi-queue design new threads where always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` whould detached the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` whould be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. > > The new list-design is as much a multi-queue as the current. 
Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. > > You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. > > The first contending thread that "pushes" itself onto `entry_list` will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked through its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. > > Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). > > Performance-wise the new design seems to be equal to the old design, even though c2 generates two fewer instructions per monitor unlock operation. > > However the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fact that c2 no longer has to check b... Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: Updated comments after review by Patricio.
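
The list discipline described in the PR text above (push at the head with a compare-and-exchange, pick the successor from the tail, fill in prev pointers lazily while walking, cache the tail) can be illustrated with a minimal standalone C++ sketch. This is not the HotSpot code: `Waiter`, `EntryListSketch`, `push`, `find_tail` and `pop_tail` are hypothetical names chosen for illustration, `std::atomic` stands in for HotSpot's `Atomic` API, and the sketch assumes that only the current lock owner ever calls `find_tail` or `pop_tail`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a waiting thread's list node; only the links matter here.
struct Waiter {
  int id;
  Waiter* next = nullptr;
  Waiter* prev = nullptr;
  explicit Waiter(int i) : id(i) {}
};

class EntryListSketch {
  std::atomic<Waiter*> _head{nullptr};  // contended: any thread may push here
  Waiter* _tail = nullptr;              // cached tail, touched only by the lock owner

 public:
  // Any contending thread: push onto the head with a CAS loop.
  // Newly pushed nodes have prev == nullptr, so the fresh part is singly linked.
  void push(Waiter* w) {
    assert(w != nullptr);
    w->prev = nullptr;
    Waiter* old_head = _head.load(std::memory_order_relaxed);
    do {
      w->next = old_head;
    } while (!_head.compare_exchange_weak(old_head, w,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  // Lock owner only: return the oldest node (the FIFO successor).
  // Walking from the head assigns prev pointers along the way.
  Waiter* find_tail() {
    if (_tail != nullptr) return _tail;
    Waiter* w = _head.load(std::memory_order_acquire);
    if (w == nullptr) return nullptr;
    while (w->next != nullptr) {
      w->next->prev = w;
      w = w->next;
    }
    _tail = w;
    return w;
  }

  // Lock owner only: unlink and return the tail, or nullptr if the list is empty.
  Waiter* pop_tail() {
    Waiter* t = find_tail();
    if (t == nullptr) return nullptr;
    if (t->prev == nullptr) {
      // t may still be the head; try to empty the list in one step.
      Waiter* expected = t;
      if (_head.compare_exchange_strong(expected, nullptr,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire)) {
        _tail = nullptr;
        return t;
      }
      // Newer nodes were pushed in front of t: link prev pointers down to t.
      Waiter* w = expected;  // the current head
      while (w->next != t) {
        w->next->prev = w;
        w = w->next;
      }
      t->prev = w;
    }
    _tail = t->prev;
    _tail->next = nullptr;
    t->prev = nullptr;
    return t;
  }
};
```

Pushing nodes 1, 2, 3 and then popping yields them in that same order, and nodes pushed between pops queue up behind the older ones. The sketch leaves out everything else a real monitor needs (thread states, parking, removal from the middle of the list on timeout or interrupt).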
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23421/files - new: https://git.openjdk.org/jdk/pull/23421/files/283c2431..0d2d6c34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23421&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23421&range=01-02 Stats: 11 lines in 1 file changed: 1 ins; 1 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23421.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23421/head:pull/23421 PR: https://git.openjdk.org/jdk/pull/23421 From fbredberg at openjdk.org Wed Mar 5 12:31:24 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 5 Mar 2025 12:31:24 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> <09Lu69Do9amzXyGok3KDuP2whACShrPwRM7BOel5wgg=.ceed3ba0-9f91-4e95-9cf5-0e85362e29df@github.com> Message-ID: On Wed, 5 Mar 2025 05:14:54 GMT, David Holmes wrote: >> You're quite right. I'll rewrite that section of the comment. Thank you for spotting this. > > Yep my bad - you can't delete yourself without a prev node pointer when you are being pointed to. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1981307190 From shade at openjdk.org Wed Mar 5 12:32:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Mar 2025 12:32:56 GMT Subject: RFR: 8351187: Add JFR monitor notification event In-Reply-To: <2b6qM-t3jmRAx9XW1SdpKVeJBuKRqzFD_FFXQ4Hka5A=.af985e1e-ad56-44cc-b077-05a45ed2bd75@github.com> References: <2b6qM-t3jmRAx9XW1SdpKVeJBuKRqzFD_FFXQ4Hka5A=.af985e1e-ad56-44cc-b077-05a45ed2bd75@github.com> Message-ID: On Wed, 5 Mar 2025 07:26:13 GMT, Alan Bateman wrote: > Do I read this correctly that this is a duration event and will only be recorded if the notify takes >= 10ms? When I read David's comment then I assumed it was an instant event but it seems not. 
Given the "address" field then I assume this is intended to be part of a troubleshooting recipe, maybe with threshold set to 0ms, is that right? Good catch, copy-paste error, really. I am arguing separately in #23891 that monitor-related JFR events that do not actually block should not have a high threshold. Otherwise we filter most of them in practice. (JFR tests do not see this, because they override the thresholds to 0ms themselves). Notification event is one of those events as well. See new version, where I dropped the threshold to `0ms`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23901#issuecomment-2700796156 From fbredberg at openjdk.org Wed Mar 5 12:34:56 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 5 Mar 2025 12:34:56 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> References: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> Message-ID: On Mon, 3 Mar 2025 23:10:29 GMT, Patricio Chilano Mateo wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update after review by David and Coleen. > > src/hotspot/share/runtime/objectMonitor.cpp line 1265: > >> 1263: // that updated _entry_list, so we can access w->_next. >> 1264: w = Atomic::load_acquire(&_entry_list); >> 1265: assert(w != nullptr, "invariant"); > > Maybe add the same assert as below for the single element case: `assert(w->TState == ObjectWaiter::TS_ENTER, "invariant")`. Since this is not strictly necessary, I will look into this in a follow up PR. > src/hotspot/share/runtime/objectMonitor.cpp line 1532: > >> 1530: // Let's say T1 then stalls. T2 acquires O and calls O.notify(). The >> 1531: // notify() operation moves T1 from O's waitset to O's entry_list. T2 then >> 1532: // release the lock "O". 
T2 resumes immediately after the ST of null into > > Pre-existent, but this should be T1. Same in next sentence. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1981314184 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1981313487 From fbredberg at openjdk.org Wed Mar 5 12:34:57 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 5 Mar 2025 12:34:57 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> Message-ID: On Tue, 4 Mar 2025 18:08:15 GMT, Patricio Chilano Mateo wrote: >> We don't have a prev node, we don't know which node to set next to our next node to. The list will be broken. > > Right, we still have to set the previous links for those nodes. I'm just suggesting we don't have to walk the whole list, just until the last node we set the previous pointer. Since this is not strictly necessary, I will look into this in a follow up PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1981312971 From yzheng at openjdk.org Wed Mar 5 12:37:52 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 5 Mar 2025 12:37:52 GMT Subject: RFR: 8350892: [JVMCI] Align ResolvedJavaType.getInstanceFields with Class.getDeclaredFields In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 23:46:54 GMT, Doug Simon wrote: > The current order of fields returned by `ResolvedJavaType.getInstanceFields` is a) not well specified and b) different than the order of fields used almost everywhere else in HotSpot. This PR aligns the order of `getInstanceFields` with `Class.getDeclaredFields()`. > > It also makes `ciInstanceKlass::_nonstatic_fields` use the same order which unifies how escape analysis and deoptimization treats fields across C2 and JVMCI. 
Overall looks good to me src/hotspot/share/ci/ciInstanceKlass.cpp line 481: > 479: // Now sort them by offset, ascending. > 480: // (In principle, they could mix with superclass fields.) > 481: fields->sort(sort_field_by_offset); This has no effect now, i.e., the fields were sorted already? ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/23849#pullrequestreview-2660958414 PR Review Comment: https://git.openjdk.org/jdk/pull/23849#discussion_r1981305860 From fbredberg at openjdk.org Wed Mar 5 12:43:03 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 5 Mar 2025 12:43:03 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> References: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> Message-ID: <2pmoWBdeasqGUxjDKvJMIBUqgipo33xTNrYIdB6U1vM=.79067439-b3c1-4b47-8669-7e4d77a22b3f@github.com> On Mon, 3 Mar 2025 23:12:05 GMT, Patricio Chilano Mateo wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update after review by David and Coleen. > > src/hotspot/share/runtime/objectMonitor.cpp line 1509: > >> 1507: // is no successor, so it appears that an heir-presumptive >> 1508: // (successor) must be made ready. Only the current lock owner can >> 1509: // detach threads from the entry_list, therefore we need to > > We don't detach threads here, so maybe manipulate would be better. Maybe, but manipulate may also include "pushing to the head", which is fine to do without holding the lock. I'll keep the comment as is for now, maybe this sentence will be deleted if we find a way of running exit without holding the lock, as we have talked about. If that's not possible I'll rephrase this sentence in a follow up PR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1981325861 From fbredberg at openjdk.org Wed Mar 5 12:51:02 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 5 Mar 2025 12:51:02 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v3] In-Reply-To: References: Message-ID: <6TKNnpUGSCflszCRIY531Nnf1kMxjlYQm3V4Yf44riY=.5c5f69ef-cf63-4748-902b-39c2898762ee@github.com> On Wed, 5 Mar 2025 12:31:23 GMT, Fredrik Bredberg wrote: >> I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list. The `entry_list`. >> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about if it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. >> >> In the current multi-queue design new threads where always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` whould detached the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` whould be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. >> >> The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. >> >> You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list`, will be the last thread in the list. 
Each newly pushed thread in `entry_list` will be linked trough its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. >> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance wise the new design seems to be equal to the old design, even though c2 generates two less instructions per monitor unlock operation. >> >> However the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Updated comments after review by Patricio. @mur47x111 I'm getting ready to integrate. I've seen that you have created [[JDK-8349711] Adapt JDK-8343840: Rewrite the ObjectMonitor lists](https://github.com/oracle/graal/pull/10757) to handle the change on your side. Do you see any reason why I shouldn't integrate, or are you fine with me integrating this PR now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23421#issuecomment-2700837592 From shade at openjdk.org Wed Mar 5 12:55:46 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Mar 2025 12:55:46 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v3] In-Reply-To: References: Message-ID: > We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. 
Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up. > > Providing this event would also replace one of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). > > This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `jdk_jfr` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Rewrite test to RecordingStream ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23901/files - new: https://git.openjdk.org/jdk/pull/23901/files/bae0c391..f52144fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23901&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23901&range=01-02 Stats: 72 lines in 1 file changed: 34 ins; 27 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/23901.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23901/head:pull/23901 PR: https://git.openjdk.org/jdk/pull/23901 From duke at openjdk.org Wed Mar 5 13:10:34 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 5 Mar 2025 13:10:34 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v4] In-Reply-To: References: Message-ID: > By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Added alignment to loop entries.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23860/files - new: https://git.openjdk.org/jdk/pull/23860/files/331f1ecb..3aaa106f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=02-03 Stats: 9 lines in 2 files changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23860/head:pull/23860 PR: https://git.openjdk.org/jdk/pull/23860 From duke at openjdk.org Wed Mar 5 13:10:35 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 5 Mar 2025 13:10:35 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v3] In-Reply-To: <0E2AqFpNPjDjP6jqCXn8toePBcW2SIHw1kFXlZX4W_U=.8d692bfa-0598-4969-b480-4a285366e0bb@github.com> References: <0E2AqFpNPjDjP6jqCXn8toePBcW2SIHw1kFXlZX4W_U=.8d692bfa-0598-4969-b480-4a285366e0bb@github.com> Message-ID: On Wed, 5 Mar 2025 11:39:05 GMT, Jatin Bhateja wrote: >> Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - Merged master. >> - Added comments, removed debugging printfs >> - JDK-8351034 Add AVX-512 intrinsics for ML-DSA > > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 292: > >> 290: __ movl(iterations, 2); >> 291: >> 292: __ BIND(L_loop); > > Hi @ferakocz , Kindly align loop entry address using __align64() here and at all the places before __BIND(LOOP) Hi, @jatin-bhateja, thanks for the suggestion. I have added __ align(OptoLoopAlignment); before all loop entries. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1981364481 From ayang at openjdk.org Wed Mar 5 13:13:58 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 5 Mar 2025 13:13:58 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v7] In-Reply-To: <00nHUxFecTrb5xshjIqDo40zuqdHNiANMqSNCUH2jGY=.7bd0ea79-d6a9-46f9-86d1-4e1d75a27d69@github.com> References: <00nHUxFecTrb5xshjIqDo40zuqdHNiANMqSNCUH2jGY=.7bd0ea79-d6a9-46f9-86d1-4e1d75a27d69@github.com> Message-ID: On Tue, 4 Mar 2025 15:48:17 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > align comments src/hotspot/share/gc/parallel/psOldGen.cpp line 193: > 191: #endif > 192: const size_t alignment = virtual_space()->alignment(); > 193: size_t aligned_bytes = can_align_up(bytes, alignment) ? align_up(bytes, alignment) : 0; Why doesn't the previous revision using early-return + min2 work?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1981369341 From mli at openjdk.org Wed Mar 5 13:18:51 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 5 Mar 2025 13:18:51 GMT Subject: RFR: 8351145: RISC-V: only enable some crypto intrinsic when AvoidUnalignedAccess == false In-Reply-To: References: <8-3lLYr9jtNOQJhRLRyAA2xxfxG2aVm27HIGcbNsCfY=.8e43a66a-5602-4345-b5a0-cfdaab7e0d8f@github.com> Message-ID: <3LZYl5hasXZbDbXXUppks0WA4RFIBBTb5LNK12LSY8E=.a54612ee-cda9-41a9-b96d-bff2b2279bdf@github.com> On Wed, 5 Mar 2025 12:15:36 GMT, Fei Yang wrote: > Thanks for finding this! Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23903#issuecomment-2700902383 From cnorrbin at openjdk.org Wed Mar 5 13:23:54 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 5 Mar 2025 13:23:54 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v7] In-Reply-To: References: <00nHUxFecTrb5xshjIqDo40zuqdHNiANMqSNCUH2jGY=.7bd0ea79-d6a9-46f9-86d1-4e1d75a27d69@github.com> Message-ID: On Wed, 5 Mar 2025 13:11:26 GMT, Albert Mingkun Yang wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> align comments > > src/hotspot/share/gc/parallel/psOldGen.cpp line 193: > >> 191: #endif >> 192: const size_t alignment = virtual_space()->alignment(); >> 193: size_t aligned_bytes = can_align_up(bytes, alignment) ? align_up(bytes, alignment) : 0; > > Why doesn't the previous revision using early-return + min2 work? That solution still works. With `can_align_up` and setting `aligned_bytes` to 0, we can use the logic that is already there a few lines down: https://github.com/openjdk/jdk/blob/caaf4098452476d981183ad4302b76b9c883a72b/src/hotspot/share/gc/parallel/psOldGen.cpp#L201-L207 To me it felt better to continue to try to expand instead of aborting. If you prefer the other version I can swap back to it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1981383636 From ayang at openjdk.org Wed Mar 5 13:44:52 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 5 Mar 2025 13:44:52 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v7] In-Reply-To: References: <00nHUxFecTrb5xshjIqDo40zuqdHNiANMqSNCUH2jGY=.7bd0ea79-d6a9-46f9-86d1-4e1d75a27d69@github.com> Message-ID: <-KMQt_Q5k4WUxfx3PAq4Bd_N53xrADJ7wLI9q2YOa4A=.5c8486eb-4f3a-47c6-96c1-b498c98e12af@github.com> On Wed, 5 Mar 2025 13:21:09 GMT, Casper Norrbin wrote: > try to expand instead of aborting. Did the previous version abort? I thought the logic is essentially: if (remaining == 0) { return; } aligned_bytes = align_up(min2(bytes, remaining), alignment). The below `if (aligned_bytes == 0) { ` should not be reachable any more, since align-up-overflow can't occur. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1981423472 From dnsimon at openjdk.org Wed Mar 5 13:50:53 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 5 Mar 2025 13:50:53 GMT Subject: RFR: 8350892: [JVMCI] Align ResolvedJavaType.getInstanceFields with Class.getDeclaredFields In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 12:26:12 GMT, Yudi Zheng wrote: >> The current order of fields returned by `ResolvedJavaType.getInstanceFields` is a) not well specified and b) different than the order of fields used almost everywhere else in HotSpot. This PR aligns the order of `getInstanceFields` with `Class.getDeclaredFields()`. >> >> It also makes `ciInstanceKlass::_nonstatic_fields` use the same order which unifies how escape analysis and deoptimization treats fields across C2 and JVMCI. > > src/hotspot/share/ci/ciInstanceKlass.cpp line 481: > >> 479: // Now sort them by offset, ascending. >> 480: // (In principle, they could mix with superclass fields.) 
>> 481: fields->sort(sort_field_by_offset); > > This has no effect now, i.e., the fields were sorted already? They now have whatever sort order is given by JavaFieldStream. This happens to currently be class file declaration order but it doesn't really matter if it changes. The only requirement is that the same order is used by `get_reassigned_fields` in `deoptimization.cpp`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23849#discussion_r1981441818 From jbhateja at openjdk.org Wed Mar 5 14:05:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 5 Mar 2025 14:05:53 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v3] In-Reply-To: References: <0E2AqFpNPjDjP6jqCXn8toePBcW2SIHw1kFXlZX4W_U=.8d692bfa-0598-4969-b480-4a285366e0bb@github.com> Message-ID: On Wed, 5 Mar 2025 13:07:54 GMT, Ferenc Rakoczi wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 292: >> >>> 290: __ movl(iterations, 2); >>> 291: >>> 292: __ BIND(L_loop); >> >> Hi @ferakocz , Kindly align loop entry address using __align64() here and at all the places before __BIND(LOOP) > > Hi, @jatin-bhateja, thanks for the suggestion. I have added __ align(OptoLoopAlignment); before all loop entries. Hi @ferakocz, thanks! For efficient utilization of the Decode ICache (please refer to Intel SDM section 3.4.2.5), code blocks should be aligned to 32-byte boundaries; a 64-byte-aligned address is also 16- and 32-byte aligned, and it matches the cache line size. However, I noticed that we have been using OptoLoopAlignment in places in AES-GCM as well. I introduced some errors in the generate_dilithiumAlmostInverseNtt_avx512 implementation, expecting the existing ML_DSA_Tests under test/jdk/sun/security/provider/acvp to catch them, but all the tests passed for me.
`java -jar /home/jatinbha/sandboxes/jtreg/build/images/jtreg/lib/jtreg.jar -jdk:$JAVA_HOME -Djdk.test.lib.artifacts.ACVP-Server=/home/jatinbha/softwares/v1.1.0.38.zip -va -timeout:4 Launcher.java` Can you please point out a test I need to use for validation ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1981468903 From cnorrbin at openjdk.org Wed Mar 5 14:08:54 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 5 Mar 2025 14:08:54 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v7] In-Reply-To: <-KMQt_Q5k4WUxfx3PAq4Bd_N53xrADJ7wLI9q2YOa4A=.5c8486eb-4f3a-47c6-96c1-b498c98e12af@github.com> References: <00nHUxFecTrb5xshjIqDo40zuqdHNiANMqSNCUH2jGY=.7bd0ea79-d6a9-46f9-86d1-4e1d75a27d69@github.com> <-KMQt_Q5k4WUxfx3PAq4Bd_N53xrADJ7wLI9q2YOa4A=.5c8486eb-4f3a-47c6-96c1-b498c98e12af@github.com> Message-ID: <_TCggTdPl0cXFfrP-iMphTcJ-EYJwXuP2i1ANSel8Vc=.adaf9243-1196-4b53-a976-4a8d15cde9fa@github.com> On Wed, 5 Mar 2025 13:42:32 GMT, Albert Mingkun Yang wrote: >> That solution still works. With `can_align_up` and setting `aligned_bytes` to 0, we can use the logic that is already there a few lines down: >> >> https://github.com/openjdk/jdk/blob/caaf4098452476d981183ad4302b76b9c883a72b/src/hotspot/share/gc/parallel/psOldGen.cpp#L201-L207 >> >> To me it felt better to continue to try to expand instead of aborting. If you prefer the other version I can swap back to it. > >> try to expand instead of aborting. > > Did the previous version abort? I thought the logic is essentially: > > > if (remaining == 0) { > return; > } > > aligned_bytes = align_up(min2(bytes, remaining), alignment). > > > The below `if (aligned_bytes == 0) { ` should not be reachable any more, since align-up-overflow can't occur. I meant aborting in the sense that we abort the expand operation by returning early instead of continuing. 
If we don't do that, the `if (aligned_bytes == 0) {` block is still reachable with `can_align_up`, namely when the alignment would overflow and we set `aligned_bytes` to 0. The `remaining == 0` case would then be caught later in the function, inside `expand_by(aligned_bytes)`. I'll go ahead and revert to the other version, and remove the `if (aligned_bytes == 0) {` block as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1981475286 From cnorrbin at openjdk.org Wed Mar 5 14:13:36 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 5 Mar 2025 14:13:36 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v8] In-Reply-To: References: Message-ID: <4VkKBmAzAASK5lVvj-ABWD-cvqU8dS26gBIMHPbJvLU=.564f068e-a620-40d0-a954-e2f35d73bf1d@github.com> > Hi everyone, > > The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. > > The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions.
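
To make the overflow concrete, here is a minimal standalone C++ sketch of the idea for `size_t` values and power-of-two alignments. It is only an illustration under those assumptions: the `sketch_` functions are hypothetical and do not reproduce the templates in HotSpot's `align.hpp`. For an unsigned type the addition wraps around instead of being formally undefined, but the wrapped result (typically 0) is just as wrong, which is what the `result >= size` assertion catches.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper: true if x is a non-zero power of two.
constexpr bool sketch_is_power_of_2(std::size_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Hypothetical helper: true if rounding `size` up to a multiple of `alignment`
// still fits in size_t. The largest aligned value is max - (alignment - 1).
constexpr bool sketch_can_align_up(std::size_t size, std::size_t alignment) {
  return size <= std::numeric_limits<std::size_t>::max() - (alignment - 1);
}

// Hypothetical helper: round `size` up to the next multiple of `alignment`,
// asserting the invariant the PR description mentions (aligned result >= original).
inline std::size_t sketch_align_up(std::size_t size, std::size_t alignment) {
  assert(sketch_is_power_of_2(alignment));
  assert(sketch_can_align_up(size, alignment));
  std::size_t result = (size + alignment - 1) & ~(alignment - 1);
  assert(result >= size);
  return result;
}
```

A caller that cannot rule out huge inputs would test `sketch_can_align_up(bytes, alignment)` first and pick a fallback (such as 0 or a null result) when it returns false; a caller that clamps its input to an already aligned upper bound beforehand never needs the fallback.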
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: changed psoldgen check back to earlier version ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23711/files - new: https://git.openjdk.org/jdk/pull/23711/files/f52de010..31a7d55e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=06-07 Stats: 12 lines in 1 file changed: 4 ins; 7 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23711.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23711/head:pull/23711 PR: https://git.openjdk.org/jdk/pull/23711 From ayang at openjdk.org Wed Mar 5 14:16:54 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 5 Mar 2025 14:16:54 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v8] In-Reply-To: <4VkKBmAzAASK5lVvj-ABWD-cvqU8dS26gBIMHPbJvLU=.564f068e-a620-40d0-a954-e2f35d73bf1d@github.com> References: <4VkKBmAzAASK5lVvj-ABWD-cvqU8dS26gBIMHPbJvLU=.564f068e-a620-40d0-a954-e2f35d73bf1d@github.com> Message-ID: On Wed, 5 Mar 2025 14:13:36 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > changed psoldgen check back to earlier version The gc part looks good. Thank you.
------------- Marked as reviewed by ayang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23711#pullrequestreview-2661255309 From coleenp at openjdk.org Wed Mar 5 14:35:58 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 5 Mar 2025 14:35:58 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v3] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 12:31:23 GMT, Fredrik Bredberg wrote: >> I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list, the `entry_list`. >> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about if it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. >> >> In the current multi-queue design new threads were always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` would be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. >> >> The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. >> >> You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list`, will be the last thread in the list. 
Each newly pushed thread in `entry_list` will be linked through its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. >> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance-wise the new design seems to be equal to the old design, even though c2 generates two fewer instructions per monitor unlock operation. >> >> However, the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Updated comments after review by Patricio. Marked as reviewed by coleenp (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23421#pullrequestreview-2661313357 From yzheng at openjdk.org Wed Mar 5 14:49:57 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 5 Mar 2025 14:49:57 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v3] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 12:31:23 GMT, Fredrik Bredberg wrote: >> I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list, the `entry_list`. >> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about if it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. 
>> >> In the current multi-queue design new threads were always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` would be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. >> >> The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. >> >> You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list`, will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked through its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. >> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance-wise the new design seems to be equal to the old design, even though c2 generates two fewer instructions per monitor unlock operation. 
>> >> However, the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Updated comments after review by Patricio. JVMCI changes look good to me! We are good to go! ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/23421#pullrequestreview-2661358578 From pchilanomate at openjdk.org Wed Mar 5 14:52:57 2025 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 5 Mar 2025 14:52:57 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v3] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 12:31:23 GMT, Fredrik Bredberg wrote: >> I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list, the `entry_list`. >> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about if it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. >> >> In the current multi-queue design new threads were always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` would be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. >> >> The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. 
>> >> You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list`, will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked through its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. >> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance-wise the new design seems to be equal to the old design, even though c2 generates two fewer instructions per monitor unlock operation. >> >> However, the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Updated comments after review by Patricio. Thanks, looks good. ------------- Marked as reviewed by pchilanomate (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23421#pullrequestreview-2661362873 From pchilanomate at openjdk.org Wed Mar 5 14:52:59 2025 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 5 Mar 2025 14:52:59 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: <2pmoWBdeasqGUxjDKvJMIBUqgipo33xTNrYIdB6U1vM=.79067439-b3c1-4b47-8669-7e4d77a22b3f@github.com> References: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> <2pmoWBdeasqGUxjDKvJMIBUqgipo33xTNrYIdB6U1vM=.79067439-b3c1-4b47-8669-7e4d77a22b3f@github.com> Message-ID: On Wed, 5 Mar 2025 12:40:41 GMT, Fredrik Bredberg wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 1509: >> >>> 1507: // is no successor, so it appears that an heir-presumptive >>> 1508: // (successor) must be made ready. Only the current lock owner can >>> 1509: // detach threads from the entry_list, therefore we need to >> >> We don't detach threads here, so maybe manipulate would be better. > > Maybe, but manipulate may also include "pushing to the head", which is fine to do without holding the lock. > I'll keep the comment as is for now, maybe this sentence will be deleted if we find a way of running exit without holding the lock, as we have talked about. If that's not possible I'll rephrase this sentence in a follow up PR. You could use the same wording we have in the comment above already just to make it consistent: `manipulate the _entry_list (except for pushing new threads to the head)`. 
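The distinction in this comment (any thread may push to the head, only the lock owner may do anything else to the list) follows from the list design in the PR description. A minimal sketch of that design is given here for illustration, assuming a simplified `Waiter` node and `std::atomic` in place of HotSpot's `ObjectWaiter` and `Atomic::cmpxchg`. All type and member names are hypothetical, and this is not the code in objectMonitor.cpp.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical waiter node; the real ObjectWaiter carries much more state.
struct Waiter {
  int id;
  Waiter* next = nullptr;  // set once by the pushing thread, then stable
  Waiter* prev = nullptr;  // assigned lazily by the lock owner while walking
  explicit Waiter(int i) : id(i) {}
};

struct EntryListSketch {
  std::atomic<Waiter*> head{nullptr};  // contended: any thread may push here
  Waiter* tail = nullptr;              // cached tail; lock owner only

  // Any contending thread: push onto the head with compare-and-exchange.
  void push(Waiter* w) {
    w->prev = nullptr;
    Waiter* h = head.load(std::memory_order_relaxed);
    do {
      w->next = h;  // h is refreshed by a failed compare_exchange_weak
    } while (!head.compare_exchange_weak(h, w,
                                         std::memory_order_release,
                                         std::memory_order_relaxed));
  }

  // Lock owner only: walk from 'from' to the end, assigning prev pointers.
  static Waiter* link_prev_and_find_last(Waiter* from) {
    Waiter* w = from;
    while (w->next != nullptr) {
      w->next->prev = w;
      w = w->next;
    }
    return w;
  }

  // Lock owner only: unlink and return the oldest waiter (FIFO successor).
  Waiter* pop_tail() {
    if (tail == nullptr) {
      Waiter* h = head.load(std::memory_order_acquire);
      if (h == nullptr) return nullptr;  // list is empty
      tail = link_prev_and_find_last(h);
    }
    Waiter* t = tail;
    if (t->prev == nullptr) {
      // Either t is the only node, or newer nodes were pushed in front of it
      // since the last walk. Try to empty the list; on failure 'expected'
      // holds the current head and we link the newer nodes to t.
      Waiter* expected = t;
      if (head.compare_exchange_strong(expected, nullptr,
                                       std::memory_order_acq_rel,
                                       std::memory_order_acquire)) {
        tail = nullptr;
        return t;
      }
      link_prev_and_find_last(expected);
    }
    tail = t->prev;
    tail->next = nullptr;
    t->prev = nullptr;
    return t;
  }
};
```

In this sketch `push` never touches anything but `head` and the pushing thread's own node, which is why it needs no lock, while `pop_tail` rewrites interior links and the cached tail and so is only safe for the single current owner.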
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1981553339 From fbredberg at openjdk.org Wed Mar 5 14:56:57 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 5 Mar 2025 14:56:57 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: <4POZFfUl_AWAh3K2rV3Uqey0xkYHApoZDjfuw3TVBlA=.4cf1547b-5279-40b4-bef4-4c9775ec1ad8@github.com> <2pmoWBdeasqGUxjDKvJMIBUqgipo33xTNrYIdB6U1vM=.79067439-b3c1-4b47-8669-7e4d77a22b3f@github.com> Message-ID: On Wed, 5 Mar 2025 14:49:01 GMT, Patricio Chilano Mateo wrote: >> Maybe, but manipulate may also include "pushing to the head", which is fine to do without holding the lock. >> I'll keep the comment as is for now, maybe this sentence will be deleted if we find a way of running exit without holding the lock, as we have talked about. If that's not possible I'll rephrase this sentence in a follow up PR. > > You could use the same wording we have in the comment above already just to make it consistent: `manipulate the _entry_list (except for pushing new threads to the head)`. I promise I'll fix that in the follow up PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1981563527 From gziemski at openjdk.org Wed Mar 5 15:32:03 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 5 Mar 2025 15:32:03 GMT Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 09:49:41 GMT, Afshin Zafari wrote: > With the `size` parameter there will be no need to traverse/go through the nodes between the base and end of the region. > Tests: > linux-x64-debug, gtest:NMT* and runtime/NMT* Changes requested by gziemski (Reviewer). Changes requested by gziemski (Reviewer). src/hotspot/share/cds/metaspaceShared.cpp line 1475: > 1473: (address)archive_space_rs.base() == base_address, "Sanity"); > 1474: // Register archive space with NMT. 
> 1475: MemTracker::record_virtual_memory_tag(archive_space_rs.base(), archive_space_rs.size(), mtClassShared); The pattern here is `something.base(), something.size()`. Instead of doing this over and over again, why can't we just pass `something` to `MemTracker::record_virtual_memory_tag()` and let it figure out `base` and `size` itself? src/hotspot/share/cds/metaspaceShared.cpp line 1548: > 1546: return nullptr; > 1547: } > 1548: // NMT: fix up the space tags What exactly needs to be fixed here? ------------- PR Review: https://git.openjdk.org/jdk/pull/23770#pullrequestreview-2661498707 PR Review: https://git.openjdk.org/jdk/pull/23770#pullrequestreview-2661515550 PR Review Comment: https://git.openjdk.org/jdk/pull/23770#discussion_r1981647511 PR Review Comment: https://git.openjdk.org/jdk/pull/23770#discussion_r1981635746 From sgehwolf at openjdk.org Wed Mar 5 15:51:54 2025 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 5 Mar 2025 15:51:54 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 15:11:31 GMT, Thomas Fitzsimmons wrote: >> This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. 
>> >> I tested it with: >> >> >> java -Xlog:os+container=trace -version >> >> on: >> >> `Red Hat Enterprise Linux 8 (cgroups v1 only)`: >> _No change in behaviour_ >> >> `Fedora 41 (cgroups v2)`: >> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ >> >> --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 >> +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 >> @@ -1,7 +1,12 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> +[debug][os,container] v2 controller cpuset is enabled and relevant >> +[debug][os,container] v2 controller cpu is enabled and required >> +[debug][os,container] v2 controller io is enabled but not relevant >> +[debug][os,container] v2 controller memory is enabled and required >> +[debug][os,container] v2 controller hugetlb is enabled but not relevant >> +[debug][os,container] v2 controller pids is enabled and relevant >> +[debug][os,container] v2 controller rdma is enabled but not relevant >> +[debug][os,container] v2 controller misc is enabled but not relevant >> [debug][os,container] Detected cgroups v2 unified hierarchy >> [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope >> [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max >> >> >> `Fedora 41 (custom kernel with cgroups v1 disabled)`: >> _Fixes `cgroups v2` detection:_ >> >> --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 >> +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 >> @@ -1,7 +1,63 @@ >> 
[trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> -[debug][os,container] controller memory is not enabled >> - ] >> -[debug][os,container] One or more required controllers disabled at kernel level. >> +[... > > I fixed an existing assert message typo that I noticed while working on the patch, `hierarchy mismatch for cpuacc[t]`. Strictly speaking it is not related to either bug report, but I figured it did not warrant a bug report of its own. @fitzsim Could you please merge latest master? https://bugs.openjdk.org/browse/JDK-8343191 got merged since. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23811#issuecomment-2701346527 From jsjolen at openjdk.org Wed Mar 5 16:26:17 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 5 Mar 2025 16:26:17 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v34] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Tue, 4 Mar 2025 11:20:05 GMT, Afshin Zafari wrote: >> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTracker`. >> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. >> - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. >> - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > test cases for doing reserve or commit the same region twice. Still working through the files, a few more comments. 
src/hotspot/share/nmt/memReporter.cpp line 451: > 449: }); > 450: > 451: if (reserved_and_committed) Missing braces. src/hotspot/share/nmt/regionsTree.hpp line 37: > 35: // for processing the tree nodes in a shorter and more meaningful way. > 36: class RegionsTree : public VMATree { > 37: private: Remove `private`, not needed. src/hotspot/share/nmt/regionsTree.hpp line 56: > 54: NodeHelper() : _node(nullptr) { } > 55: NodeHelper(Node* node) : _node(node) { } > 56: inline bool is_valid() { return _node != nullptr; } Missing `const`. src/hotspot/share/nmt/regionsTree.inline.hpp line 33: > 31: void RegionsTree::visit_committed_regions(const ReservedMemoryRegion& rgn, F func) { > 32: position start = (position)rgn.base(); > 33: size_t end = (size_t)rgn.end() + 1; Can we use `static_cast<size_t>(rgn.end())` instead? src/hotspot/share/nmt/virtualMemoryTracker.cpp line 60: > 58: if (tracker == nullptr) return false; > 59: _tracker = new (tracker) VirtualMemoryTracker(level == NMT_detail); > 60: return _tracker->tree() != nullptr; @afshin-zafari , `_tracker->tree()` can never be null anymore. In the future we should do a PR where we change it to return a reference. ------------- PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2661644343 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1981728394 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1981743711 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1981745212 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1981747136 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1981755525 From shade at openjdk.org Wed Mar 5 16:55:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Mar 2025 16:55:25 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port Message-ID: This PR implements JEP 503: Remove the 32-bit x86 Port. 
The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. 
Additional testing: - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) ------------- Commit messages: - Generic 32-bit x86 configure error supercedes Windows 32-bit x86 - 8345169: Implement JEP 503: Remove the 32-bit x86 Port Changes: https://git.openjdk.org/jdk/pull/23906/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23906&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345169 Stats: 30068 lines in 26 files changed: 4 ins; 30054 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23906.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23906/head:pull/23906 PR: https://git.openjdk.org/jdk/pull/23906 From vlivanov at openjdk.org Wed Mar 5 17:19:13 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 5 Mar 2025 17:19:13 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 16:52:16 GMT, Aleksey Shipilev wrote: > This PR implements JEP 503: Remove the 32-bit x86 Port. > > The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. > > This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. > > The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. 
The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. > > Additional testing: > - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) > - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) Hotspot changes look good to me. I fully support removing x86-32-specific files first and then clean up x86-32-specific code in x86-specific and shared files (e.g., guarded by `#ifndef _LP64`). ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23906#pullrequestreview-2661836831 From duke at openjdk.org Wed Mar 5 17:45:26 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Wed, 5 Mar 2025 17:45:26 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: > This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. 
> > I tested it with: > > > java -Xlog:os+container=trace -version > > on: > > `Red Hat Enterprise Linux 8 (cgroups v1 only)`: > _No change in behaviour_ > > `Fedora 41 (cgroups v2)`: > _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ > > --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 > +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 > @@ -1,7 +1,12 @@ > [trace][os,container] OSContainer::init: Initializing Container Support > -[debug][os,container] Detected optional pids controller entry in /proc/cgroups > -[debug][os,container] controller cpuset is not enabled > - ] > +[debug][os,container] v2 controller cpuset is enabled and relevant > +[debug][os,container] v2 controller cpu is enabled and required > +[debug][os,container] v2 controller io is enabled but not relevant > +[debug][os,container] v2 controller memory is enabled and required > +[debug][os,container] v2 controller hugetlb is enabled but not relevant > +[debug][os,container] v2 controller pids is enabled and relevant > +[debug][os,container] v2 controller rdma is enabled but not relevant > +[debug][os,container] v2 controller misc is enabled but not relevant > [debug][os,container] Detected cgroups v2 unified hierarchy > [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope > [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max > > > `Fedora 41 (custom kernel with cgroups v1 disabled)`: > _Fixes `cgroups v2` detection:_ > > --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 > +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 > @@ -1,7 +1,63 @@ > [trace][os,container] OSContainer::init: Initializing 
Container Support > -[debug][os,container] Detected optional pids controller entry in /proc/cgroups > -[debug][os,container] controller cpuset is not enabled > - ] > -[debug][os,container] controller memory is not enabled > - ] > -[debug][os,container] One or more required controllers disabled at kernel level. > +[debug][os,container] v2 controller cpuset is enabled and relevant > +[debug][os,container] v2 contro... Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing Remove from cgroups v1 branch incorrect log messages about cpuset controller being optional. Add test case for cgroups v1, cpuset disabled. - Improve !cgroups_v2_enabled branch comment - Debug-log optional and disabled cgroups v2 controllers Do not log enabled controllers that are not relevant to the JDK. - Move index declaration to scope in which it is used - Remove empty string check during cgroup.controllers parsing - Define ISSPACE_CHARS macro, use it in strsep call - Pass fgets result to strsep - Replace is_cgroupsV2 with cgroups_v2_enabled Also fix the testCgroupv1SystemdOnly and testCgroupv1NoMounts test cases such that their /proc/cgroups and /proc/self/cgroup contents correspond. This prevents assertion failures these tests were producing when is_cgroupsV2 was replaced with cgroups_v2_enabled. - ... 
and 3 more: https://git.openjdk.org/jdk/compare/94d10e34...b6926e15 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23811/files - new: https://git.openjdk.org/jdk/pull/23811/files/5d7eab52..b6926e15 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23811&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23811&range=02-03 Stats: 23855 lines in 706 files changed: 10308 ins; 9725 del; 3822 mod Patch: https://git.openjdk.org/jdk/pull/23811.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23811/head:pull/23811 PR: https://git.openjdk.org/jdk/pull/23811 From shade at openjdk.org Wed Mar 5 17:48:09 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Mar 2025 17:48:09 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v5] In-Reply-To: References: <2ZFtKLn2EcbzjKQ_USb3yiOWEWQJYocFwj_rk-5h0Jg=.f4eec566-3e0c-4a75-8c27-2cb785b0081a@github.com> Message-ID: On Wed, 5 Mar 2025 00:55:13 GMT, Cesar Soares Lucas wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 258: >> >>> 256: if (ShenandoahCardBarrier) { >>> 257: ShenandoahThreadLocalData::set_card_table(Thread::current(), bs->card_table()->write_byte_map_base()); >>> 258: } >> >> Er. This sets up card table for VMThread, right? I am surprised we do not need this for other fields in `ShenandoahThreadLocalData`. > > Yes, that's for the VMThread. That seems like a good question. I Actually, I am wondering why this is needed. It looks to me VMThread attaches after heap initialization, and the normal `ShenandoahBarrierSet::on_thread_attach` should handle it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1981887605 From shade at openjdk.org Wed Mar 5 17:48:08 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Mar 2025 17:48:08 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v7] In-Reply-To: References: Message-ID: <_LIv8Ggp3ukK0HmhknyG_Mz2x5OKs63Y-qSXTQo9Gdo=.9efc86f1-6cc4-425b-9319-5e1500eb59da@github.com> On Wed, 5 Mar 2025 01:14:44 GMT, Cesar Soares Lucas wrote: >> In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code. >> >> The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase. >> >> The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a version of the card table totally clear. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that just turned the new _read_ table. 
>> >> Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address. >> >> The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged. >> >> Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code. >> >> Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - Fix merge conflict > - Address PR feedback: formatting. > - Revert changes to shared cardTable.hpp > - Revert changes to shared cardTable.hpp > - Fix merge conflict > - Address PR feedback: no changes to shared files. > - Merge master > - Addressing PR comments: some refactorings, ppc fix, off-by-one fix. > - Relocation of Card Tables src/hotspot/os_cpu/linux_arm/javaThread_linux_arm.cpp line 43: > 41: > 42: void JavaThread::cache_global_variables() { > 43: #if INCLUDE_SHENANDOAHGC Sounds like we want to be consistent between C1 and C2 code, so maybe we should inject in adjacent block as: if (bs->is_a(BarrierSet::CardTableBarrierSet) && !bs->is_a(BarrierSet::ShenandoahBarrierSet)) { ... 
src/hotspot/share/gc/shenandoah/shenandoahCardTable.cpp line 57: > 55: _byte_map_base = _byte_map - (uintptr_t(low_bound) >> _card_shift); > 56: assert(byte_for(low_bound) == &_byte_map[0], "Checking start of map"); > 57: assert(byte_for(high_bound-1) <= &_byte_map[last_valid_index()], "Checking end of map"); It is a bit sad to see these asserts go. Is this because `_byte_map` is now mutable? May I suggest doing something like: _write_byte_map = (CardValue*) write_space.base(); _write_byte_map_base = _byte_map - (uintptr_t(low_bound) >> _card_shift); ...later... _read_byte_map = (CardValue*) read_space.base(); _read_byte_map_base = _byte_map - (uintptr_t(low_bound) >> _card_shift); ...later... // Set up current byte map _byte_map = _write_byte_map; _byte_map_base = _write_byte_map_base; // Check one side is good assert(byte_for(low_bound) == &_byte_map[0], "Checking start of map"); assert(byte_for(high_bound-1) <= &_byte_map[last_valid_index()], "Checking end of map"); swap_read_and_write_tables(); // Check another side is good assert(byte_for(low_bound) == &_byte_map[0], "Checking start of map"); assert(byte_for(high_bound-1) <= &_byte_map[last_valid_index()], "Checking end of map"); swap_read_and_write_tables(); src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 638: > 636: CardTable::CardValue* new_ptr; > 637: SwapTLSCardTable(CardTable::CardValue* np) { > 638: this->new_ptr = np; Suggestion: CardTable::CardValue* const _new_ptr; SwapTLSCardTable(CardTable::CardValue* np) : _new_ptr(np) {} ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1981872217 PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1981869962 PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1981835070 From duke at openjdk.org Wed Mar 5 18:30:03 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 5 Mar 2025 18:30:03 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v3] In-Reply-To: 
References: <0E2AqFpNPjDjP6jqCXn8toePBcW2SIHw1kFXlZX4W_U=.8d692bfa-0598-4969-b480-4a285366e0bb@github.com> Message-ID: On Wed, 5 Mar 2025 14:03:00 GMT, Jatin Bhateja wrote: >> Hi, @jatin-bhateja, thanks for the suggestion. I have added __ align(OptoLoopAlignment); before all loop entries. > > Hi @ferakocz , > > Thanks! For efficient utilization of Decode ICache (please refer to Intel SDM section 3.4.2.5), code blocks should be aligned to 32-byte boundaries; a 64-byte aligned code is a superset of both 16 and 32 byte aligned addresses and also matches with the cacheline size. However, I noticed that we have been using OptoLoopAlignment at places in AES-GCM also. > > I introduced some errors in generate_dilithiumAlmostInverseNtt_avx512 implementation in anticipation of catching it through existing ML_DSA_Tests under > test/jdk/sun/security/provider/acvp > > But all the tests passed for me. > `java -jar /home/jatinbha/sandboxes/jtreg/build/images/jtreg/lib/jtreg.jar -jdk:$JAVA_HOME -Djdk.test.lib.artifacts.ACVP-Server=/home/jatinbha/softwares/v1.1.0.38.zip -va -timeout:4 Launcher.java` > > Can you please point out a test I need to use for validation? I think the easiest is to put a for (int i = 0; i < 1000; i++) loop around the switch statement in the run() method of the ML_DSA_Test class (test/jdk/sun/security/provider/acvp/ML_DSA_Test.java). (This is because the intrinsics kick in after a few thousand calls of the method.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1981945490 From eastig at amazon.co.uk Wed Mar 5 18:41:42 2025 From: eastig at amazon.co.uk (Astigeevich, Evgeny) Date: Wed, 5 Mar 2025 18:41:42 +0000 Subject: RFD: Grouping hot code in CodeCache Message-ID: <1B0C3138-761B-4DB0-8A98-977C6FC40178@amazon.co.uk> Hi Vladimir, This is JDK-8326205: Implement grouping hot nmethods in CodeCache. 
As I managed to synthesize a benchmark (https://github.com/openjdk/jdk/pull/23831) to demonstrate the performance impact of sparse code, I'd like to discuss a possible solution to the sparse code problem.

At a high level, a solution is:
* Detect hot code.
* Group hot code.
* Maintain grouped code.

Downstream we tried two approaches:
* Static lists of methods (compile command): Identify frequently used (hot) methods using test runs and provide static method lists to JVM in production. When JVM compiles a Java method and the method is on the list, JVM puts the code into a designated code heap (HotCodeHeap).
* Dynamic lists of methods (compiler directives): Profile an application in production and dynamically relocate identified hot methods to HotCodeHeap. Relocation was implemented with recompilation.

The main advantage of static lists is zero profiling overhead in production. We do all profiling and analysis in test runs. Its problems are:
* Training Run Accuracy: We need training runs to have execution paths closely mimicking production environments. Otherwise we put wrong methods into HotCodeHeap.
* Method List Maintenance: We need to rerun training to regenerate lists when application code changes. Training runs are expensive and time-consuming. They require long runs to guarantee we see all major execution paths. Updating lists in production can be as complex as application deployment.
* Method Placement Limitations: Methods marked for HotCodeHeap are permanently placed into HotCodeHeap. There is no mechanism to remove methods that become less frequently used.

We addressed these problems with dynamic lists of methods. We implemented a Java agent that runs within the same JVM to dynamically detect and manage hot Java methods without prior method identification. The agent detects hot methods using JFR. The agent manages hot Java methods in HotCodeHeap with compiler directives. A new compiler directive marks methods with dynamic states ("hot" or "cold"). Methods marked by the "hot"
state are recompiled and placed in HotCodeHeap. Methods marked by the "cold" state are eventually removed from HotCodeHeap. Problems of this approach are:
* It requires specific, complex modifications to compiler directive support: recompilation of Java methods affected by compiler directive changes. This functionality is unique to the Java agent implementation and has limited potential for broader use.
* The agent cannot guarantee Java methods are moved to/removed from the HotCodeHeap because updates of compiler directives can fail.
* The agent knows nothing about compiled code, e.g. whether it's C1 or C2 compiled, code size, profile. This data can be useful for deciding whether to move a method to HotCodeHeap.
* Recompilations, especially C2, are expensive. Having many of them can cause performance issues. Also recompiled code might differ from the code we have detected as "hot".

Running these two approaches in production we learned:
* We detect 95% of actively used methods within the first 30 minutes of an application run. This is with JFR profiling configured: 90 seconds session duration, sampling every 11 ms, 8 minutes between profiling sessions. We can find actively used methods faster if we reduce the pause between profiling sessions and the sampling period. However it will increase the profiling overhead and affect application performance. With the current configuration, the profiling overhead is between 1% and 2%.
* A set of actively used methods gets into the steady state (no new methods added to, no methods removed from) within the first 60 minutes.
* Static lists, when created from runs close to production, have 80% - 90% of their methods always in use. This does not change over time.
* Predicting the size of HotCodeHeap is difficult, especially with dynamic lists.

We want to have grouping of hot methods as a part of the HotSpot JVM. We will group only C2 compiled methods. We can group JVMCI compiled methods, e.g. Graal, if needed.
We need profiling precise enough to detect major Java methods. Low overhead is more important than precision. We think we can have a solution which does not require a lot of code:
* Detect hot code: we can use an implementation based on the Sweeper: https://github.com/openjdk/jdk17u/blob/master/src/hotspot/share/runtime/sweeper.hpp. We will use the handshake mechanism, which the Sweeper used, to detect nmethods on the top of thread stacks.
* Group hot code: we have a draft PR https://github.com/openjdk/jdk/pull/23573. It implements relocation of nmethods within CodeCache.
* Maintain grouped code: we will add an additional code heap where hot nmethods will be relocated to.

What do you think about this approach? Are there other possible solutions?

Thanks,
Evgeny A.

Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.

From cslucas at openjdk.org Wed Mar 5 18:49:05 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 5 Mar 2025 18:49:05 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v7] In-Reply-To: <_LIv8Ggp3ukK0HmhknyG_Mz2x5OKs63Y-qSXTQo9Gdo=.9efc86f1-6cc4-425b-9319-5e1500eb59da@github.com> References: <_LIv8Ggp3ukK0HmhknyG_Mz2x5OKs63Y-qSXTQo9Gdo=.9efc86f1-6cc4-425b-9319-5e1500eb59da@github.com> Message-ID: On Wed, 5 Mar 2025 17:32:30 GMT, Aleksey Shipilev wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: >> >> - Fix merge conflict >> - Address PR feedback: formatting. >> - Revert changes to shared cardTable.hpp >> - Revert changes to shared cardTable.hpp >> - Fix merge conflict >> - Address PR feedback: no changes to shared files. 
>> - Merge master >> - Addressing PR comments: some refactorings, ppc fix, off-by-one fix. >> - Relocation of Card Tables > > src/hotspot/share/gc/shenandoah/shenandoahCardTable.cpp line 57: > >> 55: _byte_map_base = _byte_map - (uintptr_t(low_bound) >> _card_shift); >> 56: assert(byte_for(low_bound) == &_byte_map[0], "Checking start of map"); >> 57: assert(byte_for(high_bound-1) <= &_byte_map[last_valid_index()], "Checking end of map"); > > It is a bit sad to see these asserts go. Is this because `_byte_map` is now mutable? May I suggest doing something like: > > > _write_byte_map = (CardValue*) write_space.base(); > _write_byte_map_base = _byte_map - (uintptr_t(low_bound) >> _card_shift); > ...later... > _read_byte_map = (CardValue*) read_space.base(); > _read_byte_map_base = _byte_map - (uintptr_t(low_bound) >> _card_shift); > ...later... > > // Set up current byte map > _byte_map = _write_byte_map; > _byte_map_base = _write_byte_map_base; > > // Check one side is good > assert(byte_for(low_bound) == &_byte_map[0], "Checking start of map"); > assert(byte_for(high_bound-1) <= &_byte_map[last_valid_index()], "Checking end of map"); > swap_read_and_write_tables(); > > // Check another side is good > assert(byte_for(low_bound) == &_byte_map[0], "Checking start of map"); > assert(byte_for(high_bound-1) <= &_byte_map[last_valid_index()], "Checking end of map"); > swap_read_and_write_tables(); Yeah, I didn't like that either. If I recall correctly I had to remove them because part of the expressions ended up calling `byte_map(_base)` which would come from `ThreadLocalData` which wasn't set at the time `initialize()` was being called. Now that we don't have the virtual methods anymore I think I can put back the asserts. I'll try+test that and get back to you. 
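
The check-swap-check-swap pattern suggested in the review can be illustrated with a small standalone sketch. This is not the Shenandoah code: `TwoCardTables`, its accessors, and the 512-byte card size are hypothetical, and the sketch stores the bias as an integer index rather than a biased pointer so that the arithmetic stays well defined outside HotSpot. It assumes `low_bound` is card-aligned.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a card table backed by two maps of equal size.
class TwoCardTables {
public:
  using CardValue = uint8_t;
  static constexpr int card_shift = 9;  // assumption: 512-byte cards

  // Assumes low_bound is card-aligned and high_bound > low_bound.
  TwoCardTables(uintptr_t low_bound, uintptr_t high_bound)
      : _low(low_bound),
        _high(high_bound),
        _bias(low_bound >> card_shift),
        _num_cards(((high_bound - 1) >> card_shift) - _bias + 1),
        _write_space(_num_cards, 0),
        _read_space(_num_cards, 0),
        _write_map(_write_space.data()),
        _read_map(_read_space.data()) {}

  // Card entry in the *current* write table that covers addr.
  CardValue* byte_for(uintptr_t addr) const {
    return _write_map + ((addr >> card_shift) - _bias);
  }

  size_t last_valid_index() const { return _num_cards - 1; }
  CardValue* write_map() const { return _write_map; }
  CardValue* read_map() const { return _read_map; }

  // Exchange the roles of the two tables; no card data is copied.
  void swap_read_and_write_tables() { std::swap(_write_map, _read_map); }

  // Same two conditions as the asserts quoted in the review.
  bool bounds_ok() const {
    return byte_for(_low) == &_write_map[0] &&
           byte_for(_high - 1) <= &_write_map[last_valid_index()];
  }

  // Check one side, swap, check the other side, swap back.
  bool verify_both_sides() {
    bool first = bounds_ok();
    swap_read_and_write_tables();
    bool second = bounds_ok();
    swap_read_and_write_tables();
    return first && second;
  }

private:
  uintptr_t _low;
  uintptr_t _high;
  uintptr_t _bias;       // index of the first card, kept as an integer
  size_t _num_cards;
  std::vector<CardValue> _write_space;
  std::vector<CardValue> _read_space;
  CardValue* _write_map;
  CardValue* _read_map;
};
```

Because the second swap restores the original roles, running the verification leaves the tables exactly as they were, which is why it could plausibly sit at the end of an initialization routine.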
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1981983462 From lmesnik at openjdk.org Wed Mar 5 19:36:08 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 5 Mar 2025 19:36:08 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v3] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 12:55:46 GMT, Aleksey Shipilev wrote: >> We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up. >> >> Providing this event would also replace one of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). >> >> This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Rewrite test to RecordingStream Test changes look good. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23901#pullrequestreview-2662297970 From lmesnik at openjdk.org Wed Mar 5 19:37:54 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 5 Mar 2025 19:37:54 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v2] In-Reply-To: <22O5bysM7g9bWIQOpHWaXQSf3feld_GlkvMTPrCFlUA=.4dc2cd48-b4fc-4611-805e-a16f2ece812d@github.com> References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> <22O5bysM7g9bWIQOpHWaXQSf3feld_GlkvMTPrCFlUA=.4dc2cd48-b4fc-4611-805e-a16f2ece812d@github.com> Message-ID: On Wed, 5 Mar 2025 12:03:52 GMT, Aleksey Shipilev wrote: >> We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. 
We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. >> >> This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). >> >> Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. We still only enable the statistics event by default to be on the safe side. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8351142-jfr-deflate-event > - Test updates > - Rework statistics event to be actually statistics > - Filter JFR HiddenWait consistently > - Event metadata touchups > - Separate statistics event as well > - Fix Test changes look good. ------------- Marked as reviewed by lmesnik (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23900#pullrequestreview-2662300306 From kbarrett at openjdk.org Wed Mar 5 19:58:59 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 5 Mar 2025 19:58:59 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v8] In-Reply-To: <4VkKBmAzAASK5lVvj-ABWD-cvqU8dS26gBIMHPbJvLU=.564f068e-a620-40d0-a954-e2f35d73bf1d@github.com> References: <4VkKBmAzAASK5lVvj-ABWD-cvqU8dS26gBIMHPbJvLU=.564f068e-a620-40d0-a954-e2f35d73bf1d@github.com> Message-ID: On Wed, 5 Mar 2025 14:13:36 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > changed psoldgen check back to earlier version Changes requested by kbarrett (Reviewer). src/hotspot/share/runtime/globals.hpp line 1423: > 1421: product(size_t, MinHeapDeltaBytes, ScaleForWordSize(128*K), \ > 1422: "The minimum change in heap space due to GC (in bytes)") \ > 1423: range(0, max_uintx / 2 + 1) \ Since the option type is `size_t` the range should have s/max_uintx/SIZE_MAX/. Though there are several other similar mismatches nearby. Perhaps leave tidying this up to a followup change? src/hotspot/share/utilities/align.hpp line 80: > 78: > 79: template > 80: inline bool can_align_up(T* ptr, A alignment) { Maybe this should be later in the file, near the other pointer variants? 
Also, there's no need to parameterize the pointer type - just void* suffices. ------------- PR Review: https://git.openjdk.org/jdk/pull/23711#pullrequestreview-2662324670 PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1982089784 PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1982096331 From egahlin at openjdk.org Wed Mar 5 20:06:55 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 5 Mar 2025 20:06:55 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v2] In-Reply-To: <22O5bysM7g9bWIQOpHWaXQSf3feld_GlkvMTPrCFlUA=.4dc2cd48-b4fc-4611-805e-a16f2ece812d@github.com> References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> <22O5bysM7g9bWIQOpHWaXQSf3feld_GlkvMTPrCFlUA=.4dc2cd48-b4fc-4611-805e-a16f2ece812d@github.com> Message-ID: On Wed, 5 Mar 2025 12:03:52 GMT, Aleksey Shipilev wrote: >> We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. >> >> This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). >> >> Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. We still only enable the statistics event by default to be on a safer side. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8351142-jfr-deflate-event > - Test updates > - Rework statistics event to be actually statistics > - Filter JFR HiddenWait consistently > - Event metadata touchups > - Separate statistics event as well > - Fix Looks good. ------------- Marked as reviewed by egahlin (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23900#pullrequestreview-2662364529 From kvn at openjdk.org Wed Mar 5 20:10:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 5 Mar 2025 20:10:53 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 16:52:16 GMT, Aleksey Shipilev wrote: > This PR implements JEP 503: Remove the 32-bit x86 Port. > > The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. > > This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. > > The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. 
There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. > > Additional testing: > - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) > - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) Good. So it will be stacked PRs which you will combine for final integration? ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23906#pullrequestreview-2662377172 From jsikstro at openjdk.org Wed Mar 5 20:13:25 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Wed, 5 Mar 2025 20:13:25 GMT Subject: RFR: 8351216: ZGC: Store NUMA node count Message-ID: To avoid calling into `os::Linux::max_numa_node()` and in turn libnuma on every count lookup, I propose we instead store the count statically inside ZNUMA. This is perfectly fine since the value that we get from libnuma is configured once during initialization and never changes at runtime. The count is set during platform-dependent initialization and the getter is now defined in the common code in ZNUMA.cpp. On operating systems that ZGC does not support NUMA for (BSD and Windows) we keep the current behavior by setting the count to 1. This is also preparation work for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)). 
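
The store-once idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the ZNUMA code: `NumaInfo`, `query_platform_node_count()`, the call counter, and the pretend node count of 2 are all hypothetical stand-ins, with the query function playing the role of the expensive call into libnuma.

```cpp
#include <cstdint>

// Hypothetical counter so the sketch can show how often the platform is asked.
static uint32_t g_platform_queries = 0;

// Hypothetical stand-in for the platform lookup that goes through libnuma.
static uint32_t query_platform_node_count() {
  ++g_platform_queries;
  return 2;  // pretend the machine has two NUMA nodes
}

// Hypothetical holder that caches the node count at initialization.
class NumaInfo {
public:
  // Platform-dependent part: query once, or fall back to a single node
  // when NUMA is not supported on this operating system.
  static void initialize(bool numa_supported) {
    _enabled = numa_supported;
    _count = numa_supported ? query_platform_node_count() : 1;
  }

  static bool is_enabled() { return _enabled; }

  // Common part: a plain read of the stored value, no platform call.
  static uint32_t count() { return _count; }

private:
  static bool _enabled;
  static uint32_t _count;
};

bool NumaInfo::_enabled = false;
uint32_t NumaInfo::_count = 1;
```

After `initialize()` has run, any number of `count()` calls reads the cached value without touching the platform layer again, which is the property the description relies on.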
Testing: * Tiers 1-3 * GHA * Verify that the count is set on a Linux system with NUMA hardware ------------- Commit messages: - 8351216: Store NUMA node count Changes: https://git.openjdk.org/jdk/pull/23922/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23922&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351216 Stats: 28 lines in 7 files changed: 8 ins; 13 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23922.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23922/head:pull/23922 PR: https://git.openjdk.org/jdk/pull/23922 From kvn at openjdk.org Wed Mar 5 20:14:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 5 Mar 2025 20:14:53 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: <8UpKLmwCBMscNGtKyktL_h1aBYo6uzB3kYJOWeJIugA=.78c737ec-e212-4458-a009-79867ad260e5@github.com> On Tue, 4 Mar 2025 16:52:16 GMT, Aleksey Shipilev wrote: > This PR implements JEP 503: Remove the 32-bit x86 Port. > > The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. > > This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. > > The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. 
One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. > > Additional testing: > - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) > - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) This is confusing. This PR is part of changes so it can't be "Implement JEP 503: Remove the 32-bit x86 Port" and should be subtask of Umbrella RFE. Am I missing something? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2701962563 From duke at openjdk.org Wed Mar 5 20:46:57 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Wed, 5 Mar 2025 20:46:57 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 17:45:26 GMT, Thomas Fitzsimmons wrote: >> This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. 
>> >> I tested it with: >> >> >> java -Xlog:os+container=trace -version >> >> on: >> >> `Red Hat Enterprise Linux 8 (cgroups v1 only)`: >> _No change in behaviour_ >> >> `Fedora 41 (cgroups v2)`: >> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ >> >> --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 >> +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 >> @@ -1,7 +1,12 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> +[debug][os,container] v2 controller cpuset is enabled and relevant >> +[debug][os,container] v2 controller cpu is enabled and required >> +[debug][os,container] v2 controller io is enabled but not relevant >> +[debug][os,container] v2 controller memory is enabled and required >> +[debug][os,container] v2 controller hugetlb is enabled but not relevant >> +[debug][os,container] v2 controller pids is enabled and relevant >> +[debug][os,container] v2 controller rdma is enabled but not relevant >> +[debug][os,container] v2 controller misc is enabled but not relevant >> [debug][os,container] Detected cgroups v2 unified hierarchy >> [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope >> [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max >> >> >> `Fedora 41 (custom kernel with cgroups v1 disabled)`: >> _Fixes `cgroups v2` detection:_ >> >> --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 >> +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 >> @@ -1,7 +1,63 @@ >> 
[trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> -[debug][os,container] controller memory is not enabled >> - ] >> -[debug][os,container] One or more required controllers disabled at kernel level. >> +[... > > Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 > - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent > - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing > > Remove from cgroups v1 branch incorrect log messages about cpuset > controller being optional. Add test case for cgroups v1, cpuset > disabled. > - Improve !cgroups_v2_enabled branch comment > - Debug-log optional and disabled cgroups v2 controllers > > Do not log enabled controllers that are not relevant to the JDK. > - Move index declaration to scope in which it is used > - Remove empty string check during cgroup.controllers parsing > - Define ISSPACE_CHARS macro, use it in strsep call > - Pass fgets result to strsep > - Replace is_cgroupsV2 with cgroups_v2_enabled > > Also fix the testCgroupv1SystemdOnly and testCgroupv1NoMounts test > cases such that their /proc/cgroups and /proc/self/cgroup contents > correspond. This prevents assertion failures these tests were > producing when is_cgroupsV2 was replaced with cgroups_v2_enabled. > - ... and 3 more: https://git.openjdk.org/jdk/compare/255d679b...b6926e15 The merge is complete. The fix for https://bugs.openjdk.org/browse/JDK-8343191 did not modify any of the same files, so there were no merge conflicts. 
My test cases all produce similar results; the two new `test_cgroupSubsystem_linux.cpp` tests succeed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23811#issuecomment-2702029227 From cnorrbin at openjdk.org Wed Mar 5 21:26:08 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 5 Mar 2025 21:26:08 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v9] In-Reply-To: References: Message-ID: > Hi everyone, > > The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. > > The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. 
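
The overflow concern described above can be illustrated with a minimal standalone sketch. The functions below are not the ones in `align.hpp`; the `_sketch` names and signatures are hypothetical, they operate on `uintptr_t` only, and they assume the alignment is a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if x is a non-zero power of two.
inline bool is_power_of_2_sketch(uintptr_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// True if rounding value up to alignment cannot wrap around.
inline bool can_align_up_sketch(uintptr_t value, uintptr_t alignment) {
  assert(is_power_of_2_sketch(alignment));
  return value <= UINTPTR_MAX - (alignment - 1);
}

// Round value up to the next multiple of alignment. The assert encodes
// the assumption that the aligned result is >= the original value.
inline uintptr_t align_up_sketch(uintptr_t value, uintptr_t alignment) {
  assert(can_align_up_sketch(value, alignment));
  uintptr_t mask = alignment - 1;
  return (value + mask) & ~mask;
}

// Pointer variant that reports failure instead of overflowing.
inline void* align_up_or_null_sketch(void* ptr, uintptr_t alignment) {
  uintptr_t p = reinterpret_cast<uintptr_t>(ptr);
  if (!can_align_up_sketch(p, alignment)) {
    return nullptr;
  }
  return reinterpret_cast<void*>(align_up_sketch(p, alignment));
}
```

Callers that already carry an explicit overflow check can use the `can_align_up`-style predicate first, while callers that can tolerate failure can use the `_or_null` form and test the result.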
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: removed template paramter and moved ptr can_align_up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23711/files - new: https://git.openjdk.org/jdk/pull/23711/files/31a7d55e..0933d3c9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=07-08 Stats: 12 lines in 1 file changed: 6 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23711.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23711/head:pull/23711 PR: https://git.openjdk.org/jdk/pull/23711 From cnorrbin at openjdk.org Wed Mar 5 21:26:08 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 5 Mar 2025 21:26:08 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v8] In-Reply-To: References: <4VkKBmAzAASK5lVvj-ABWD-cvqU8dS26gBIMHPbJvLU=.564f068e-a620-40d0-a954-e2f35d73bf1d@github.com> Message-ID: On Wed, 5 Mar 2025 19:50:51 GMT, Kim Barrett wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> changed psoldgen check back to earlier version > > src/hotspot/share/utilities/align.hpp line 80: > >> 78: >> 79: template >> 80: inline bool can_align_up(T* ptr, A alignment) { > > Maybe this should be later in the file, near the other pointer variants? > Also, there's no need to parameterize the pointer type - just void* suffices. Moved it down just above the pointer-`align_up`, and removed the pointer template. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1982203597 From cnorrbin at openjdk.org Wed Mar 5 21:33:55 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 5 Mar 2025 21:33:55 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v8] In-Reply-To: References: <4VkKBmAzAASK5lVvj-ABWD-cvqU8dS26gBIMHPbJvLU=.564f068e-a620-40d0-a954-e2f35d73bf1d@github.com> Message-ID: <874KNcgUkHx7P41XCl5tsXshkQjDcNyzOBv816RErG0=.b887b828-e55b-4092-b170-b53524b31e6b@github.com> On Wed, 5 Mar 2025 19:45:52 GMT, Kim Barrett wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> changed psoldgen check back to earlier version > > src/hotspot/share/runtime/globals.hpp line 1423: > >> 1421: product(size_t, MinHeapDeltaBytes, ScaleForWordSize(128*K), \ >> 1422: "The minimum change in heap space due to GC (in bytes)") \ >> 1423: range(0, max_uintx / 2 + 1) \ > > Since the option type is `size_t` the range should have s/max_uintx/SIZE_MAX/. > Though there are several other similar mismatches nearby. Perhaps leave tidying this > up to a followup change? Think it might be better to do them all at once in a followup. Right now, all `size_t` flags in the file use `max_uintx` for the max value instead of `SIZE_MAX`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1982213032 From lmesnik at openjdk.org Wed Mar 5 21:52:14 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 5 Mar 2025 21:52:14 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 08:43:04 GMT, David Linus Briemann wrote: > 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms Changes requested by lmesnik (Reviewer). 
test/hotspot/jtreg/runtime/interpreter/CountBytecodesTest.java line 32: > 30: * does not overflow for more than 2^32 bytecodes counted. > 31: * @library /test/lib > 32: * @run main/othervm/timeout=300 CountBytecodesTest The long tests should be excluded from tier1. Please update TEST.groups. ------------- PR Review: https://git.openjdk.org/jdk/pull/23766#pullrequestreview-2662309083 PR Review Comment: https://git.openjdk.org/jdk/pull/23766#discussion_r1982081409 From duke at openjdk.org Wed Mar 5 21:52:14 2025 From: duke at openjdk.org (David Linus Briemann) Date: Wed, 5 Mar 2025 21:52:14 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms Message-ID: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms ------------- Commit messages: - remove auto included header - fix x86 asm - address review comment, add back comma to copyright header - formatting - remove bad header - add missing comma to copyright header - speed up runtime by running less bytecodes, add explanation - add copyright header and @bug number - finish CountBytecodesTest - fix some print directives, add test 1st version - ... 
and 3 more: https://git.openjdk.org/jdk/compare/e1d0a9c8...45699ec5 Changes: https://git.openjdk.org/jdk/pull/23766/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23766&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350642 Stats: 108 lines in 11 files changed: 86 ins; 0 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/23766.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23766/head:pull/23766 PR: https://git.openjdk.org/jdk/pull/23766 From mdoerr at openjdk.org Wed Mar 5 21:52:14 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 5 Mar 2025 21:52:14 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms In-Reply-To: References: Message-ID: <-K3_uSLZJyXC9ofe3YnqgrreDy7f7nmmXvDUT911_F8=.c5e38fc8-49da-476d-821c-7b2c50c0065d@github.com> On Tue, 25 Feb 2025 08:43:04 GMT, David Linus Briemann wrote: > 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms @RealFYang, @offamitkumar: You may want to test and review this PR on your platforms. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23766#issuecomment-2682082731 From duke at openjdk.org Wed Mar 5 21:52:31 2025 From: duke at openjdk.org (David Linus Briemann) Date: Wed, 5 Mar 2025 21:52:31 GMT Subject: RFR: 8350266: [PPC64] Interpreter: intrinsify Thread.currentThread() Message-ID: Implementation of intrinsic Thread.currentThread() for PPC64. 
------------- Commit messages: - implement generate_currentThread for PPC Changes: https://git.openjdk.org/jdk/pull/23677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23677&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350266 Stats: 14 lines in 1 file changed: 13 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23677/head:pull/23677 PR: https://git.openjdk.org/jdk/pull/23677 From mdoerr at openjdk.org Wed Mar 5 21:52:31 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 5 Mar 2025 21:52:31 GMT Subject: RFR: 8350266: [PPC64] Interpreter: intrinsify Thread.currentThread() In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 13:35:51 GMT, David Linus Briemann wrote: > Implementation of intrinsic Thread.currentThread() for PPC64. LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23677#pullrequestreview-2633454524 From amitkumar at openjdk.org Wed Mar 5 21:52:14 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 5 Mar 2025 21:52:14 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 08:43:04 GMT, David Linus Briemann wrote: > 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms I don't see any failure on s390x as well. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23766#issuecomment-2683771889 From myankelevich at openjdk.org Wed Mar 5 21:52:15 2025 From: myankelevich at openjdk.org (Mikhail Yankelevich) Date: Wed, 5 Mar 2025 21:52:15 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 08:43:04 GMT, David Linus Briemann wrote: > 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms test/hotspot/jtreg/runtime/interpreter/CountBytecodesTest.java line 65: > 63: } else { > 64: ProcessBuilder pb = ProcessTools.createTestJavaProcessBuilder("-Xint", "-XX:+CountBytecodes", "CountBytecodesTest", "test"); > 65: OutputAnalyzer output = new OutputAnalyzer(pb.start()); Do you think it would be easier to use `ProcessTools.executeTestJava` here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23766#discussion_r1969837160 From fyang at openjdk.org Wed Mar 5 21:52:14 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 5 Mar 2025 21:52:14 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms In-Reply-To: <-K3_uSLZJyXC9ofe3YnqgrreDy7f7nmmXvDUT911_F8=.c5e38fc8-49da-476d-821c-7b2c50c0065d@github.com> References: <-K3_uSLZJyXC9ofe3YnqgrreDy7f7nmmXvDUT911_F8=.c5e38fc8-49da-476d-821c-7b2c50c0065d@github.com> Message-ID: On Tue, 25 Feb 2025 14:03:22 GMT, Martin Doerr wrote: > @RealFYang, @offamitkumar: You may want to test and review this PR on your platforms. Hi, Thanks for the ping! RISC-V part of the change looks fine. And `runtime/interpreter/CountBytecodesTest.java` test good with fastdebug build on my platform. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23766#issuecomment-2683762593 From rrich at openjdk.org Wed Mar 5 21:52:31 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 5 Mar 2025 21:52:31 GMT Subject: RFR: 8350266: [PPC64] Interpreter: intrinsify Thread.currentThread() In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 13:35:51 GMT, David Linus Briemann wrote: > Implementation of intrinsic Thread.currentThread() for PPC64. Looks good. Cheers, Richard. ------------- Marked as reviewed by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23677#pullrequestreview-2640147563 From duke at openjdk.org Wed Mar 5 21:52:15 2025 From: duke at openjdk.org (David Linus Briemann) Date: Wed, 5 Mar 2025 21:52:15 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 13:57:53 GMT, Mikhail Yankelevich wrote: >> 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms > > test/hotspot/jtreg/runtime/interpreter/CountBytecodesTest.java line 65: > >> 63: } else { >> 64: ProcessBuilder pb = ProcessTools.createTestJavaProcessBuilder("-Xint", "-XX:+CountBytecodes", "CountBytecodesTest", "test"); >> 65: OutputAnalyzer output = new OutputAnalyzer(pb.start()); > > Do you think it would be easier to use `ProcessTools.executeTestJava` here? That looks better. I fixed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23766#discussion_r1969867666 From dean.long at oracle.com Wed Mar 5 22:23:41 2025 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 5 Mar 2025 14:23:41 -0800 Subject: RFD: Grouping hot code in CodeCache In-Reply-To: <1B0C3138-761B-4DB0-8A98-977C6FC40178@amazon.co.uk> References: <1B0C3138-761B-4DB0-8A98-977C6FC40178@amazon.co.uk> Message-ID: Just to clarify, if grouping helps, does that mean the reason for the performance impact of sparse code is mainly due to far calls vs near calls? 
dl

On 3/5/25 10:41 AM, Astigeevich, Evgeny wrote:
>
> Hi Vladimir,
>
> This is JDK-8326205: Implement grouping hot nmethods in CodeCache.
>
> As I managed to synthesize a benchmark (https://github.com/openjdk/jdk/pull/23831) to demonstrate the performance impact of sparse code, I'd like to discuss a possible solution to the sparse code problem.
>
> At a high level, a solution is:
>
> * Detect hot code.
> * Group hot code.
> * Maintain grouped code.
>
> Downstream we tried two approaches:
>
> * *Static lists of methods (compile command):* Identify frequently used (hot) methods using test runs and provide static method lists to the JVM in production. When the JVM compiles a Java method and the method is on the list, the JVM puts the code into a designated code heap (HotCodeHeap).
> * *Dynamic lists of methods (compiler directives):* Profile an application in production and dynamically relocate identified hot methods to HotCodeHeap. Relocation was implemented with recompilation.
>
> The main advantage of static lists is zero profiling overhead in production. We do all profiling and analysis in test runs. Its problems are:
>
> * *Training Run Accuracy:* We need training runs to have execution paths closely mimicking production environments. Otherwise we put the wrong methods into HotCodeHeap.
> * *Method List Maintenance:* We need to rerun training to regenerate lists when application code changes. Training runs are expensive and time-consuming. They require long runs to guarantee we see all major execution paths. Updating lists in production can be as complex as application deployment.
> * *Method Placement Limitations:* Methods marked for HotCodeHeap are permanently placed into HotCodeHeap. There is no mechanism to remove methods that become less frequently used.
>
> We addressed these problems with dynamic lists of methods. We implemented a Java agent that runs within the same JVM to dynamically detect and manage hot Java methods without prior method identification. The agent detects hot methods using JFR. The agent manages hot Java methods in HotCodeHeap with compiler directives. A new compiler directive marks methods with dynamic states ("hot" or "cold"). Methods marked with the "hot" state are recompiled and placed in HotCodeHeap. Methods marked with the "cold" state are eventually removed from HotCodeHeap.
>
> Problems of this approach are:
>
> * It requires specific, complex modifications to compiler directive support: recompilation of Java methods affected by compiler directive changes. This functionality is unique to the Java agent implementation and has limited potential for broader use.
> * The agent cannot guarantee Java methods are moved to/removed from the HotCodeHeap because updates of compiler directives can fail.
> * The agent knows nothing about compiled code, e.g. whether it's C1 or C2 compiled, code size, profile. This data can be useful for deciding whether or not to move code to HotCodeHeap.
> * Recompilations, especially C2, are expensive. Having many of them can cause performance issues. Also, recompiled code might differ from the code we have detected as "hot".
>
> Running these two approaches in production we learned:
>
> * We detect 95% of actively used methods within the first 30 minutes of an application run. This is with JFR profiling configured: 90 seconds session duration, sampling every 11 ms, 8 minutes between profiling sessions. We can find actively used methods faster if we reduce the pause between profiling sessions and the sampling period. However, it will increase the profiling overhead and affect application performance. With the current configuration, the profiling overhead is between 1% and 2%.
> * The set of actively used methods gets into the steady state (no new methods added, no methods removed) within the first 60 minutes.
> * Static lists, when created from runs close to production, have 80% - 90% of methods always in use. This does not change over time.
> * Predicting the size of HotCodeHeap is difficult, especially with dynamic lists.
>
> We want to have grouping of hot methods as a part of the HotSpot JVM. We will group only C2 compiled methods. We can group JVMCI compiled methods, e.g. Graal, if needed. We need profiling precise enough to detect major Java methods. Low overhead is more important than precision.
>
> We think we can have a solution which does not require a lot of code:
>
> * Detect hot code: we can use an implementation based on the Sweeper: https://github.com/openjdk/jdk17u/blob/master/src/hotspot/share/runtime/sweeper.hpp. We will use the handshakes mechanism, which the Sweeper used, to detect nmethods on the top of thread stacks.
> * Group hot code: we have a draft PR https://github.com/openjdk/jdk/pull/23573. It implements relocation of nmethods within CodeCache.
> * Maintain grouped code: we will add an additional code heap where hot nmethods will be relocated to.
>
> What do you think about this approach? Are there other possible solutions?
>
> Thanks,
>
> Evgeny A.

From vpaprotski at openjdk.org Wed Mar 5 23:03:23 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 5 Mar 2025 23:03:23 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4] In-Reply-To: References: Message-ID:
> Add AVX2 montgomery multiplication intrinsic.
(About 60-80% gain)
>
> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain.
>
> Before (no AVX512)
>
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score               Units
> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  40   3720.589 ± 17.879   ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  40   3605.940 ± 15.807   ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  40   1076.502 ± 4.190    ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  40   1069.624 ± 2.484    ops/s
>
> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score            Units
> KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  40   830.448 ± 2.285  ops/s
>
> After (with AVX2)
>
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score               Units
> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  40   6000.496 ± 39.923   ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  40   5739.878 ± 34.838   ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  40   1942.437 ± 12.179   ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  40   1921.770 ± 8.992    ops/s
>
> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score             Units
> KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  40   1399.761 ± 6.238  ops/s
>
> Before (with AVX512):
>
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score               Units
> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  40   9621.950 ± 27.260   ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  40   8975.654 ± 26.707   ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 102...
Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: more comment improvements ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23719/files - new: https://git.openjdk.org/jdk/pull/23719/files/bb450137..9d13cefa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=02-03 Stats: 72 lines in 2 files changed: 45 ins; 5 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/23719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23719/head:pull/23719 PR: https://git.openjdk.org/jdk/pull/23719
From eastig at amazon.co.uk Wed Mar 5 23:12:38 2025 From: eastig at amazon.co.uk (Astigeevich, Evgeny) Date: Wed, 5 Mar 2025 23:12:38 +0000 Subject: RFD: Grouping hot code in CodeCache In-Reply-To: References: <1B0C3138-761B-4DB0-8A98-977C6FC40178@amazon.co.uk> Message-ID: <2AEC4850-DFA7-48D9-9925-D45892E3F731@amazon.co.uk> Hi Dean, I currently don't know what hardware issues sparse code causes on Intel. I need to check which hardware counters get worse. It might be far vs near. Graviton has a counter which measures issues with code placement and can be used to measure code sparsity. I think this counter is not connected to the issue of far calls vs near calls. I can run experiments when we use far calls and near calls for dense code. Thanks, Evgeny
From vlivanov at openjdk.org Wed Mar 5 23:22:51 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 5 Mar 2025 23:22:51 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: <5nkWE-TpdoNk-k_5JE7MopX5_KJf6DjjLWMADxWr29k=.ee34fa19-882c-4731-86f6-bdaed2a6e276@github.com> On Tue, 4 Mar 2025 16:52:16 GMT, Aleksey Shipilev wrote: > This PR implements JEP 503: Remove the 32-bit x86 Port. > > The JEP is proposed to target 25, we would not integrate until JEP is ready.
Reviews are appreciated meanwhile. > > This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. > > The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. > > Additional testing: > - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) > - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) There's a wide variety of options to justify the goal of the JEP. A bare minimum would be to just remove x86-32 build support. And on the other side of the spectrum the current patch would be accompanied by all x86-32-specific code and all the features used exclusively by x86-32 port. During previous round of discussions I expressed my preference as keeping JEP implementation simple and perform all non-trivial cleanups as follow-up RFEs. IMO it enables swift removal (and eliminates the burden to keep x86-32 port alive during ongoing development work) while keeping incremental cleanup activities at comfortable pace. 
Proposed patch perfectly justifies my preference. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2702299307 From kvn at openjdk.org Wed Mar 5 23:35:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 5 Mar 2025 23:35:54 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: <5ztalawYQsCNUsfzWyR_b5YVFWbDNzoHVUA4ycRjvRs=.42fd2b02-462f-4803-9d3b-2b907121c5be@github.com> On Tue, 4 Mar 2025 16:52:16 GMT, Aleksey Shipilev wrote: > This PR implements JEP 503: Remove the 32-bit x86 Port. > > The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. > > This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. > > The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. 
> > Additional testing: > - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) > - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) To clarify. I completely agree with the changes in this PR - I approved it. My concern is the **Title** of this PR and JBS entry. So I want to understand the steps we do with this PR and the following changes covered by the number of subtasks pointed out by Aleksey. So what, @iwanowww, you say is that this PR is **indeed** implementation of the JEP. And all subtasks listed in Umbrella RFE are following up RFEs after we integrated the JEP. Do I understand that correctly? Why not do what Ioi did for AOT class loading JEP? I mean, to have depending PRs which are combined into one implementation push. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2702316448 From vlivanov at openjdk.org Thu Mar 6 00:18:52 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 6 Mar 2025 00:18:52 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: <5nkWE-TpdoNk-k_5JE7MopX5_KJf6DjjLWMADxWr29k=.ee34fa19-882c-4731-86f6-bdaed2a6e276@github.com> References: <5nkWE-TpdoNk-k_5JE7MopX5_KJf6DjjLWMADxWr29k=.ee34fa19-882c-4731-86f6-bdaed2a6e276@github.com> Message-ID: On Wed, 5 Mar 2025 23:19:50 GMT, Vladimir Ivanov wrote: >> This PR implements JEP 503: Remove the 32-bit x86 Port. >> >> The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. >> >> This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64.
The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. >> >> The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. >> >> Additional testing: >> - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) >> - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) > > There's a wide variety of options to justify the goal of the JEP. A bare minimum would be to just remove x86-32 build support. And on the other side of the spectrum the current patch would be accompanied by all x86-32-specific code and all the features used exclusively by x86-32 port. > > During previous round of discussions I expressed my preference as keeping JEP implementation simple and perform all non-trivial cleanups as follow-up RFEs. IMO it enables swift removal (and eliminates the burden to keep x86-32 port alive during ongoing development work) while keeping incremental cleanup activities at comfortable pace. > > Proposed patch perfectly justifies my preference. > So what, @iwanowww, you say is that this PR is indeed implementation of the JEP. > And all subtasks listed in Umbrella RFE are following up RFEs after we integrated the JEP. 
> Do I understand that correctly? Yes. > Why not do what Ioi did for AOT class loading JEP? I mean, to have depending PRs which are combined into one implementation push. It's definitely an option. But, most likely, there'll be some overlooked cases anyway (leading to additional followup RFEs). And the more convoluted the changes are the harder it is to validate their correctness, thus increasing the risks for product stability and delaying the integration. (I'm not sure how much time Aleksey and other contributors want to volunteer to this project.) Also, in case of AOT JEP the situation was quite the opposite: it started with a huge patch which was split into multiple mostly independent parts to streamline its review. For x86-32 code removal there's no such patch prepared yet and the complete scope of work is not clear yet. IMO the crucial part is to get the port officially retired. After that the rest can become a good source of starter tasks :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2702376289 From kvn at openjdk.org Thu Mar 6 00:21:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 6 Mar 2025 00:21:53 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 16:52:16 GMT, Aleksey Shipilev wrote: > This PR implements JEP 503: Remove the 32-bit x86 Port. > > The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. > > This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. 
> > The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. > > Additional testing: > - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) > - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) Okay. Thank you for explaining. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2702380269 From sviswanathan at openjdk.org Thu Mar 6 01:02:17 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 6 Mar 2025 01:02:17 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 23:03:23 GMT, Volodymyr Paprotski wrote: >> Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain) >> >> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain.
>>
>> Before (no AVX512)
>>
>> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error     Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  40   3720.589  ± 17.879  ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  40   3605.940  ± 15.807  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  40   1076.502  ± 4.190   ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  40   1069.624  ± 2.484   ops/s
>> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error     Units
>> KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  40   830.448   ± 2.285   ops/s
>>
>> After (with AVX2)
>>
>> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error     Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  40   6000.496  ± 39.923  ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  40   5739.878  ± 34.838  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  40   1942.437  ± 12.179  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  40   1921.770  ± 8.992   ops/s
>> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error     Units
>> KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  40   1399.761  ± 6.238   ops/s
>>
>>
>> Before (with AVX512):
>>
>> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error     Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  40   9621.950  ± 27.260  ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  40   8975.654  ± 26.707  o...
> > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > more comment improvements Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23719#pullrequestreview-2662829592 From dholmes at openjdk.org Thu Mar 6 01:57:52 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Mar 2025 01:57:52 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v3] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 12:26:40 GMT, Aleksey Shipilev wrote: > On the other hand, the way the event is currently implemented, it only fires when wait-set is not empty, So it it only wants to record notifications that actually did something, then shouldn't we be checking tally before posting the event? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23901#issuecomment-2702525194 From iklam at openjdk.org Thu Mar 6 04:14:32 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 6 Mar 2025 04:14:32 GMT Subject: RFR: 8351319: AOT cache support for custom class loaders broken since JDK-8348426 Message-ID: Since [JDK-8348426](https://bugs.openjdk.org/browse/JDK-8348426) (Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file), the AOT cache no longer contains classes intended for custom class loaders (these are called "unregistered classes" in CDS terminology). The fix is simple -- we already remember the set of unregistered classes in the AOT configuration file. We just need to add them into the final AOT cache (see changes in finalImageRecipes.cpp). 
------------- Commit messages: - 8351319: AOT cache support for custom class loaders broken since JDK-8348426 Changes: https://git.openjdk.org/jdk/pull/23926/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23926&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351319 Stats: 122 lines in 10 files changed: 98 ins; 1 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/23926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23926/head:pull/23926 PR: https://git.openjdk.org/jdk/pull/23926 From dholmes at openjdk.org Thu Mar 6 04:26:57 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Mar 2025 04:26:57 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v3] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 12:31:23 GMT, Fredrik Bredberg wrote: >> I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list. The `entry_list`. >> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about if it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. >> >> In the current multi-queue design new threads where always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` whould detached the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` whould be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. >> >> The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. >> >> You always add to the `entry_list` by Compare And Exchange to the head. 
The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list`, will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked trough its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. >> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance wise the new design seems to be equal to the old design, even though c2 generates two less instructions per monitor unlock operation. >> >> However the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Updated comments after review by Patricio. LGTM! Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23421#pullrequestreview-2663279336 From dholmes at openjdk.org Thu Mar 6 04:38:52 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Mar 2025 04:38:52 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 16:52:16 GMT, Aleksey Shipilev wrote: > This PR implements JEP 503: Remove the 32-bit x86 Port. 
> > The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. > > This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. > > The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. > > Additional testing: > - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) > - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) I am also a bit puzzled by the JEP/JBS strategy here. I would expect a bunch of dependent PRs that then get integrated together as "The Implementation of JEP 503". I understand things may be missed that require some follow up RFE's but I don't think we should start from that position and have a large chunk of work not be done under the JEP umbrella. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2702781694 From fyang at openjdk.org Thu Mar 6 05:52:54 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 6 Mar 2025 05:52:54 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v2] In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Wed, 5 Mar 2025 09:57:45 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv. >> >> ## Performance >> >> data >> >> Benchmark | (vectorDim) | Mode | Cnt | Score -master | Error | Score - patch | Error | Units | Improvement (master/patch) >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1 >> Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011 >> Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001 >> Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001 >> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909 >> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91 >> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129 >> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 
0.998 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928 >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51 >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898 >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 91565.81... > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - clean > - merge master > - merge master > - clean 2 > - clean > - initial commit src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6624: > 6622: > 6623: if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_float16ToFloat) && > 6624: vmIntrinsics::is_intrinsic_available(vmIntrinsics::_floatToFloat16)) { Since this stub uses instructions from the Zfh extension which is not always available, do we need a similar checking like x86 [1] (https://bugs.openjdk.org/browse/JDK-8303415)? I see `vmIntrinsics::is_intrinsic_available` delegates work to `VM_Version::is_intrinsic_supported` [2]. 
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L3333 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/classfile/vmIntrinsics.cpp#L671 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1982726601 From dholmes at openjdk.org Thu Mar 6 06:29:57 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Mar 2025 06:29:57 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v2] In-Reply-To: <22O5bysM7g9bWIQOpHWaXQSf3feld_GlkvMTPrCFlUA=.4dc2cd48-b4fc-4611-805e-a16f2ece812d@github.com> References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> <22O5bysM7g9bWIQOpHWaXQSf3feld_GlkvMTPrCFlUA=.4dc2cd48-b4fc-4611-805e-a16f2ece812d@github.com> Message-ID: On Wed, 5 Mar 2025 12:03:52 GMT, Aleksey Shipilev wrote: >> We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. >> >> This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). >> >> Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. We still only enable the statistics event by default to be on a safer side. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8351142-jfr-deflate-event > - Test updates > - Rework statistics event to be actually statistics > - Filter JFR HiddenWait consistently > - Event metadata touchups > - Separate statistics event as well > - Fix I'm not completely clear on how the events operate, but the general runtime changes look okay to me. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23900#pullrequestreview-2663450619 From stefank at openjdk.org Thu Mar 6 07:46:03 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 6 Mar 2025 07:46:03 GMT Subject: RFR: 8323158: HotSpot Style Guide should specify more include ordering [v2] In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 12:14:35 GMT, Stefan Karlsson wrote: >> The HotSpot Style Guide has a section about source files and includes. The style used for includes have mostly been introduced by scripts when includeDB was replaced, but also when various other enhancements to our includes were made. Some of the introduced styles were never written down in the style guide. >> >> I propose a couple of changes to the HotSpot Style Guide to reflect some of these implicit styles that we have. While updating the text I also took the liberty to order the items in an order that I felt was good. >> >> Note that JDK-8323158 contains a few more suggestions, but I've only addressed the items that I think can be accepted without much contention. Either I extract the items that have not been address into a new RFE, or I create a new RFE for this PR. 
>> >> There are some small whitespace tweaks that I made so that the .md and .html had a similar layout. > > Stefan Karlsson has updated the pull request incrementally with two additional commits since the last revision: > > - Update hotspot-style.md > - Update hotspot-style.html This RFR has now been out for a while, with positive feedback from HotSpot Members, and no negative feedback. I think that we have reached consensus about this update. Thanks for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23388#issuecomment-2703052781 From stefank at openjdk.org Thu Mar 6 07:46:04 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 6 Mar 2025 07:46:04 GMT Subject: Integrated: 8323158: HotSpot Style Guide should specify more include ordering In-Reply-To: References: Message-ID: <1rd22rjNeT6F6Zpf0xeGelDmfdrhYcsr8wL66L64b7A=.e5475a49-fa98-4549-826b-db998b66e8d3@github.com> On Fri, 31 Jan 2025 13:56:58 GMT, Stefan Karlsson wrote: > The HotSpot Style Guide has a section about source files and includes. The styles used for includes have mostly been introduced by scripts when includeDB was replaced, but also when various other enhancements to our includes were made. Some of the introduced styles were never written down in the style guide. > > I propose a couple of changes to the HotSpot Style Guide to reflect some of these implicit styles that we have. While updating the text I also took the liberty to order the items in an order that I felt was good. > > Note that JDK-8323158 contains a few more suggestions, but I've only addressed the items that I think can be accepted without much contention. Either I extract the items that have not been addressed into a new RFE, or I create a new RFE for this PR. > > There are some small whitespace tweaks that I made so that the .md and .html had a similar layout. This pull request has now been integrated.
Changeset: 649ef779 Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/649ef77951d420512e385ee3c792ced80276a30a Stats: 46 lines in 2 files changed: 28 ins; 7 del; 11 mod 8323158: HotSpot Style Guide should specify more include ordering Reviewed-by: kbarrett, stuefe, dholmes, kvn ------------- PR: https://git.openjdk.org/jdk/pull/23388 From rcastanedalo at openjdk.org Thu Mar 6 09:05:55 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 6 Mar 2025 09:05:55 GMT Subject: RFR: 8346194: Improve G1 pre-barrier C2 cost estimate In-Reply-To: References: Message-ID: <3Z73q1oaTx8jgXCbLsNkW65xW5yWNZFPhxEWRSF_FuA=.133a49d0-d74e-4ef1-9188-cdaad0a971ac@github.com> On Mon, 3 Mar 2025 12:30:23 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that modifies pre-barrier node costs for loop-unrolling to only consider the fast path. The reasoning is similar to ZGC (and the new costs as well): only the part of the barrier inlined into the main code stream is counted, as the slow path is laid out separately and does/should not directly affect performance (particularly if there is no marking going on). > > There are no differences/impact in performance since the post barrier cost is still very large, which will be fixed elsewhere. > > Testing: gha, perf testing standalone (neither micros nor actual benchmarks give any difference outside of variance), testing with JDK-8342382 > > Hth, > Thomas src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 298: > 296: // directly affect performance. > 297: // It has a cost of 4 (Cmp, Bool, If, IfProj). > 298: nodes += 4; It probably does not make a big overall difference to the loop unrolling heuristics, but for better accuracy you might want to count in the nodes for loading the "active" byte (one node for computing the address relative to the thread-local storage base and one node for the load itself), i.e.
6 nodes in total rather than 4 (the `ThreadLocal` node representing the thread-local storage base is shared by other barrier operations so I would not count it as part of the pre-barrier fast path): ![pre-barrier-fast-path](https://github.com/user-attachments/assets/7718ec94-1612-44ac-b18c-78d962057ab6) Suggestion: // It has a cost of 6 (AddP, LoadB, Cmp, Bool, If, IfProj). nodes += 6; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23862#discussion_r1982961196 From fbredberg at openjdk.org Thu Mar 6 09:11:02 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 6 Mar 2025 09:11:02 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v3] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 12:31:23 GMT, Fredrik Bredberg wrote: >> I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list. The `entry_list`. >> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about if it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. >> >> In the current multi-queue design new threads where always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` whould detached the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` whould be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. >> >> The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. >> >> You always add to the `entry_list` by Compare And Exchange to the head. 
The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list`, will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked trough its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. >> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance wise the new design seems to be equal to the old design, even though c2 generates two less instructions per monitor unlock operation. >> >> However the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Updated comments after review by Patricio. Thanks everyone for the reviews, testing and Graal adaptation. 
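
For readers following the thread, the list design quoted above (push at the head with compare-and-exchange, choose the successor from the tail, fill in `prev` pointers lazily while walking, cache the tail) can be sketched roughly as below. This is only a minimal standalone C++ model, not the HotSpot code: `Waiter`, `EntryList`, `push`, `pop_tail` and `link_prev` are hypothetical names, `std::atomic` stands in for HotSpot's own atomics, and the sketch assumes that only one thread at a time (the monitor owner) calls `pop_tail`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a contending thread's list node.
struct Waiter {
    int id;
    Waiter* next = nullptr;  // set by the pushing thread, stable afterwards
    Waiter* prev = nullptr;  // filled in lazily by the thread that walks the list
    explicit Waiter(int i) : id(i) {}
};

// Simplified model of a single entry list; assumes one popper at a time.
class EntryList {
    std::atomic<Waiter*> head_{nullptr};  // the only field updated concurrently
    Waiter* tail_ = nullptr;              // cached tail, touched by the owner only

    // Walk from 'from' to the end, assigning prev pointers; returns the tail.
    static Waiter* link_prev(Waiter* from) {
        Waiter* cur = from;
        while (cur->next != nullptr) {
            cur->next->prev = cur;
            cur = cur->next;
        }
        return cur;
    }

public:
    // Any thread: push onto the head with a compare-and-exchange loop.
    void push(Waiter* w) {
        w->prev = nullptr;
        Waiter* old_head = head_.load(std::memory_order_relaxed);
        do {
            w->next = old_head;
        } while (!head_.compare_exchange_weak(old_head, w,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    // Owner only: remove and return the oldest waiter (FIFO), or nullptr.
    Waiter* pop_tail() {
        Waiter* head = head_.load(std::memory_order_acquire);
        if (head == nullptr) return nullptr;
        if (tail_ == nullptr) tail_ = link_prev(head);

        Waiter* victim = tail_;
        assert(victim->next == nullptr);
        if (victim->prev == nullptr) {
            // Either victim is the only node, or newer nodes were pushed
            // in front of it since the last walk.
            Waiter* expected = victim;
            if (head_.compare_exchange_strong(expected, nullptr,
                                              std::memory_order_acq_rel,
                                              std::memory_order_acquire)) {
                tail_ = nullptr;  // list is now empty
                return victim;
            }
            // 'expected' now holds the current head; link the newer nodes.
            link_prev(expected);
        }
        tail_ = victim->prev;   // the next-oldest waiter becomes the tail
        tail_->next = nullptr;
        victim->prev = nullptr;
        return victim;
    }
};
```

In this model, pushing waiters 1, 2, 3 and then popping yields 1 first; a waiter pushed afterwards is still handed out last, which mirrors the strict FIFO successor choice described in the quoted text.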
------------- PR Comment: https://git.openjdk.org/jdk/pull/23421#issuecomment-2703235851 From fbredberg at openjdk.org Thu Mar 6 09:11:02 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 6 Mar 2025 09:11:02 GMT Subject: Integrated: 8343840: Rewrite the ObjectMonitor lists In-Reply-To: References: Message-ID: <8EoKGr_0E4MpBGmwoWS8At5wyy2Q44zgxI8eWi4A-AA=.46f9908d-a525-4ca5-9472-34cfd37de0d3@github.com> On Mon, 3 Feb 2025 16:29:25 GMT, Fredrik Bredberg wrote: > I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list. The `entry_list`. > > This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about if it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. > > In the current multi-queue design new threads where always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` whould detached the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` whould be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. > > The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. > > You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. > > The first contending thread that "pushes" itself onto `entry_list`, will be the last thread in the list. 
Each newly pushed thread in `entry_list` will be linked trough its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. > > Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). > > Performance wise the new design seems to be equal to the old design, even though c2 generates two less instructions per monitor unlock operation. > > However the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fact that c2 no longer has to check b... This pull request has now been integrated. 
Changeset: 7a5acb9b Author: Fredrik Bredberg URL: https://git.openjdk.org/jdk/commit/7a5acb9be17cd54bbd0abf2524386b981dd5ac04 Stats: 614 lines in 10 files changed: 214 ins; 228 del; 172 mod 8343840: Rewrite the ObjectMonitor lists Reviewed-by: dholmes, coleenp, pchilanomate, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/23421 From jbhateja at openjdk.org Thu Mar 6 09:34:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 6 Mar 2025 09:34:53 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v3] In-Reply-To: References: <0E2AqFpNPjDjP6jqCXn8toePBcW2SIHw1kFXlZX4W_U=.8d692bfa-0598-4969-b480-4a285366e0bb@github.com> Message-ID: On Wed, 5 Mar 2025 18:27:44 GMT, Ferenc Rakoczi wrote: >> Hi @ferakocz , >> >> Thanks!, for efficient utilization of Decode ICache (please refer to Intel SDM section 3.4.2.5), code blocks should be aligned to 32-byte boundaries; a 64-byte aligned code is a superset of both 16 and 32 byte aligned addresses and also matches with the cacheline size. However, I can noticed that we have been using OptoLoopAlignment at places in AES-GCM also. >> >> I introduced some errors in generate_dilithiumAlmostInverseNtt_avx512 implementation in anticipation of catching it through existing ML_DSA_Tests under >> test/jdk/sun/security/provider/acvp >> >> But all the tests passed for me. >> `java -jar /home/jatinbha/sandboxes/jtreg/build/images/jtreg/lib/jtreg.jar -jdk:$JAVA_HOME -Djdk.test.lib.artifacts.ACVP-Server=/home/jatinbha/softwares/v1.1.0.38.zip -va -timeout:4 Launcher.java` >> >> Can you please point out a test I need to use for validation > > I think the easiest is to put a for (int i = 0; i < 1000; i++) loop around the switch statement in the run() method of the ML_DSA_Test class (test/jdk/sun/security/provider/acvp/ML_DSA_Test.java). (This is because the intrinsics kick in after a few thousand calls of the method.) 
Hi @ferakocz, Yes, we should modify the test or lower the compilation threshold with -Xbatch -XX:TieredCompileThreshold=0.1. Alternatively, since the tests have a dependency on the Automatic Cryptographic Validation Test server, I have created a simplified test which covers all the security levels. Kindly include [test/hotspot/jtreg/compiler/intrinsics/signature/TestModuleLatticeDSA.java](https://github.com/ferakocz/jdk/pull/1) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983009390 From jbhateja at openjdk.org Thu Mar 6 09:34:55 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 6 Mar 2025 09:34:55 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v4] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 13:10:34 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Added alignment to loop entries. src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 85: > 83: if (UseSHA3Intrinsics) { > 84: StubRoutines::_sha3_implCompress = generate_sha3_implCompress(StubGenStubId::sha3_implCompress_id); > 85: StubRoutines::_double_keccak = generate_double_keccak(); Should UseDilithiumIntrinsics guard double_keccak generation? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1982922845 From duke at openjdk.org Thu Mar 6 09:49:12 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 6 Mar 2025 09:49:12 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v4] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 08:37:57 GMT, Jatin Bhateja wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Added alignment to loop entries.
> > src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 85: > >> 83: if (UseSHA3Intrinsics) { >> 84: StubRoutines::_sha3_implCompress = generate_sha3_implCompress(StubGenStubId::sha3_implCompress_id); >> 85: StubRoutines::_double_keccak = generate_double_keccak(); > > Should UseDilithiumIntrinsics guard double_keccak generation ? No, that is more of a SHA3 thing, other algorithms can take advantage of it, too (e.g. ML-KEM). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983033331 From shade at openjdk.org Thu Mar 6 09:52:09 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 6 Mar 2025 09:52:09 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: <5nkWE-TpdoNk-k_5JE7MopX5_KJf6DjjLWMADxWr29k=.ee34fa19-882c-4731-86f6-bdaed2a6e276@github.com> Message-ID: On Thu, 6 Mar 2025 00:16:12 GMT, Vladimir Ivanov wrote: >> There's a wide variety of options to justify the goal of the JEP. A bare minimum would be to just remove x86-32 build support. And on the other side of the spectrum the current patch would be accompanied by all x86-32-specific code and all the features used exclusively by x86-32 port. >> >> During previous round of discussions I expressed my preference as keeping JEP implementation simple and perform all non-trivial cleanups as follow-up RFEs. IMO it enables swift removal (and eliminates the burden to keep x86-32 port alive during ongoing development work) while keeping incremental cleanup activities at comfortable pace. >> >> Proposed patch perfectly justifies my preference. > >> So what, @iwanowww, you say is that this PR is indeed implementation of the JEP. >> And all subtasks listed in Umbrella RFE are following up RFEs after we integrated the JEP. >> Do I understand that correctly? > > Yes. > >> Why not do what Ioi did for AOT class loading JEP? I mean, to have depending PRs which are combined into one implementation push. > > It's definitely an option. 
But, most likely, there'll be some overlooked cases anyway (leading to additional followup RFEs). And the more convoluted the changes are the harder it is to validate their correctness, thus increasing the risks for product stability and delaying the integration. (I'm not sure how much time Aleksey and other contributors want to volunteer to this project.) > > Also, in case of AOT JEP the situation was quite the opposite: it started with a huge patch which was split into multiple mostly independent parts to streamline its review. For x86-32 code removal there's no such patch prepared yet and the complete scope of work is not clear yet. > > IMO the crucial part is to get the port officially retired. After that the rest can become a good source of starter tasks :-) Basically what @iwanowww said: this PR *is* the removal of x86_32 port. After this PR integrates, it is not possible to build x86_32, because the core implementation of it is missing, and build system would refuse to even try building it. So this removes x86_32 port as the feature, atomically, matching the title and intent of the JEP. *Then*, follow-up subtask RFEs would clean up the parts of Hotspot that were added to support various x86_32-specific features, and are no longer needed anymore. I, for one, also believed the complete PR would be more straight-forward. I attempted this at https://github.com/openjdk/jdk/pull/22567. After working on that draft PR, and listening to what people said about it, I can conclude that is not a great way to go with this removal. The massive drawbacks of complete/stacked PR are now obvious to me: 1. It is hard to review. The complete PR is huge, 210+ files affected. A lot of removals are logically connected across different files, and while they are simple in isolation, it is hard for a reviewer to separate several cleanups in large PRs. 2. It accrues merge conflicts very fast. This happens even when mainline is somewhat idle without large feature integrations. 
I expect this work to be even harder once we are closer to RDP1. 3. It is hard to reach consensus on. Non-trivial changes require thorough review, and cobbling together multiple non-trivial changes require polynomially more coordination. I have seen this in Win32 port removal, so for a large PR like that I expect multiple, week-long review and amendment sessions. Which conspires with (1) and (2). 4. It is easy to introduce/overlook bugs. I already did this once in a complete PR when I accidentally removed the wrong part of C1 regalloc code, and it started ever so slightly misbehaving. And it was not obvious, because it was obscured by other changes in the vicinity. Which conspires with (1), (2) and (3). 5. It would introduce a single changeset that would be hard to bisect when things go wrong. And the things would go wrong, because of (1), (4) and partially by new opportunities presented by (2). For the C1 bug I mentioned above, I was able to quickly nail it through the bisection of my stack of atomic commits. That stack would not be available once we squash the commits/PRs before the integration. So while on a surface it might look more enticing to purge everything at once, the amount of hassle we would endure is hard to justify. Doing this PR for port removal + multiple post-removal cleanups piecewise lets us reach the same final state without extra work, while doing so at leisurely pace and maintaining more convenient code history for future bug hunts. Bottom-line: Let's not make our own lives harder unnecessarily. Atomic commits FTW. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2703337731 From jsjolen at openjdk.org Thu Mar 6 10:27:06 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 6 Mar 2025 10:27:06 GMT Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag In-Reply-To: References: Message-ID: <0CDxD_JcYtu4Ax1xB8TDyWqLkxNub6OfJRtSmCFONgU=.bd3edae0-3eaf-4ba3-ac9e-2582d1baf151@github.com> On Wed, 5 Mar 2025 15:28:59 GMT, Gerard Ziemski wrote: >> With the `size` parameter there will be no need to traverse/go through the nodes between the base and end of the region. >> Tests: >> linux-x64-debug, gtest:NMT* and runtime/NMT* > > src/hotspot/share/cds/metaspaceShared.cpp line 1475: > >> 1473: (address)archive_space_rs.base() == base_address, "Sanity"); >> 1474: // Register archive space with NMT. >> 1475: MemTracker::record_virtual_memory_tag(archive_space_rs.base(), archive_space_rs.size(), mtClassShared); > > The pattern here is: > > `something.base(), something.size()` > > instead of doing this over and over again, why can't we just pass `something` to MemTracker::record_virtual_memory_tag() and let it figure out `base` and `size` itself? We could have an overload for `ReservedSpace`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23770#discussion_r1983093725 From tschatzl at openjdk.org Thu Mar 6 10:35:07 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 6 Mar 2025 10:35:07 GMT Subject: RFR: 8346194: Improve G1 pre-barrier C2 cost estimate [v2] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that modifies pre-barrier node costs for loop-unrolling to only consider the fast path. The reasoning is similar to zgc (and the new costs as well): only the part of the barrier inlined into the main code stream, as the slow path is laid out separately and does/should not directly affect performance (particularly if there is no marking going on). 
> > There are no differences/impact in performance since the post barrier cost is still very large, which will be fixed elsewhere. > > Testing: gha, perf testing standalone (neither micros nor actual benchmarks give any difference outside of variance), testing with JDK-8342382 > > Hth, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23862/files - new: https://git.openjdk.org/jdk/pull/23862/files/429bc01b..4f8273a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23862&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23862&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23862/head:pull/23862 PR: https://git.openjdk.org/jdk/pull/23862 From sgehwolf at openjdk.org Thu Mar 6 10:45:11 2025 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 6 Mar 2025 10:45:11 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 17:45:26 GMT, Thomas Fitzsimmons wrote: >> This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. 
>> >> I tested it with: >> >> >> java -Xlog:os+container=trace -version >> >> on: >> >> `Red Hat Enterprise Linux 8 (cgroups v1 only)`: >> _No change in behaviour_ >> >> `Fedora 41 (cgroups v2)`: >> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ >> >> --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 >> +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 >> @@ -1,7 +1,12 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> +[debug][os,container] v2 controller cpuset is enabled and relevant >> +[debug][os,container] v2 controller cpu is enabled and required >> +[debug][os,container] v2 controller io is enabled but not relevant >> +[debug][os,container] v2 controller memory is enabled and required >> +[debug][os,container] v2 controller hugetlb is enabled but not relevant >> +[debug][os,container] v2 controller pids is enabled and relevant >> +[debug][os,container] v2 controller rdma is enabled but not relevant >> +[debug][os,container] v2 controller misc is enabled but not relevant >> [debug][os,container] Detected cgroups v2 unified hierarchy >> [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope >> [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max >> >> >> `Fedora 41 (custom kernel with cgroups v1 disabled)`: >> _Fixes `cgroups v2` detection:_ >> >> --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 >> +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 >> @@ -1,7 +1,63 @@ >> 
[trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> -[debug][os,container] controller memory is not enabled >> - ] >> -[debug][os,container] One or more required controllers disabled at kernel level. >> +[... > > Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 > - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent > - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing > > Remove from cgroups v1 branch incorrect log messages about cpuset > controller being optional. Add test case for cgroups v1, cpuset > disabled. > - Improve !cgroups_v2_enabled branch comment > - Debug-log optional and disabled cgroups v2 controllers > > Do not log enabled controllers that are not relevant to the JDK. > - Move index declaration to scope in which it is used > - Remove empty string check during cgroup.controllers parsing > - Define ISSPACE_CHARS macro, use it in strsep call > - Pass fgets result to strsep > - Replace is_cgroupsV2 with cgroups_v2_enabled > > Also fix the testCgroupv1SystemdOnly and testCgroupv1NoMounts test > cases such that their /proc/cgroups and /proc/self/cgroup contents > correspond. This prevents assertion failures these tests were > producing when is_cgroupsV2 was replaced with cgroups_v2_enabled. > - ... and 3 more: https://git.openjdk.org/jdk/compare/328c0778...b6926e15 This looks good to me. We might need a similar change on the `Metrics` side to get the version detection in sync. @tstuefe @ashu-mehra Could you please help with a second review? 
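The controller classification visible in the quoted trace output can be illustrated with a small sketch. It is written in Java for illustration only; it is not the HotSpot implementation (which the commit list indicates is built on `fgets` and `strsep`). The class and method names are hypothetical, and the required/relevant sets are simply read off the quoted log lines, under the assumption that `cgroup.controllers` holds one whitespace-separated list of controller names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: classify the controllers listed in a cgroups v2
// cgroup.controllers file the way the quoted trace output reports them.
public class CgroupControllersSketch {
    // Taken from the log lines above: cpu and memory are "required",
    // cpuset and pids are "relevant", everything else is "not relevant".
    private static final Set<String> REQUIRED = Set.of("cpu", "memory");
    private static final Set<String> RELEVANT = Set.of("cpuset", "pids");

    // Input is the file content, e.g. "cpuset cpu io memory hugetlb pids rdma misc\n".
    static Map<String, String> classify(String controllersLine) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : controllersLine.trim().split("\\s+")) {
            if (name.isEmpty()) {
                continue; // empty or blank input yields a single empty token
            }
            if (REQUIRED.contains(name)) {
                result.put(name, "enabled and required");
            } else if (RELEVANT.contains(name)) {
                result.put(name, "enabled and relevant");
            } else {
                result.put(name, "enabled but not relevant");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        classify("cpuset cpu io memory hugetlb pids rdma misc\n")
            .forEach((name, state) ->
                System.out.println("v2 controller " + name + " is " + state));
    }
}
```

A caller could treat the absence of any required controller from the returned map as "container support cannot be enabled", which is roughly the condition the old `/proc/cgroups`-based log message described.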
------------- Marked as reviewed by sgehwolf (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23811#pullrequestreview-2664045780 PR Comment: https://git.openjdk.org/jdk/pull/23811#issuecomment-2703471996 From shade at openjdk.org Thu Mar 6 12:24:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 6 Mar 2025 12:24:10 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v3] In-Reply-To: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: > We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. > > This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). > > Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. We still only enable the statistics event by default to be on a safer side. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `jdk_jfr` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: - Touch up descriptions - Fix test in release builds - Merge branch 'master' into JDK-8351142-jfr-deflate-event - Merge branch 'master' into JDK-8351142-jfr-deflate-event - Test updates - Rework statistics event to be actually statistics - Filter JFR HiddenWait consistently - Event metadata touchups - Separate statistics event as well - Fix ------------- Changes: https://git.openjdk.org/jdk/pull/23900/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23900&range=02 Stats: 295 lines in 13 files changed: 284 ins; 6 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23900/head:pull/23900 PR: https://git.openjdk.org/jdk/pull/23900 From coleenp at openjdk.org Thu Mar 6 12:38:52 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Mar 2025 12:38:52 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 16:52:16 GMT, Aleksey Shipilev wrote: > This PR implements JEP 503: Remove the 32-bit x86 Port. > > The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. > > This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. > > The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. 
There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. > > Additional testing: > - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) > - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) I agree with @iwanowww's and @shipilev comments. I would like to see this be the JEP implementation and the additional cleanups, particularly in the interpreter, handled one by one. I don't see any advantage for one big integration push. It'll be disruptive and for this, there is no scenario where this would be helpful to any future work. When Aleksey sent out the original PR there were cleanups that needed explanation. Finding the explanations in the big PR is a pain for scrolling. And the reviewers for that part of the change were a different set than ones needed for this change. Again for no benefit. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23906#pullrequestreview-2664309410 From coleenp at openjdk.org Thu Mar 6 12:38:53 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Mar 2025 12:38:53 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: <5nkWE-TpdoNk-k_5JE7MopX5_KJf6DjjLWMADxWr29k=.ee34fa19-882c-4731-86f6-bdaed2a6e276@github.com> Message-ID: On Thu, 6 Mar 2025 09:48:47 GMT, Aleksey Shipilev wrote: >>> So what, @iwanowww, you say is that this PR is indeed implementation of the JEP. >>> And all subtasks listed in Umbrella RFE are following up RFEs after we integrated the JEP. >>> Do I understand that correctly? >> >> Yes. 
>> >>> Why not do what Ioi did for AOT class loading JEP? I mean, to have depending PRs which are combined into one implementation push. >> >> It's definitely an option. But, most likely, there'll be some overlooked cases anyway (leading to additional followup RFEs). And the more convoluted the changes are the harder it is to validate their correctness, thus increasing the risks for product stability and delaying the integration. (I'm not sure how much time Aleksey and other contributors want to volunteer to this project.) >> >> Also, in case of AOT JEP the situation was quite the opposite: it started with a huge patch which was split into multiple mostly independent parts to streamline its review. For x86-32 code removal there's no such patch prepared yet and the complete scope of work is not clear yet. >> >> IMO the crucial part is to get the port officially retired. After that the rest can become a good source of starter tasks :-) > > Basically what @iwanowww said: this PR *is* the removal of x86_32 port. > > After this PR integrates, it is not possible to build x86_32, because the core implementation of it is missing, and build system would refuse to even try building it. So this removes x86_32 port as the feature, atomically, matching the title and intent of the JEP. *Then*, follow-up subtasks RFE would clean up the parts of Hotspot that were added to support various x86_32-specific features, and are no longer needed anymore. > > Honestly, I also believed the complete PR that cleans up every dusty corner at once would be more straight-forward. But then I tried it at https://github.com/openjdk/jdk/pull/22567. After investing a few full days on that draft PR, and listening to what people said about it, I firmly changed my mind, and can conclude that singular PR or series of stacked PRs are not a great way to go with this removal. > > The massive drawbacks of complete/stacked PR are now obvious to me: > 1. It is hard to review. 
The complete PR is huge, 210+ files affected. A lot of removals are logically connected across different files, and while they are simple in isolation, it is hard for a reviewer to separate several cleanups in large PRs. Stacked PRs would help some, but: > 2. It accrues merge conflicts very fast. This happens even when mainline is somewhat idle without large feature integrations. I did complete PR near New Year holidays, and it was _already_ a headache. I expect this work to be even harder once we are closer to RDP1. It would be even more tedious with a chain of 10+ stacked PRs, as I got the preview of this when rebasing the stack of atomic commits in the complete draft PR several times. > 3. It is hard to reach consensus on. Non-trivial changes require thorough review, and cobbling together multiple non-trivial changes require polynomially more coordination. I have seen this in Win32 port removal, so for a large PR like that I expect multiple, week-long review and amendment sessions. Which conspires with (1) and (2). > 4. It is easy to introduce/overlook bugs. I already did this once in a complete PR when I accidentally removed the wrong part of C1 regalloc code, and it started ever so slightly misbehaving. And it was not obvious, because it was obscured by other changes in the vicinity, and it only failed one test in tier4. This conspires with (1), (2) and (3). > 5. It would introduce a single changeset that would be hard to bisect when things go wrong. And the things wo... Also @shipilev I'm jealous of all your code removal. :) Well done getting agreement on this change. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2703725960 From shade at openjdk.org Thu Mar 6 13:17:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 6 Mar 2025 13:17:42 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v4] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 01:55:44 GMT, David Holmes wrote: > So if it only wants to record notifications that actually did something, then shouldn't we be checking tally before posting the event? Yes, I don't see why not, updated. In practice, we rarely lose the notification races on internal locks, so tally is zero very rarely. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23901#issuecomment-2703817986 From shade at openjdk.org Thu Mar 6 13:17:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 6 Mar 2025 13:17:42 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v4] In-Reply-To: References: Message-ID: > We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up. > > Providing this event would also replace one of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). > > This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `jdk_jfr` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - Only emit event when notification happened - Merge branch 'master' into JDK-8351187-jfr-monitor-notify - Rewrite test to RecordingStream - Drop threshold to 0ms - Merge branch 'master' into JDK-8351187-jfr-monitor-notify - Disable by default - Fix ------------- Changes: https://git.openjdk.org/jdk/pull/23901/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23901&range=03 Stats: 168 lines in 7 files changed: 162 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23901.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23901/head:pull/23901 PR: https://git.openjdk.org/jdk/pull/23901 From shade at openjdk.org Thu Mar 6 13:21:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 6 Mar 2025 13:21:54 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v2] In-Reply-To: References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> <22O5bysM7g9bWIQOpHWaXQSf3feld_GlkvMTPrCFlUA=.4dc2cd48-b4fc-4611-805e-a16f2ece812d@github.com> Message-ID: On Wed, 5 Mar 2025 20:04:13 GMT, Erik Gahlin wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8351142-jfr-deflate-event >> - Test updates >> - Rework statistics event to be actually statistics >> - Filter JFR HiddenWait consistently >> - Event metadata touchups >> - Separate statistics event as well >> - Fix > > Looks good. Fixed the merge conflicts, and touched up event descriptions a bit. @egahlin, see if those still make sense to you? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23900#issuecomment-2703834508 From duke at openjdk.org Thu Mar 6 14:05:07 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Thu, 6 Mar 2025 14:05:07 GMT Subject: Withdrawn: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 21:03:58 GMT, Thomas Fitzsimmons wrote: > This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. > > I tested it with: > > > java -Xlog:os+container=trace -version > > on: > > `Red Hat Enterprise Linux 8 (cgroups v1 only)`: > _No change in behaviour_ > > `Fedora 41 (cgroups v2)`: > _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ > > --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 > +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 > @@ -1,7 +1,12 @@ > [trace][os,container] OSContainer::init: Initializing Container Support > -[debug][os,container] Detected optional pids controller entry in /proc/cgroups > -[debug][os,container] controller cpuset is not enabled > - ] > +[debug][os,container] v2 controller cpuset is enabled and relevant > +[debug][os,container] v2 controller cpu is enabled and required > +[debug][os,container] v2 controller io is enabled but not relevant > +[debug][os,container] v2 controller memory is enabled and required > +[debug][os,container] v2 controller hugetlb is enabled but not relevant > +[debug][os,container] v2 controller pids is enabled and relevant > +[debug][os,container] v2 controller rdma is enabled but not relevant > +[debug][os,container] v2 controller misc is enabled but not relevant > [debug][os,container] Detected cgroups v2 unified hierarchy > [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope > 
[trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max > > > `Fedora 41 (custom kernel with cgroups v1 disabled)`: > _Fixes `cgroups v2` detection:_ > > --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 > +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 > @@ -1,7 +1,63 @@ > [trace][os,container] OSContainer::init: Initializing Container Support > -[debug][os,container] Detected optional pids controller entry in /proc/cgroups > -[debug][os,container] controller cpuset is not enabled > - ] > -[debug][os,container] controller memory is not enabled > - ] > -[debug][os,container] One or more required controllers disabled at kernel level. > +[debug][os,container] v2 controller cpuset is enabled and relevant > +[debug][os,container] v2 contro... This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23811 From galder at openjdk.org Thu Mar 6 14:06:35 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 6 Mar 2025 14:06:35 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v13] In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: > This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. > > Currently vectorization does not kick in for loops containing either of these calls because of the following error: > > > VLoop::check_preconditions: failed: control flow in loop not allowed > > > The control flow is due to the java implementation for these methods, e.g. 
> > > public static long max(long a, long b) { > return (a >= b) ? a : b; > } > > > This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively. > By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. > E.g. > > > SuperWord::transform_loop: > Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined > 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) > > > Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1155 > long max 1173 > > > After the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1042 > long max 1042 > > > This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. > Therefore, it still relies on the macro expansion to transform those into CMoveL. > > I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2500 2500 0 0 >>> jtreg:test/jdk:tier1 ... 
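For readers who want to see the kind of loop the description above is about, here is a minimal Java sketch of a `long` max reduction. It is an illustration only and not taken from `ReductionPerf`; the class and method names are hypothetical. Both methods compute the same value; the first uses `Math.max(long, long)`, the call the PR intrinsifies, while the second spells out the ternary that introduces the control flow mentioned in the description.

```java
// Hypothetical illustration of a long max reduction; not part of the PR or its tests.
public class LongMaxReductionSketch {
    // Reduction through Math.max(long, long), the intrinsic candidate.
    static long maxWithMathMax(long[] values) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = Math.max(result, values[i]);
        }
        return result;
    }

    // The same reduction with the explicit comparison, i.e. the shape of the
    // Java implementation of Math.max quoted in the PR description.
    static long maxWithTernary(long[] values) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = (result >= values[i]) ? result : values[i];
        }
        return result;
    }

    public static void main(String[] args) {
        long[] data = {3L, -7L, 42L, 0L, 41L};
        System.out.println(maxWithMathMax(data)); // prints 42
        System.out.println(maxWithTernary(data)); // prints 42
    }
}
```

Whether either loop is actually vectorized depends on the JIT compiler, the platform and the JDK build, so the sketch says nothing about performance; it only shows the source-level pattern.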
Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Add simple reduction benchmarks on top of multiply ones ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20098/files - new: https://git.openjdk.org/jdk/pull/20098/files/a190ae68..d0e793a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=11-12 Stats: 44 lines in 1 file changed: 40 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20098.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20098/head:pull/20098 PR: https://git.openjdk.org/jdk/pull/20098 From duke at openjdk.org Thu Mar 6 14:12:01 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Thu, 6 Mar 2025 14:12:01 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 17:45:26 GMT, Thomas Fitzsimmons wrote: >> This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. 
>> >> I tested it with: >> >> >> java -Xlog:os+container=trace -version >> >> on: >> >> `Red Hat Enterprise Linux 8 (cgroups v1 only)`: >> _No change in behaviour_ >> >> `Fedora 41 (cgroups v2)`: >> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ >> >> --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 >> +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 >> @@ -1,7 +1,12 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> +[debug][os,container] v2 controller cpuset is enabled and relevant >> +[debug][os,container] v2 controller cpu is enabled and required >> +[debug][os,container] v2 controller io is enabled but not relevant >> +[debug][os,container] v2 controller memory is enabled and required >> +[debug][os,container] v2 controller hugetlb is enabled but not relevant >> +[debug][os,container] v2 controller pids is enabled and relevant >> +[debug][os,container] v2 controller rdma is enabled but not relevant >> +[debug][os,container] v2 controller misc is enabled but not relevant >> [debug][os,container] Detected cgroups v2 unified hierarchy >> [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope >> [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max >> >> >> `Fedora 41 (custom kernel with cgroups v1 disabled)`: >> _Fixes `cgroups v2` detection:_ >> >> --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 >> +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 >> @@ -1,7 +1,63 @@ >> 
[trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> -[debug][os,container] controller memory is not enabled >> - ] >> -[debug][os,container] One or more required controllers disabled at kernel level. >> +[... > > Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 > - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent > - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing > > Remove from cgroups v1 branch incorrect log messages about cpuset > controller being optional. Add test case for cgroups v1, cpuset > disabled. > - Improve !cgroups_v2_enabled branch comment > - Debug-log optional and disabled cgroups v2 controllers > > Do not log enabled controllers that are not relevant to the JDK. > - Move index declaration to scope in which it is used > - Remove empty string check during cgroup.controllers parsing > - Define ISSPACE_CHARS macro, use it in strsep call > - Pass fgets result to strsep > - Replace is_cgroupsV2 with cgroups_v2_enabled > > Also fix the testCgroupv1SystemdOnly and testCgroupv1NoMounts test > cases such that their /proc/cgroups and /proc/self/cgroup contents > correspond. This prevents assertion failures these tests were > producing when is_cgroupsV2 was replaced with cgroups_v2_enabled. > - ... and 3 more: https://git.openjdk.org/jdk/compare/e34eb0c6...b6926e15 I closed this accidentally, sorry; (I do not know how though; maybe I accidentally pressed `Enter` with the `Close` button focused?). 
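To illustrate the `/sys/fs/cgroup/cgroup.controllers` parsing mentioned in the quoted description, here is a minimal C++ sketch. It is not the HotSpot code: `classify_v2_controllers` and `ControllerStatus` are hypothetical names, and the required/relevant split is only inferred from the Fedora 41 debug output quoted above (cpu and memory logged as required, cpuset and pids as relevant, everything else as not relevant). It also tokenizes with a stream rather than the `fgets`/`strsep` combination named in the commit list.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical classification, mirroring the wording of the debug output above.
enum class ControllerStatus { Required, Relevant, NotRelevant };

// Classify each whitespace-separated controller name found in the contents of
// a cgroup.controllers file. Illustrative stand-in only, not HotSpot code.
inline std::map<std::string, ControllerStatus>
classify_v2_controllers(const std::string& contents) {
  std::map<std::string, ControllerStatus> result;
  std::istringstream in(contents);
  std::string name;
  while (in >> name) {  // operator>> skips spaces, tabs and newlines
    if (name == "cpu" || name == "memory") {
      result[name] = ControllerStatus::Required;     // assumption from the log
    } else if (name == "cpuset" || name == "pids") {
      result[name] = ControllerStatus::Relevant;     // assumption from the log
    } else {
      result[name] = ControllerStatus::NotRelevant;  // e.g. io, hugetlb, rdma, misc
    }
  }
  return result;
}
```

Feeding it the single line `cpuset cpu io memory hugetlb pids rdma misc` yields one entry per controller, in the same three groups as the Fedora 41 log lines.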
------------- PR Comment: https://git.openjdk.org/jdk/pull/23811#issuecomment-2703957126 From azafari at openjdk.org Thu Mar 6 14:22:38 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 6 Mar 2025 14:22:38 GMT Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag [v2] In-Reply-To: References: Message-ID: > With the `size` parameter there will be no need to traverse/go through the nodes between the base and end of the region. > Tests: > linux-x64-debug, gtest:NMT* and runtime/NMT* Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: ReservedSpace is accepted as param. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23770/files - new: https://git.openjdk.org/jdk/pull/23770/files/0a1495bc..1e7853e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23770&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23770&range=00-01 Stats: 21 lines in 12 files changed: 4 ins; 1 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/23770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23770/head:pull/23770 PR: https://git.openjdk.org/jdk/pull/23770 From azafari at openjdk.org Thu Mar 6 14:22:39 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 6 Mar 2025 14:22:39 GMT Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag [v2] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 15:25:29 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> ReservedSpace is accepted as param. > > src/hotspot/share/cds/metaspaceShared.cpp line 1548: > >> 1546: return nullptr; >> 1547: } >> 1548: // NMT: fix up the space tags > > What exactly needs to be fixed here? Removed. Obsolete comment. 
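As a rough illustration of the "ReservedSpace is accepted as param" change, the sketch below shows an overload that takes the whole space object and forwards its base and size to the (address, size, tag) form. All types here are simplified stand-ins, not the HotSpot declarations: `MemTrackerSketch`, `TaggedRange` and the `last` member are hypothetical and exist only so the forwarding can be observed.

```cpp
#include <cstddef>

// Simplified stand-ins for HotSpot types (assumptions, not the real classes).
enum MemTag { mtNone, mtClassShared };

class ReservedSpace {
  char*       _base;
  std::size_t _size;
 public:
  ReservedSpace(char* base, std::size_t size) : _base(base), _size(size) {}
  char*       base() const { return _base; }
  std::size_t size() const { return _size; }
};

// Hypothetical record of the most recent tagging call.
struct TaggedRange { void* base; std::size_t size; MemTag tag; };

class MemTrackerSketch {
 public:
  static TaggedRange last;

  // Form with an explicit size: no need to search for the end of the region.
  static void record_virtual_memory_tag(void* addr, std::size_t size, MemTag tag) {
    last = TaggedRange{addr, size, tag};
  }

  // Convenience overload: callers pass the space itself instead of repeating
  // rs.base(), rs.size() at every call site.
  static void record_virtual_memory_tag(const ReservedSpace& rs, MemTag tag) {
    record_virtual_memory_tag(rs.base(), rs.size(), tag);
  }
};

TaggedRange MemTrackerSketch::last = {nullptr, 0, mtNone};
```

One design consequence in C++: because the second overload calls `rs.base()` and `rs.size()` in its body, the complete definition of `ReservedSpace` must be visible wherever that body is compiled; a forward declaration suffices only for declaring the overload.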
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23770#discussion_r1983442554 From azafari at openjdk.org Thu Mar 6 14:22:39 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 6 Mar 2025 14:22:39 GMT Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag [v2] In-Reply-To: <0CDxD_JcYtu4Ax1xB8TDyWqLkxNub6OfJRtSmCFONgU=.bd3edae0-3eaf-4ba3-ac9e-2582d1baf151@github.com> References: <0CDxD_JcYtu4Ax1xB8TDyWqLkxNub6OfJRtSmCFONgU=.bd3edae0-3eaf-4ba3-ac9e-2582d1baf151@github.com> Message-ID: On Thu, 6 Mar 2025 10:23:54 GMT, Johan Sjölen wrote: >> src/hotspot/share/cds/metaspaceShared.cpp line 1475: >> >>> 1473: (address)archive_space_rs.base() == base_address, "Sanity"); >>> 1474: // Register archive space with NMT. >>> 1475: MemTracker::record_virtual_memory_tag(archive_space_rs.base(), archive_space_rs.size(), mtClassShared); >> >> The pattern here is: >> >> `something.base(), something.size()` >> >> instead of doing this over and over again, why can't we just pass `something` to MemTracker::record_virtual_memory_tag() and let it figure out `base` and `size` itself? > > We could have an overload for `ReservedSpace`. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23770#discussion_r1983441505 From mli at openjdk.org Thu Mar 6 14:26:43 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 6 Mar 2025 14:26:43 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v3] In-Reply-To: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: > Hi, > Can you help to review this patch? > It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv. 
> > ## Performance > > data > > Benchmark | (vectorDim) | Mode | Cnt | Score -master | Error | Score - patch | Error | Units | Improvement (master/patch) > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1 > Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011 > Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001 > Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001 > Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909 > Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91 > Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129 > Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928 > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51 > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898 > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | 
avgt | 10 | 91565.818 | 250.761 | 5191.12 | 64.598 | ns/op | 17.639 > Fl... Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix is_intrinsic_available of _float16ToFloat ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23844/files - new: https://git.openjdk.org/jdk/pull/23844/files/54bf239a..83b3bc76 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23844&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23844&range=01-02 Stats: 28 lines in 4 files changed: 28 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23844/head:pull/23844 PR: https://git.openjdk.org/jdk/pull/23844 From azafari at openjdk.org Thu Mar 6 14:26:56 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 6 Mar 2025 14:26:56 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v35] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTracker`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. > - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167. 
Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: review comments applied ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/fc106d5f..f2f1a800 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=33-34 Stats: 6 lines in 4 files changed: 1 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From azafari at openjdk.org Thu Mar 6 14:26:59 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 6 Mar 2025 14:26:59 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v34] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Wed, 5 Mar 2025 16:06:55 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> test cases for doing reserve or commit the same region twice. > > src/hotspot/share/nmt/memReporter.cpp line 451: > >> 449: }); >> 450: >> 451: if (reserved_and_committed) > > Missing braces Done. > src/hotspot/share/nmt/regionsTree.hpp line 37: > >> 35: // for processing the tree nodes in a shorter and more meaningful way. >> 36: class RegionsTree : public VMATree { >> 37: private: > > Remove `private`, not needed. Done. > src/hotspot/share/nmt/regionsTree.hpp line 56: > >> 54: NodeHelper() : _node(nullptr) { } >> 55: NodeHelper(Node* node) : _node(node) { } >> 56: inline bool is_valid() { return _node != nullptr; } > > Missing `const` Done. 
> src/hotspot/share/nmt/regionsTree.inline.hpp line 33: > >> 31: void RegionsTree::visit_committed_regions(const ReservedMemoryRegion& rgn, F func) { >> 32: position start = (position)rgn.base(); >> 33: size_t end = (size_t)rgn.end() + 1; > > Can we `static_cast(rgn.end())` instead? Should be `reinterpret_cast<>` instead. Done. > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 60: > >> 58: if (tracker == nullptr) return false; >> 59: _tracker = new (tracker) VirtualMemoryTracker(level == NMT_detail); >> 60: return _tracker->tree() != nullptr; > > @afshin-zafari , `_tracker->tree()` can never be null anymore. In the future we should do a PR where we change it to return a reference. Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1983447519 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1983447822 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1983448306 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1983449521 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1983450068 From kbarrett at openjdk.org Thu Mar 6 14:27:59 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 6 Mar 2025 14:27:59 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v9] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 21:26:08 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through the usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. 
Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > removed template paramter and moved ptr can_align_up Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23711#pullrequestreview-2664612323 From mli at openjdk.org Thu Mar 6 14:30:40 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 6 Mar 2025 14:30:40 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v4] In-Reply-To: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: > Hi, > Can you help to review this patch? > It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv. > > ## Performance > > data > > Benchmark | (vectorDim) | Mode | Cnt | Score -master | Error | Score - patch | Error | Units | Improvement (master/patch) > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1 > Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011 > Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001 > Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001 > Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909 > Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91 > Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129 > 
Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928 > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51 > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898 > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 91565.818 | 250.761 | 5191.12 | 64.598 | ns/op | 17.639 > Fl... 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: remove TestFloat16VectorConvChain test for riscv temporariely ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23844/files - new: https://git.openjdk.org/jdk/pull/23844/files/83b3bc76..a6d36051 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23844&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23844&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23844/head:pull/23844 PR: https://git.openjdk.org/jdk/pull/23844 From mli at openjdk.org Thu Mar 6 14:30:41 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 6 Mar 2025 14:30:41 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v2] In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Thu, 6 Mar 2025 05:44:36 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - clean >> - merge master >> - merge master >> - clean 2 >> - clean >> - initial commit > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6624: > >> 6622: >> 6623: if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_float16ToFloat) && >> 6624: vmIntrinsics::is_intrinsic_available(vmIntrinsics::_floatToFloat16)) { > > Since this stub uses instructions from the Zfh extension which is not always available, do we need a similar checking like x86 [1] (https://bugs.openjdk.org/browse/JDK-8303415)? I see `vmIntrinsics::is_intrinsic_available` delegates work to `VM_Version::is_intrinsic_supported` [2]. 
> > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L3333 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/classfile/vmIntrinsics.cpp#L671 Thanks for catching! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1983456182 From mli at openjdk.org Thu Mar 6 14:36:59 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 6 Mar 2025 14:36:59 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v4] In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Thu, 6 Mar 2025 14:30:40 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv. >> >> ## Performance >> >> data >> >> Benchmark | (vectorDim) | Mode | Cnt | Score -master | Error | Score - patch | Error | Units | Improvement (master/patch) >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1 >> Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011 >> Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001 >> Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001 >> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909 >> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91 >> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129 >> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132 >> 
Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928 >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51 >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898 >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 91565.81... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > remove TestFloat16VectorConvChain test for riscv temporariely BTW, I just temporarily removed the TestFloat16VectorConvChain test for riscv, because with this patch ConvF2HF(AddF(ConvHF2F, ConvHF2F)) will be replaced by ReinterpretHF2S(AddHF(ReinterpretS2HF, ReinterpretS2HF)). In the near future, when we support vectorization of float16 operations, we can re-enable the test, but for riscv the nodes will still take a different form, perhaps something like VectorReinterpretHF2S(VectorAddHF(VectorReinterpretS2HF, VectorReinterpretS2HF)). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23844#issuecomment-2704025799 From epeter at openjdk.org Thu Mar 6 15:07:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Mar 2025 15:07:07 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Thu, 27 Feb 2025 16:38:30 GMT, Galder Zamarreño wrote: >> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 44 additional commits since the last revision: >> >> - Merge branch 'master' into topic.intrinsify-max-min-long >> - Fix typo >> - Renaming methods and variables and add docu on algorithms >> - Fix copyright years >> - Make sure it runs with cpus with either avx512 or asimd >> - Test can only run with 256 bit registers or bigger >> >> * Remove platform dependant check >> and use platform independent configuration instead. >> - Fix license header >> - Tests should also run on aarch64 asimd=true envs >> - Added comment around the assertions >> - Adjust min/max identity IR test expectations after changes >> - ... and 34 more: https://git.openjdk.org/jdk/compare/47fdb836...a190ae68 > > Also, I've started a [discussion on jmh-dev](https://mail.openjdk.org/pipermail/jmh-dev/2025-February/004094.html) to see if there's a way to minimise pollution of `Math.min(II)` compilation. As a follow-up to https://github.com/openjdk/jdk/pull/20098#issuecomment-2684701935 I looked at where the other `Math.min(II)` calls are coming from, and a big chunk seems related to the JMH infrastructure. @galderz about: > Additional performance improvement: make SuperWord recognize more cases as profitable (see Regression 1). Optional. 
This should already be covered by these, and I will handle that eventually with the Cost-Model RFE [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093): - [JDK-8345044](https://bugs.openjdk.org/browse/JDK-8345044) Sum of array elements not vectorized - (min/max of array) - [JDK-8336000](https://bugs.openjdk.org/browse/JDK-8336000) C2 SuperWord: report that 2-element reductions do not vectorize - You would for example see that on aarch64 machines with only neon/asimd support you can have at most 2 longs per vector, because the max vector length is 128 bits. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2704110051 From epeter at openjdk.org Thu Mar 6 15:26:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Mar 2025 15:26:09 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Thu, 27 Feb 2025 16:38:30 GMT, Galder Zamarreño wrote: >> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 44 additional commits since the last revision: >> >> - Merge branch 'master' into topic.intrinsify-max-min-long >> - Fix typo >> - Renaming methods and variables and add docu on algorithms >> - Fix copyright years >> - Make sure it runs with cpus with either avx512 or asimd >> - Test can only run with 256 bit registers or bigger >> >> * Remove platform dependant check >> and use platform independent configuration instead. >> - Fix license header >> - Tests should also run on aarch64 asimd=true envs >> - Added comment around the assertions >> - Adjust min/max identity IR test expectations after changes >> - ... 
and 34 more: https://git.openjdk.org/jdk/compare/dfbb2ee6...a190ae68 > > Also, I've started a [discussion on jmh-dev](https://mail.openjdk.org/pipermail/jmh-dev/2025-February/004094.html) to see if there's a way to minimise pollution of `Math.min(II)` compilation. As a follow-up to https://github.com/openjdk/jdk/pull/20098#issuecomment-2684701935 I looked at where the other `Math.min(II)` calls are coming from, and a big chunk seems related to the JMH infrastructure. @galderz about: > Additional performance improvement: extend backend capabilities for vectorization (see Regression 2 + 3). Optional. I looked at `src/hotspot/cpu/x86/x86.ad` bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType bt) { case Op_MaxV: case Op_MinV: if (UseSSE < 4 && is_integral_type(bt)) { return false; } ... So it seems that here lanewise min/max are supported for AVX2. But it seems that's different for reductions: case Op_MinReductionV: case Op_MaxReductionV: if ((bt == T_INT || is_subword_type(bt)) && UseSSE < 4) { return false; } else if (bt == T_LONG && (UseAVX < 3 || !VM_Version::supports_avx512vlbwdq())) { return false; } ... So it seems maybe we could improve the AVX2 coverage for reductions. But honestly, I will probably find this issue again once I work on the other reductions above, and run the benchmarks. I think that will make it easier to investigate all of this. I will for example adjust the IR rules, and then it will be apparent where there are cases that are not covered. 
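The difference between the two quoted `x86.ad` fragments can be made concrete with a small stand-alone C++ sketch. It models only the conditions visible in the excerpts; the elided parts (`...`) are not modeled, so a `true` result here means "not rejected by the quoted checks", not "supported". `CpuConfig`, `passes_quoted_checks` and the reduced `BasicType` and opcode enums are hypothetical simplifications, and the two type predicates cover fewer types than the HotSpot helpers of the same name.

```cpp
// Reduced stand-ins for HotSpot enums (assumption: only the values needed here).
enum BasicType { T_BYTE, T_SHORT, T_INT, T_LONG, T_FLOAT, T_DOUBLE };
enum Opcode { Op_MaxV, Op_MinV, Op_MinReductionV, Op_MaxReductionV };

// Hypothetical replacement for the UseSSE / UseAVX flags and the
// VM_Version::supports_avx512vlbwdq() query used in the excerpt.
struct CpuConfig {
  int  use_sse;
  int  use_avx;
  bool avx512vlbwdq;
};

// Simplified predicates; the real helpers also cover boolean and char.
static bool is_subword_type(BasicType bt) { return bt == T_BYTE || bt == T_SHORT; }
static bool is_integral_type(BasicType bt) {
  return is_subword_type(bt) || bt == T_INT || bt == T_LONG;
}

// Returns false exactly when one of the quoted conditions rejects the case.
bool passes_quoted_checks(Opcode opcode, BasicType bt, const CpuConfig& cpu) {
  switch (opcode) {
    case Op_MaxV:
    case Op_MinV:
      if (cpu.use_sse < 4 && is_integral_type(bt)) return false;
      break;
    case Op_MinReductionV:
    case Op_MaxReductionV:
      if ((bt == T_INT || is_subword_type(bt)) && cpu.use_sse < 4) return false;
      if (bt == T_LONG && (cpu.use_avx < 3 || !cpu.avx512vlbwdq)) return false;
      break;
  }
  return true;
}
```

With an AVX2-like configuration (`use_sse = 4`, `use_avx = 2`, no AVX-512), lanewise `Op_MaxV` on `T_LONG` passes while `Op_MaxReductionV` on `T_LONG` is rejected, which is the asymmetry pointed out above.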
@galderz you said you would add some extra comments, then I will review again :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2704159992 PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2704161929 From gziemski at openjdk.org Thu Mar 6 15:27:04 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 6 Mar 2025 15:27:04 GMT Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag [v2] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 14:22:38 GMT, Afshin Zafari wrote: >> With the `size` parameter there will be no need to traverse/go through the nodes between the base and end of the region. >> Tests: >> linux-x64-debug, gtest:NMT* and runtime/NMT* > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > ReservedSpace is accepted as param. LGTM, thank you for fixing this. Need to fix the build errors: /home/runner/work/jdk/jdk/src/hotspot/share/nmt/memTracker.hpp:224:31: error: invalid use of incomplete type ‘const class ReservedSpace’ 224 | record_virtual_memory_tag(rs.base(), rs.size(), mem_tag); | ^~ In file included from /home/runner/work/jdk/jdk/src/hotspot/share/memory/allocation.cpp:28: /home/runner/work/jdk/jdk/src/hotspot/share/memory/metaspace.hpp:38:7: note: forward declaration of ‘class ReservedSpace’ 38 | class ReservedSpace; | ^~~~~~~~~~~~~ In file included from /home/runner/work/jdk/jdk/src/hotspot/share/memory/allocation.cpp:30: /home/runner/work/jdk/jdk/src/hotspot/share/nmt/memTracker.hpp:224:42: error: invalid use of incomplete type ‘const class ReservedSpace’ 224 | record_virtual_memory_tag(rs.base(), rs.size(), mem_tag); | ^~ In file included from /home/runner/work/jdk/jdk/src/hotspot/share/memory/allocation.cpp:28: /home/runner/work/jdk/jdk/src/hotspot/share/memory/metaspace.hpp:38:7: note: forward declaration of ‘class ReservedSpace’ ... 
(rest of output omitted) ------------- Marked as reviewed by gziemski (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23770#pullrequestreview-2664792545 PR Comment: https://git.openjdk.org/jdk/pull/23770#issuecomment-2704168962 From tschatzl at openjdk.org Thu Mar 6 15:39:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 6 Mar 2025 15:39:57 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11] In-Reply-To: References: Message-ID: <4um7PHAs89PIoa3QgbkPx-8Jx9vHiYr7afFQGOtFTY8=.f1ca8bad-0827-4f8c-852d-0fc82ffd546a@github.com> On Tue, 4 Mar 2025 15:33:29 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> iwalulya review >> * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement >> * predicate for determining whether the refinement has been disabled >> * some other typos/comment improvements >> * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming > > src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 219: > >> 217: // The young gen revising mechanism reads the predictor and the values set >> 218: // here. Avoid inconsistencies by locking. >> 219: MutexLocker x(G1RareEvent_lock, Mutex::_no_safepoint_check_flag); > > Who else can be in this critical-section? I don't get what this lock is protecting us from. Actually, further discussion with @albertnetymk showed that this change introduces an unintended behavioral change, since the refinement control thread is also responsible for updating the current young gen length. It means that the mutex isn't required. However, this also means that the young gen length is no longer updated while refinement is running. Because refinement can take seconds, I need to move this work to another thread (probably the `G1ServiceThread`). I will add a separate mutex then. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1983587293 From tschatzl at openjdk.org Thu Mar 6 16:13:02 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 6 Mar 2025 16:13:02 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v13] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 10:41:02 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * fix whitespace >> * additional whitespace between log tags >> * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename > > src/hotspot/share/gc/g1/g1ThreadLocalData.hpp line 29: > >> 27: #include "gc/g1/g1BarrierSet.hpp" >> 28: #include "gc/g1/g1CardTable.hpp" >> 29: #include "gc/g1/g1CollectedHeap.hpp" > > probably does not need to be included `g1CardTable.hpp` is needed because of `G1CardTable::CardValue`, I think. I removed the `G1CollectedHeap` include though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1983655594 From ihse at openjdk.org Thu Mar 6 16:21:54 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 6 Mar 2025 16:21:54 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 16:52:16 GMT, Aleksey Shipilev wrote: > This PR implements JEP 503: Remove the 32-bit x86 Port. > > The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. > > This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks.
> > The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. > > Additional testing: > - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) > - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) make/autoconf/platform.m4 line 669: > 667: AC_ARG_ENABLE(deprecated-ports, [AS_HELP_STRING([--enable-deprecated-ports@<:@=yes/no@:>@], > 668: [Suppress the error when configuring for a deprecated port @<:@no@:>@])]) > 669: # There are no deprecated ports. This option is left to be consistent with future deprecations. Please remove. Old code is always present in git history if you want to reuse it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23906#discussion_r1983670151 From tschatzl at openjdk.org Thu Mar 6 16:26:31 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 6 Mar 2025 16:26:31 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v14] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. 
>
> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25.
>
> ### Current situation
>
> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier.
>
> The main reason for the current barrier is how g1 implements concurrent refinement:
> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
> * For correctness dirty card updates require fine-grained synchronization between mutator and refinement threads.
> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done; // null value check
> if (card(@x.a) == young_card) goto done; // write to young gen check
> StoreLoad; // synchronize
> if (card(@x.a) == dirty_card) goto done;
>
> *card(@x.a) = dirty
>
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
>
> call runtime to move thread-local-dcq into dcqs
>
> done:
>
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
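As a rough illustration of the filtering and card-tracking steps in the pseudo code above, here is a toy C++ model. Everything in it is an assumption made for illustration: the `ToyHeap` type, the region and card sizes, the tiny queue capacity, and the `post_write_barrier` name. It is not HotSpot code, it treats the heap as plain integer addresses starting at 0, and it runs on a single thread.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: sizes, names and the flat address space are illustrative assumptions.
enum CardValue : std::uint8_t { clean_card = 0, dirty_card = 1, young_card = 2 };

struct ToyHeap {
  static constexpr std::uintptr_t region_shift = 20;  // assumed 1 MiB regions
  static constexpr std::uintptr_t card_shift   = 9;   // assumed 512-byte cards
  static constexpr std::size_t    dcq_capacity = 4;   // tiny on purpose, to exercise the flush

  std::vector<std::uint8_t>   cards;      // one byte per card
  std::vector<std::uintptr_t> local_dcq;  // stands in for the thread-local dirty card queue
  std::vector<std::uintptr_t> dcqs;       // stands in for the global dirty card queue set

  explicit ToyHeap(std::size_t heap_bytes)
    : cards(heap_bytes >> card_shift, clean_card) {}

  // Models the barrier for `x.a = y`: field_addr is the address of x.a, new_val is y
  // (0 plays the role of null). field_addr must lie inside the toy heap.
  // Returns true if the card was dirtied and its index enqueued.
  bool post_write_barrier(std::uintptr_t field_addr, std::uintptr_t new_val) {
    // Filtering
    if ((field_addr >> region_shift) == (new_val >> region_shift)) return false;  // same region
    if (new_val == 0) return false;                                               // null value
    std::uint8_t& card = cards[field_addr >> card_shift];
    if (card == young_card) return false;                 // write to young gen
    std::atomic_thread_fence(std::memory_order_seq_cst);  // stands in for StoreLoad
    if (card == dirty_card) return false;                 // already dirty

    card = dirty_card;

    // Card tracking
    local_dcq.push_back(field_addr >> card_shift);
    if (local_dcq.size() == dcq_capacity) {
      // "call runtime": hand the full local queue over to the global set
      dcqs.insert(dcqs.end(), local_dcq.begin(), local_dcq.end());
      local_dcq.clear();
    }
    return true;
  }
};
```

Even in this stripped-down form the barrier has four filters, a fence, a store, and a queue operation with a slow path, which gives a feel for why the inlined version is so much larger than a plain card-marking store.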
> > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * iwalulya review * renaming * fix some includes, forward declaration ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/a457e6e7..350a4fa3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=12-13 Stats: 31 lines in 13 files changed: 1 ins; 2 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From jbhateja at openjdk.org Thu Mar 6 16:38:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 6 Mar 2025 16:38:57 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v4] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 13:10:34 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Added alignment to loop entries. src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 2: > 1: /* > 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. 
Please update copyright year src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 96: > 94: StubRoutines::_dilithiumMontMulByConstant = generate_dilithiumMontMulByConstant_avx512(); > 95: StubRoutines::_dilithiumDecomposePoly = generate_dilithiumDecomposePoly_avx512(); > 96: } Indentation fix needed src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 362: > 360: const Register roundsLeft = r11; > 361: > 362: __ align(OptoLoopAlignment); Redundant alignment before label should be before it's bind ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983463096 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983464620 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983477681 From shade at openjdk.org Thu Mar 6 16:40:58 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 6 Mar 2025 16:40:58 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 16:18:50 GMT, Magnus Ihse Bursie wrote: >> This PR implements JEP 503: Remove the 32-bit x86 Port. >> >> The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. >> >> This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. >> >> The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. 
There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. >> >> Additional testing: >> - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) >> - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) > > make/autoconf/platform.m4 line 669: > >> 667: AC_ARG_ENABLE(deprecated-ports, [AS_HELP_STRING([--enable-deprecated-ports@<:@=yes/no@:>@], >> 668: [Suppress the error when configuring for a deprecated port @<:@no@:>@])]) >> 669: # There are no deprecated ports. This option is left to be consistent with future deprecations. > > Please remove. Old code is always present in git history if you want to reuse it. I don't mind removing it, my concern would be to _remember_ this option was there! I guess it is okay to re-re-invent it later, possibly under a different name, when the next port gets deprecated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23906#discussion_r1983704213 From duke at openjdk.org Thu Mar 6 17:37:33 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 6 Mar 2025 17:37:33 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v5] In-Reply-To: References: Message-ID: <3bphXKLpIpxAZP-FEOeob6AaHbv0BAoEceJka64vMW8=.3e4f74e0-9479-4926-b365-b08d8d702692@github.com> > By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Accepted review comments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23860/files - new: https://git.openjdk.org/jdk/pull/23860/files/3aaa106f..64135f29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=03-04 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23860/head:pull/23860 PR: https://git.openjdk.org/jdk/pull/23860 From cslucas at openjdk.org Thu Mar 6 18:24:34 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 6 Mar 2025 18:24:34 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v8] In-Reply-To: References: Message-ID: > In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code. > > The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase. > > The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a version of the card table totally clear. In the next `init-mark` pause we swap the pointers to the base of the read and write tables.
When the `init-mark` finishes the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that just turned the new _read_ table. > > Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address. > > The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged. > > Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code. > > Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them. Cesar Soares Lucas has updated the pull request incrementally with two additional commits since the last revision: - Revert changes to shenandoahHeap.cpp - Address PR feedback: moar clean-up.
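To make the alternation described in the PR text concrete, here is a minimal, single-threaded C++ sketch of two card tables whose read/write roles are swapped once per cycle. `ToyCardTables` and all of its member names are hypothetical and chosen only for this illustration; the real change also has to deal with safepoints, JIT-compiled code and per-thread card table pointers, none of which is modeled here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy model, not the Shenandoah implementation.
class ToyCardTables {
 public:
  enum : std::uint8_t { clean = 0, dirty = 1 };

  explicit ToyCardTables(std::size_t num_cards)
    : _table_a(num_cards, std::uint8_t(clean)),
      _table_b(num_cards, std::uint8_t(clean)),
      _write_table(_table_a.data()),
      _read_table(_table_b.data()),
      _num_cards(num_cards) {}

  // The raw pointers refer to this object's own vectors, so forbid copying.
  ToyCardTables(const ToyCardTables&) = delete;
  ToyCardTables& operator=(const ToyCardTables&) = delete;

  // Mutators only ever touch the current write table.
  void mutator_dirty(std::size_t card) { _write_table[card] = dirty; }

  // The GC only ever scans the current read table.
  bool gc_sees_dirty(std::size_t card) const { return _read_table[card] == dirty; }

  // "reset" phase: clean the current read table (done concurrently in the real collector).
  void reset_phase_clean_read_table() {
    std::fill(_read_table, _read_table + _num_cards, std::uint8_t(clean));
  }

  // "init-mark" pause: only the two base pointers are swapped, nothing is copied or zeroed.
  void init_mark_swap() { std::swap(_read_table, _write_table); }

  bool write_table_is_clean() const {
    return std::all_of(_write_table, _write_table + _num_cards,
                       [](std::uint8_t c) { return c == clean; });
  }

 private:
  std::vector<std::uint8_t> _table_a;
  std::vector<std::uint8_t> _table_b;
  std::uint8_t* _write_table;
  std::uint8_t* _read_table;
  std::size_t   _num_cards;
};
```

In this model the work done inside the pause is a pointer swap; the cost of clearing has moved into `reset_phase_clean_read_table()`, which is the part the PR performs concurrently.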
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23170/files - new: https://git.openjdk.org/jdk/pull/23170/files/046ea8a0..0262b7df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23170&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23170&range=06-07 Stats: 29 lines in 4 files changed: 5 ins; 18 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23170.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23170/head:pull/23170 PR: https://git.openjdk.org/jdk/pull/23170 From cslucas at openjdk.org Thu Mar 6 18:24:34 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 6 Mar 2025 18:24:34 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v5] In-Reply-To: References: <2ZFtKLn2EcbzjKQ_USb3yiOWEWQJYocFwj_rk-5h0Jg=.f4eec566-3e0c-4a75-8c27-2cb785b0081a@github.com> Message-ID: On Wed, 5 Mar 2025 17:45:19 GMT, Aleksey Shipilev wrote: >> Yes, that's for the VMThread. That seems like a good question. I > > Actually, I am wondering why this is needed. It looks to me VMThread attaches after heap initialization, and the normal `ShenandoahBarrierSet::on_thread_attach` should handle it. You're right, we didn't need that anymore. I removed + test it and we're good. I pushed a commit removing that code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1983853294 From cslucas at openjdk.org Thu Mar 6 18:24:34 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 6 Mar 2025 18:24:34 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v6] In-Reply-To: <_LIv8Ggp3ukK0HmhknyG_Mz2x5OKs63Y-qSXTQo9Gdo=.9efc86f1-6cc4-425b-9319-5e1500eb59da@github.com> References: <_LIv8Ggp3ukK0HmhknyG_Mz2x5OKs63Y-qSXTQo9Gdo=.9efc86f1-6cc4-425b-9319-5e1500eb59da@github.com> Message-ID: On Wed, 5 Mar 2025 17:32:30 GMT, Aleksey Shipilev wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Address PR feedback: formatting. > > src/hotspot/share/gc/shenandoah/shenandoahCardTable.cpp line 57: > >> 55: _byte_map = (CardValue*) write_space.base(); >> 56: _byte_map_base = _byte_map - (uintptr_t(low_bound) >> _card_shift); >> 57: > > It is a bit sad to see these asserts go. Is this because `_byte_map` is now mutable? May I suggest doing something like: > > > _write_byte_map = (CardValue*) write_space.base(); > _write_byte_map_base = _byte_map - (uintptr_t(low_bound) >> _card_shift); > ...later... > _read_byte_map = (CardValue*) read_space.base(); > _read_byte_map_base = _byte_map - (uintptr_t(low_bound) >> _card_shift); > ...later... 
> > // Set up current byte map > _byte_map = _write_byte_map; > _byte_map_base = _write_byte_map_base; > > // Check one side is good > assert(byte_for(low_bound) == &_byte_map[0], "Checking start of map"); > assert(byte_for(high_bound-1) <= &_byte_map[last_valid_index()], "Checking end of map"); > swap_read_and_write_tables(); > > // Check another side is good > assert(byte_for(low_bound) == &_byte_map[0], "Checking start of map"); > assert(byte_for(high_bound-1) <= &_byte_map[last_valid_index()], "Checking end of map"); > swap_read_and_write_tables(); @shipilev - I did some tests and the conclusion is that we can put the asserts back. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1983847384 From ihse at openjdk.org Thu Mar 6 18:25:54 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 6 Mar 2025 18:25:54 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 16:38:13 GMT, Aleksey Shipilev wrote: >> make/autoconf/platform.m4 line 669: >> >>> 667: AC_ARG_ENABLE(deprecated-ports, [AS_HELP_STRING([--enable-deprecated-ports@<:@=yes/no@:>@], >>> 668: [Suppress the error when configuring for a deprecated port @<:@no@:>@])]) >>> 669: # There are no deprecated ports. This option is left to be consistent with future deprecations. >> >> Please remove. Old code is always present in git history if you want to reuse it. > > I don't mind removing it, my concern would be to _remember_ this option was there! I guess it is okay to re-re-invent it later, possibly under a different name, when the next port gets deprecated. It's not that important, no. I'm not sure if previous deprecated ports were handled exactly like this. And you can always do something like `git log | grep -i "remove .* port"` to find the change it was removed in, and look at what it did...
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23906#discussion_r1983855800 From jsjolen at openjdk.org Thu Mar 6 18:49:30 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 6 Mar 2025 18:49:30 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v35] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 6 Mar 2025 14:26:56 GMT, Afshin Zafari wrote: >> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTracker`. >> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. >> - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. >> - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > review comments applied A few more :-). Btw, you probably do not want to integrate any of your other NMT PRs (unless they are necessary). Integrating them might cause merge conflicts. I think we're almost done!!! Great job on this :D src/hotspot/share/nmt/virtualMemoryTracker.cpp line 107: > 105: // str, NMTUtil::tag_to_name(tag), (long)reserve_delta, (long)commit_delta, reserved, committed); > 106: }; > 107: 8350567 is merged now! I think that that PR should be merged in. src/hotspot/share/nmt/virtualMemoryTracker.cpp line 195: > 193: bool VirtualMemoryTracker::print_containing_region(const void* p, outputStream* st) { > 194: ReservedMemoryRegion rmr = tree()->find_reserved_region((address)p); > 195: log_debug(nmt)("containing rgn: base=" INTPTR_FORMAT, p2i(rmr.base())); Is this important?
src/hotspot/share/nmt/virtualMemoryTracker.cpp line 216: > 214: MemTracker::NmtVirtualMemoryLocker nvml; > 215: tree()->visit_reserved_regions([&](ReservedMemoryRegion& rgn) { > 216: log_info(nmt)("region in walker vmem, base: " INTPTR_FORMAT " size: %zu , %s, committed: %zu", This should be `debug` level, or maybe even removed. src/hotspot/share/nmt/virtualMemoryTracker.hpp line 35: > 33: #include "utilities/ostream.hpp" > 34: > 35: // VirtualMemoryTracker (VMT) is the internal class of NMT that only the MemTracker class uses it for performing the NMT operations. `... uses it for ...`, delete "it" (grammar issue). src/hotspot/share/nmt/virtualMemoryTracker.hpp line 41: > 39: // state (reserved/released/committed) and MemTag of the regions before and after it. > 40: // > 41: // The memory operations of Reserve/Commit/Uncommit/Release (RCUR) are tracked by updating/inserting/deleting the nodes in the tree. When an operation `(RCUR)` can be removed, it's never mentioned again. src/hotspot/share/nmt/virtualMemoryTracker.hpp line 49: > 47: // - uncommitted size of a MemTag should be <= of its committed size > 48: // - released size of a MemTag should be <= of its reserved size > 49: I don't believe that these are checked, right? So this can be deleted. src/hotspot/share/nmt/virtualMemoryTracker.hpp line 132: > 130: } > 131: } > 132: Dead code? 
------------- PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2665310971 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1983875040 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1983878610 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1983879989 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1983870666 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1983869375 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1983868809 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1983866253 From wkemper at openjdk.org Thu Mar 6 19:02:49 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 6 Mar 2025 19:02:49 GMT Subject: RFR: 8350905: Releasing a WeakHandle's referent may extend its lifetime Message-ID: When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. We are proposing that native weak references are cleared with an additional `AS_NO_KEEPALIVE` decorator. This is similar to what was done for j.l.r.WeakReference in [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696). # Testing GHA, `hotspot_gc_shenandoah`. Additionally, for G1, ZGC, and Shenandoah we've run Extremem, Dacapo, SpecJVM2008, SpecJBB2015, Heapothesys and Diluvian. All executions completed without errors. 
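The effect described in this PR can be illustrated with a toy C++ model of a SATB-style pre-write barrier. The names used here (`ToySatb`, `clear_slot`, the `no_keepalive` flag) are made up for the illustration and do not correspond to the HotSpot Access API or its decorators; the model only shows that overwriting a slot while marking is active records the previous value for marking unless the store opts out.

```cpp
#include <cstddef>
#include <vector>

// Toy model only; not HotSpot code.
struct ToyObject {
  bool marked = false;
};

struct ToySatb {
  bool marking_active = false;
  std::vector<ToyObject*> satb_queue;  // previous values recorded by the barrier

  // Clears *slot. Unless no_keepalive is set, the previous referent is
  // enqueued while marking is active, which keeps it alive for this cycle.
  void clear_slot(ToyObject** slot, bool no_keepalive) {
    ToyObject* previous = *slot;
    if (marking_active && previous != nullptr && !no_keepalive) {
      satb_queue.push_back(previous);
    }
    *slot = nullptr;
  }

  // Marks every enqueued object and returns how many were kept alive this way.
  std::size_t drain() {
    std::size_t kept_alive = 0;
    for (ToyObject* obj : satb_queue) {
      if (!obj->marked) {
        obj->marked = true;
        ++kept_alive;
      }
    }
    satb_queue.clear();
    return kept_alive;
  }
};
```

With `no_keepalive` set, clearing the slot leaves the old referent unmarked, which is the behavior the proposal wants when a weak handle is released.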
------------- Commit messages: - Also, no keep alive when releasing string dedup reference - Use AS_NO_KEEPALIVE when clearing weak referent Changes: https://git.openjdk.org/jdk/pull/23935/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23935&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350905 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23935.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23935/head:pull/23935 PR: https://git.openjdk.org/jdk/pull/23935 From mgronlun at openjdk.org Thu Mar 6 19:44:54 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 6 Mar 2025 19:44:54 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v4] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 13:17:42 GMT, Aleksey Shipilev wrote: >> We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up. >> >> Providing this event would also replace one of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). >> >> This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Only emit event when notification happened > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Rewrite test to RecordingStream > - Drop threshold to 0ms > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Disable by default > - Fix Unsure if this event type carries enough weight.
The JavaMonitorEvent already has a notified field: Generally, any event that cannot be enabled by default needs good motivations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23901#issuecomment-2704778898 From cslucas at openjdk.org Thu Mar 6 19:45:21 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 6 Mar 2025 19:45:21 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v9] In-Reply-To: References: Message-ID: > In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code. > > The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase. > > The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a version of the card table totally clear. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that just turned the new _read_ table. > > Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address.
> > The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged. > > Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code. > > Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Fix build: no shenandoah on arm32.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23170/files - new: https://git.openjdk.org/jdk/pull/23170/files/0262b7df..0a540c79 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23170&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23170&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23170.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23170/head:pull/23170 PR: https://git.openjdk.org/jdk/pull/23170 From shade at openjdk.org Thu Mar 6 19:49:58 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 6 Mar 2025 19:49:58 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v9] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 19:45:21 GMT, Cesar Soares Lucas wrote: >> In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code. >> >> The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase. >> >> The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a version of the card table totally clear.
In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that just turned the new _read_ table. >> >> Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address. >> >> The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged. >> >> Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code. >> >> Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix build: no shenandoah on arm32. Looks fine now, thanks! I have not looked deeply at card table lifecycle, so I rely on @kdnilsen and @earthling-amzn reviews here. ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23170#pullrequestreview-2665465682 From duke at openjdk.org Thu Mar 6 22:20:04 2025 From: duke at openjdk.org (duke) Date: Thu, 6 Mar 2025 22:20:04 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v9] In-Reply-To: References: Message-ID: <8TjLe-qgfKrkvUOoUUq5rDeYMYXxQt_isizNgOBsiJg=.ebb05e71-9ee6-4024-a12e-5c7ed8bd6b5f@github.com> On Thu, 6 Mar 2025 19:45:21 GMT, Cesar Soares Lucas wrote: >> In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code. >> >> The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase. >> >> The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a version of the card table totally clear. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that just turned the new _read_ table. >> >> Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address.
>> >> The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged. >> >> Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code. >> >> Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix build: no shenandoah on arm32. @JohnTortugo Your change (at version 0a540c79584f28fe90d128977f5121467f59626b) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23170#issuecomment-2705065887 From vladimir.kozlov at oracle.com Thu Mar 6 22:41:15 2025 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 6 Mar 2025 14:41:15 -0800 Subject: [External] : RFD: Grouping hot code in CodeCache In-Reply-To: <1B0C3138-761B-4DB0-8A98-977C6FC40178@amazon.co.uk> References: <1B0C3138-761B-4DB0-8A98-977C6FC40178@amazon.co.uk> Message-ID: <2623f909-bb91-4450-bc05-d9181ba3abcb@oracle.com> Hi Evgeny, My concern is that it will complicate VM existing code for not significant benefits in real production environment. What improvements do your experiments in real production runs show? And which JDK version did you use for that? As you know most of nmethod's metadata is moved from CodeCache.
And Boris Ulasevich will move the final part (relocation info) soon. After that the code will be a lot more compact in CodeCache. Code sparsity should be less of an issue then. It would be nice if you redo your production experiments after that. I understand that we can still have sparsity due to "warm" nmethods and C1 compiled code mixed with "hot" C2 nmethods. I think compilation policy has heuristic to detect "warm" method (time intervals between invocations). Can we simply use a separate CodeCache's segment for all C2 "hot" (we can specify frequency flag to determine what "hot" means) methods regardless of when they are compiled? Then you don't need to create list or do anything special for them. Most likely we will waste more space in CodeCache but it could be conditional under flag which you already proposed in separate segment RFE. Thanks, Vladimir K On 3/5/25 10:41 AM, Astigeevich, Evgeny wrote: > Hi Vladimir, > > This is JDK-8326205: Implement grouping hot nmethods in CodeCache. > > As I managed to synthesize a benchmark (https://github.com/openjdk/jdk/pull/23831) to demonstrate performance impact of sparse code, I'd like > to discuss a possible solution of the sparse code. > > High level, a solution is: > > * Detect hot code. > * Group hot code. > * Maintain grouped code. > > Downstream we tried two approaches: > > * *Static lists of methods (compile command):* Identify frequently > used (hot) methods using test runs and provide static method lists > to JVM in production. When JVM compiles a Java method and the method > is on the list, JVM puts the code into a designated code heap > (HotCodeHeap). > * *Dynamic lists of methods (compiler directives):* Profile an > application in production and dynamically relocate identified hot > methods to HotCodeHeap. Relocation was implemented with recompilation.
> > The main advantage of static lists is zero profiling overhead in > production. We do all profiling and analysis in test runs. Its problems are: > > * *Training Run Accuracy*: We need training runs to have execution > paths closely mimicking production environments. Otherwise we put > wrong methods into HotCodeHeap. > * *Method List Maintenance:* We need to rerun training to regenerate > lists when application code changes. Training runs are expensive and > time-consuming. They require long runs to guarantee we see all major > execution paths. Updating lists in production can be as complex as > application deployment > * *Method Placement Limitations:* Methods marked for HotCodeHeap are > permanently placed into HotCodeHeap. No mechanism to remove methods > that become less frequently used. > > We addressed these problems with dynamic lists of methods. We > implemented a Java agent that runs within the same JVM to dynamically > detect and manage hot Java methods without prior method identification. > The agent detects hot methods using JFR. The agent manages hot Java > methods in HotCodeHeap with compiler directives. A new compiler > directive marks methods with dynamic states ("hot" or "cold"). Methods > marked by the "hot" state are recompiled and placed in HotCodeHeap. > Methods marked by the "cold" state are eventually removed from HotCodeHeap. > > Problems of this approach are: > > * It requires specific, complex modifications to compiler directive > support: recompilation of Java methods affected by compiler > directives changes. This functionality is unique to Java agent > implementation and has limited potential for broader use. > * The agent cannot guarantee Java methods are moved to/removed from > the HotCodeHeap because updates of compiler directives can fail. > * The agent knows nothing about compiled code, e.g. whether it's C1 or > C2 compiled, code size, profile. This data can be useful for deciding > to move or not to move to HotCodeHeap.
> * Recompilations, especially C2, are expensive. Having many of them > can cause performance issues. Also recompiled code might differ from > the code we have detected as "hot". > > Running these two approaches in production we learned: > > * We detect 95% of actively used methods within the first 30 minutes > of an application run. This is with JFR profiling configured: 90 > seconds session duration, sampling each 11 ms, 8 minutes between > profiling sessions. We can find actively used methods faster if we > reduce a pause between profiling sessions and sampling period. > However it will increase the profiling overhead and affect > application performance. With the current configuration, the > profiling overhead is between 1% - 2%. > * A set of actively used methods gets into the steady state (no new > methods added to, no methods removed from) within the first 60 minutes. > * Static lists, when created from runs close to production, have 80% - > 90% methods always in use. This does not change over time. > * Predicting the size of HotCodeHeap is difficult, especially with > dynamic lists. > > We want to have grouping of hot method functionality as a part of Hotspot > JVM. We will group only C2 compiled methods. We can group JVMCI compiled > methods, e.g. Graal, if needed. We need profiling precise enough to > detect major Java methods. Low overhead is more important than precision. > > We think we can have a solution which does not require a lot of code: > > * Detect hot code: we can use an implementation based on the Sweeper: > https://github.com/openjdk/jdk17u/blob/master/src/hotspot/share/runtime/sweeper.hpp. We will use the handshakes mechanism, what the Sweeper > used, to detect nmethods on the top of thread stacks.
> * Group hot code: we have a draft PR https://github.com/openjdk/jdk/pull/23573. It implements relocation of nmethods within CodeCache. > * Maintain grouped code: we will add an additional code heap where hot > nmethods will be relocated to. > > What do you think about this approach? Are there other possible solutions? > > Thanks, > > Evgeny A. > > > > > Amazon Development Centre (London) Ltd. Registered in England and Wales > with registration number 04543232 with its registered office at 1 > Principal Place, Worship Street, London EC2A 2FA, United Kingdom. > > From dlong at openjdk.org Thu Mar 6 23:05:54 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 6 Mar 2025 23:05:54 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v9] In-Reply-To: References: Message-ID: <5c6SGfHaHxRjaI_g_bkA-Q6X_owRkd6bUNw98lbv_is=.8476ced5-b8e0-4cf2-867a-1b3776dce3ab@github.com> On Wed, 5 Mar 2025 21:26:08 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > removed template paramter and moved ptr can_align_up Do we really want to allow passing nullptr to can_align_up(void* ptr, A alignment)?
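For readers following the `align_up` thread, the overflow condition described in the quoted PR text can be sketched in standalone C++. This is only an illustration under stated assumptions: the names `is_power_of_2_sketch`, `can_align_up_sketch` and `align_up_sketch` are hypothetical, the sketch works on plain `uint64_t` values rather than pointers or HotSpot's templated types, and it is not the code from `align.hpp` or from the PR. With unsigned arithmetic the failure shows up as wraparound, which is exactly the case where `aligned_result >= original` stops holding.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration only; not the HotSpot declarations.

// True if v is a non-zero power of two.
constexpr bool is_power_of_2_sketch(std::uint64_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// True if rounding `value` up to a multiple of `alignment` is representable,
// i.e. value does not exceed the largest aligned 64-bit value.
constexpr bool can_align_up_sketch(std::uint64_t value, std::uint64_t alignment) {
  if (!is_power_of_2_sketch(alignment)) {
    return false;
  }
  const std::uint64_t max_aligned =
      std::numeric_limits<std::uint64_t>::max() & ~(alignment - 1);
  return value <= max_aligned;
}

// Rounds `value` up to the next multiple of `alignment`.
// The assert mirrors the idea of checking that the result cannot wrap.
inline std::uint64_t align_up_sketch(std::uint64_t value, std::uint64_t alignment) {
  assert(can_align_up_sketch(value, alignment) && "align_up would overflow");
  const std::uint64_t mask = alignment - 1;
  return (value + mask) & ~mask;
}
```

A caller that cannot rule out overflow would test `can_align_up_sketch(v, a)` first and handle the failure explicitly, which is roughly the role the thread assigns to `align_up_or_null`; how the real helpers treat a null pointer argument is the open question raised above and is not modelled here.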
------------- PR Comment: https://git.openjdk.org/jdk/pull/23711#issuecomment-2705132291 From fyang at openjdk.org Fri Mar 7 01:44:54 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 7 Mar 2025 01:44:54 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v4] In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Thu, 6 Mar 2025 14:30:40 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv. >> >> ## Performance >> >> data >> >> Benchmark | (vectorDim) | Mode | Cnt | Score -master | Error | Score - patch | Error | Units | Improvement (master/patch) >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1 >> Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011 >> Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001 >> Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001 >> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909 >> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91 >> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129 >> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 
0.998 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001 >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928 >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51 >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898 >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 91565.81... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > remove TestFloat16VectorConvChain test for riscv temporariely Several more comments after a closer look. src/hotspot/cpu/riscv/assembler_riscv.hpp line 339: > 337: single_precision, > 338: double_precision > 339: }; Seems better to move this enum to file `macroAssembler_riscv.hpp`? It's only used in the macro assembler routines. src/hotspot/cpu/riscv/riscv.ad line 8254: > 8252: // half precision operations > 8253: > 8254: instruct reinterpretS2HF(fRegF dst, iRegINoSp src) Suggestion: `instruct reinterpretS2HF(fRegF dst, iRegI src)` src/hotspot/cpu/riscv/riscv.ad line 8257: > 8255: %{ > 8256: match(Set dst (ReinterpretS2HF src)); > 8257: effect(TEMP_DEF dst); I don't see why this `TEMP_DEF dst` effect is needed. Maybe we should remove it from all the newly-added instructs.
src/hotspot/cpu/riscv/vm_version_riscv.cpp line 474: > 472: case vmIntrinsics::_floatToFloat16: > 473: case vmIntrinsics::_float16ToFloat: > 474: if (!supports_float16_to_float()) { As the `supports_float16_to_float` function name doesn't reflect the `_floatToFloat16` case, I suggest to directly inline the code here: `return UseZfh || UseZfhmin;` ------------- PR Review: https://git.openjdk.org/jdk/pull/23844#pullrequestreview-2665957697 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1984266099 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1984269266 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1984275746 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1984256375 From fyang at openjdk.org Fri Mar 7 02:01:53 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 7 Mar 2025 02:01:53 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v4] In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Thu, 6 Mar 2025 14:34:04 GMT, Hamlin Li wrote: > BTW, I just removed TestFloat16VectorConvChain test for riscv temporarily, as with this patch the ConvF2HF(AddF(ConvHF2F, ConvHF2F)) will be replaced by a ReinterpretHF2S(AddHF(ReinterpretS2HF, ReinterpretS2HF)). Maybe we should also update the `@requires` of the test at the same time? Currently, it says `| (os.arch == "riscv64" & vm.cpu.features ~= ".*zvfh.*")`. Maybe we change `zvfh` into `zfh`?
------------- PR Comment: https://git.openjdk.org/jdk/pull/23844#issuecomment-2705343169 From galder at openjdk.org Fri Mar 7 06:19:03 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 7 Mar 2025 06:19:03 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v14] In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: <9c34YjVjK0BMclNqFWMSitBV2YTcu_jmgWVitjRgvF0=.0f225af6-5888-4160-9a54-09baa696da1c@github.com> > This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. > > Currently vectorization does not kick in for loops containing either of these calls because of the following error: > > > VLoop::check_preconditions: failed: control flow in loop not allowed > > > The control flow is due to the java implementation for these methods, e.g. > > > public static long max(long a, long b) { > return (a >= b) ? a : b; > } > > > This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. > By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. > E.g. > > > SuperWord::transform_loop: > Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined > 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) > > > Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after.
Before the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1155 > long max 1173 > > > After the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1042 > long max 1042 > > > This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. > Therefore, it still relies on the macro expansion to transform those into CMoveL. > > I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2500 2500 0 0 >>> jtreg:test/jdk:tier1 ... Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 47 additional commits since the last revision: - Merge branch 'master' into topic.intrinsify-max-min-long - Add assertion comments - Add simple reduction benchmarks on top of multiply ones - Merge branch 'master' into topic.intrinsify-max-min-long - Fix typo - Renaming methods and variables and add docu on algorithms - Fix copyright years - Make sure it runs with cpus with either avx512 or asimd - Test can only run with 256 bit registers or bigger * Remove platform dependant check and use platform independent configuration instead. - Fix license header - ...
and 37 more: https://git.openjdk.org/jdk/compare/a328e466...1aa690d3 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20098/files - new: https://git.openjdk.org/jdk/pull/20098/files/d0e793a3..1aa690d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=12-13 Stats: 65249 lines in 2144 files changed: 33401 ins; 21691 del; 10157 mod Patch: https://git.openjdk.org/jdk/pull/20098.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20098/head:pull/20098 PR: https://git.openjdk.org/jdk/pull/20098 From galder at openjdk.org Fri Mar 7 06:19:04 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 7 Mar 2025 06:19:04 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v4] In-Reply-To: <9ReqLUCZ6XDaSQxgYw3NyZZdMv3SOHkCkzJ0DLAksas=.8cb29982-8cb8-4068-a251-59a189c83b93@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <9ReqLUCZ6XDaSQxgYw3NyZZdMv3SOHkCkzJ0DLAksas=.8cb29982-8cb8-4068-a251-59a189c83b93@github.com> Message-ID: On Tue, 17 Dec 2024 16:40:01 GMT, Galder Zamarreño wrote: >> test/hotspot/jtreg/compiler/intrinsics/math/TestMinMaxInlining.java line 80: >> >>> 78: @IR(phase = { CompilePhase.BEFORE_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "1" }) >>> 79: @IR(phase = { CompilePhase.AFTER_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "0" }) >>> 80: private static long testLongMin(long a, long b) { >> >> Can you add a comment why it disappears after macro expansion? > > ~Good question. On non-avx512 machines after macro expansion the min/max nodes become cmov nodes, but that's not the full story because on avx512 machines, they become minV/maxV nodes. Would you tweak the `@IR` annotations to capture this? Or would you leave it just as a comment?~ > > Scratch that, this is not a test for arrays, so no minV/maxV nodes. I'll just add a comment.
I've added a comment ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1984510490 From galder at openjdk.org Fri Mar 7 06:19:04 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 7 Mar 2025 06:19:04 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Thu, 6 Mar 2025 15:22:18 GMT, Emanuel Peter wrote: >> Also, I've started a [discussion on jmh-dev](https://mail.openjdk.org/pipermail/jmh-dev/2025-February/004094.html) to see if there's a way to minimise pollution of `Math.min(II)` compilation. As a follow to https://github.com/openjdk/jdk/pull/20098#issuecomment-2684701935 I looked at where the other `Math.min(II)` calls are coming from, and a big chunk seem related to the JMH infrastructure. > > @galderz you said you would add some extra comments, then I will review again :) @eme64 I've added the comment that was pending from your last review. I've also merged latest master. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2705620662 From epeter at openjdk.org Fri Mar 7 06:48:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 06:48:05 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v14] In-Reply-To: <9c34YjVjK0BMclNqFWMSitBV2YTcu_jmgWVitjRgvF0=.0f225af6-5888-4160-9a54-09baa696da1c@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <9c34YjVjK0BMclNqFWMSitBV2YTcu_jmgWVitjRgvF0=.0f225af6-5888-4160-9a54-09baa696da1c@github.com> Message-ID: On Fri, 7 Mar 2025 06:19:03 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g. >> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 47 additional commits since the last revision: > > - Merge branch 'master' into topic.intrinsify-max-min-long > - Add assertion comments > - Add simple reduction benchmarks on top of multiply ones > - Merge branch 'master' into topic.intrinsify-max-min-long > - Fix typo > - Renaming methods and variables and add docu on algorithms > - Fix copyright years > - Make sure it runs with cpus with either avx512 or asimd > - Test can only run with 256 bit registers or bigger > > * Remove platform dependant check > and use platform independent configuration instead. > - Fix license header > - ... and 37 more: https://git.openjdk.org/jdk/compare/99572e4c...1aa690d3 Looks good, thanks for all the updates :) I'm launching another round of testing on our side ;) ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20098#pullrequestreview-2666394529 PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2705659841 From dholmes at openjdk.org Fri Mar 7 07:20:58 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 7 Mar 2025 07:20:58 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: <5nkWE-TpdoNk-k_5JE7MopX5_KJf6DjjLWMADxWr29k=.ee34fa19-882c-4731-86f6-bdaed2a6e276@github.com> Message-ID: On Thu, 6 Mar 2025 09:48:47 GMT, Aleksey Shipilev wrote: > After this PR integrates, it is not possible to build x86_32 You could add a couple of lines to the build code and it would not be possible to build 32-bit, so that is a necessary but not sufficient condition to claim to implement the JEP IMO. I'm not looking for one big PR, I'm looking for multiple PR's as proposed but which all fall under the JEP umbrella. Until the JEP is targeted then nothing can be integrated anyway. This is what, I thought, dependent PR's were designed for. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2705707504 From dholmes at openjdk.org Fri Mar 7 07:23:54 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 7 Mar 2025 07:23:54 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 18:23:24 GMT, Magnus Ihse Bursie wrote: >> I don't mind removing it, my concern would be to _remember_ this option was there! I guess it is okay to re-re-invent it later, possibly under a different name, when the next port gets deprecated. > > It's not that important, no. I'm not sure if previous deprecated ports were handled exactly like this. > > And you can always do like `git log | grep -i "remove .* port"` to find the change it was removed in, and look what it did... I think leaving a comment describing how to deprecate a port is useful. To look it up in history you have to realise there is something to look up.
"They who are not reminded of the past will invent a new way to do it in the future." ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23906#discussion_r1984572816 From azafari at openjdk.org Fri Mar 7 08:05:29 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 7 Mar 2025 08:05:29 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v36] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTracker`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. > - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: more reviews. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/f2f1a800..cfab60fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=35 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=34-35 Stats: 14 lines in 2 files changed: 0 ins; 12 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From azafari at openjdk.org Fri Mar 7 08:40:26 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 7 Mar 2025 08:40:26 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v37] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTracker`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. > - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167. Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 82 commits: - Merge remote-tracking branch 'origin/master' into _8337217_nmt_VMT_with_tree - more reviews. - review comments applied - test cases for doing reserve or commit the same region twice. - style, some cleanup, VMT and regionsTree circular dep resolved - removed UseFlagInPlace test. - reviews applied. - test file got back, fixed coding style - once more. - removed remaining of the unrelated changes. - ... 
and 72 more: https://git.openjdk.org/jdk/compare/7314efc9...5177cc11 ------------- Changes: https://git.openjdk.org/jdk/pull/20425/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=36 Stats: 1450 lines in 26 files changed: 582 ins; 544 del; 324 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From azafari at openjdk.org Fri Mar 7 08:40:26 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 7 Mar 2025 08:40:26 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v35] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 6 Mar 2025 18:38:20 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments applied > > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 107: > >> 105: // str, NMTUtil::tag_to_name(tag), (long)reserve_delta, (long)commit_delta, reserved, committed); >> 106: }; >> 107: > > 8350567 is merged now! I think that that PR should be merged in. Under test. > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 195: > >> 193: bool VirtualMemoryTracker::print_containing_region(const void* p, outputStream* st) { >> 194: ReservedMemoryRegion rmr = tree()->find_reserved_region((address)p); >> 195: log_debug(nmt)("containing rgn: base=" INTPTR_FORMAT, p2i(rmr.base())); > > Is this important? Removed. > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 216: > >> 214: MemTracker::NmtVirtualMemoryLocker nvml; >> 215: tree()->visit_reserved_regions([&](ReservedMemoryRegion& rgn) { >> 216: log_info(nmt)("region in walker vmem, base: " INTPTR_FORMAT " size: %zu , %s, committed: %zu", > > This should be `debug` level, or maybe even removed. Removed. 
> src/hotspot/share/nmt/virtualMemoryTracker.hpp line 35: > >> 33: #include "utilities/ostream.hpp" >> 34: >> 35: // VirtualMemoryTracker (VMT) is the internal class of NMT that only the MemTracker class uses it for performing the NMT operations. > > `... uses it for ...`, delete "it" (grammar issue). Removed. > src/hotspot/share/nmt/virtualMemoryTracker.hpp line 41: > >> 39: // state (reserved/released/committed) and MemTag of the regions before and after it. >> 40: // >> 41: // The memory operations of Reserve/Commit/Uncommit/Release (RCUR) are tracked by updating/inserting/deleting the nodes in the tree. When an operation > > `(RCUR)` can be removed, it's never mentioned again. Removed. > src/hotspot/share/nmt/virtualMemoryTracker.hpp line 49: > >> 47: // - uncommitted size of a MemTag should be <= of its committed size >> 48: // - released size of a MemTag should be <= of its reserved size >> 49: > > I don't believe that these are checked, right? So this can be deleted. As said at the end of line, when they are applied to the VirtualMemorySummary they will be checked. Is it OK if I limit/wrap the comments to 80-columns text? > src/hotspot/share/nmt/virtualMemoryTracker.hpp line 132: > >> 130: } >> 131: } >> 132: > > Dead code? Removed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1984660439 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1984660659 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1984660818 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1984660197 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1984659994 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1984659801 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1984657977 From rcastanedalo at openjdk.org Fri Mar 7 08:49:57 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 7 Mar 2025 08:49:57 GMT Subject: RFR: 8346194: Improve G1 pre-barrier C2 cost estimate [v2] In-Reply-To: References: Message-ID: <9rZfd8Ncob8mKPrPNAUXYgd16GhvWF-TEBcKVa60isE=.477a43e6-3aec-4cea-b943-6c8ea157a7d1@github.com> On Thu, 6 Mar 2025 10:35:07 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that modifies pre-barrier node costs for loop-unrolling to only consider the fast path. The reasoning is similar to ZGC (and the new costs as well): only the part of the barrier inlined into the main code stream is counted, as the slow path is laid out separately and does/should not directly affect performance (particularly if there is no marking going on). >> >> There are no differences/impact in performance since the post barrier cost is still very large, which will be fixed elsewhere. >> >> Testing: gha, perf testing standalone (neither micros nor actual benchmarks give any difference outside of variance), testing with JDK-8342382 >> >> Hth, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp > > Co-authored-by: Roberto Castañeda Lozano Looks good, thanks for addressing my feedback Thomas! 
Reducing the estimated GC barrier size could lead to over-unrolling, which might increase code cache pressure and ultimately affect performance. In practice, it seems that the effect of estimated GC barrier size on total code size is limited though: I studied the impact on C2-generated code size using DaCapo 23 on x64 and aarch64 and it is basically unaffected by this change. In the extreme case of estimating GC barrier size to be 0, the overall code size increase is about 1% for x64 and 0.5% for aarch64. If each GC barrier (pre and post) is estimated to correspond to 20 nodes, the code size increase is further reduced to only 0.5% for x64 and 0.01% for aarch64. Beyond that, the code size increase becomes statistically insignificant. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23862#pullrequestreview-2666612989 From galder at openjdk.org Fri Mar 7 09:23:06 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 7 Mar 2025 09:23:06 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v14] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <9c34YjVjK0BMclNqFWMSitBV2YTcu_jmgWVitjRgvF0=.0f225af6-5888-4160-9a54-09baa696da1c@github.com> Message-ID: On Fri, 7 Mar 2025 06:44:57 GMT, Emanuel Peter wrote: >> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 47 additional commits since the last revision: >> >> - Merge branch 'master' into topic.intrinsify-max-min-long >> - Add assertion comments >> - Add simple reduction benchmarks on top of multiply ones >> - Merge branch 'master' into topic.intrinsify-max-min-long >> - Fix typo >> - Renaming methods and variables and add docu on algorithms >> - Fix copyright years >> - Make sure it runs with cpus with either avx512 or asimd >> - Test can only run with 256 bit registers or bigger >> >> * Remove platform dependant check >> and use platform independent configuration instead. >> - Fix license header >> - ... and 37 more: https://git.openjdk.org/jdk/compare/bc67ede6...1aa690d3 > > I'm launching another round of testing on our side ;) @eme64 I've run tier[1-3] locally and looked good overall. I had to update jtreg and noticed this failure but I don't think it's related to this PR: java.lang.AssertionError: gtest execution failed; exit code = 2. the failed tests: [codestrings::validate_vm] at GTestWrapper.main(GTestWrapper.java:98) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) at java.base/java.lang.Thread.run(Thread.java:1447) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2705937075 From azafari at openjdk.org Fri Mar 7 09:36:42 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 7 Mar 2025 09:36:42 GMT Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag [v3] In-Reply-To: References: Message-ID: > With the `size` parameter there will be no need to traverse/go through the nodes between the base and end of the region. 
> Tests: > linux-x64-debug, gtest:NMT* and runtime/NMT* Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: fixed build problem. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23770/files - new: https://git.openjdk.org/jdk/pull/23770/files/1e7853e6..87f22f46 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23770&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23770&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23770/head:pull/23770 PR: https://git.openjdk.org/jdk/pull/23770 From jsjolen at openjdk.org Fri Mar 7 10:14:12 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 7 Mar 2025 10:14:12 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v35] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Fri, 7 Mar 2025 08:36:14 GMT, Afshin Zafari wrote: >> src/hotspot/share/nmt/virtualMemoryTracker.hpp line 49: >> >>> 47: // - uncommitted size of a MemTag should be <= of its committed size >>> 48: // - released size of a MemTag should be <= of its reserved size >>> 49: >> >> I don't believe that these are checked, right? So this can be deleted. > > As said at the end of line, when they are applied to the VirtualMemorySummary they will be checked. > Is it OK if I limit/wrap the comments to 80-columns text? Aha, that's when they're checked. Feel free to do so if you want, but I don't see any problem with going above 80 columns. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1984798716 From azafari at openjdk.org Fri Mar 7 10:26:17 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 7 Mar 2025 10:26:17 GMT Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag [v4] In-Reply-To: References: Message-ID: > With the `size` parameter there will be no need to traverse/go through the nodes between the base and end of the region. > Tests: > linux-x64-debug, gtest:NMT* and runtime/NMT* Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: new fix. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23770/files - new: https://git.openjdk.org/jdk/pull/23770/files/87f22f46..3850708c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23770&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23770&range=02-03 Stats: 2 lines in 2 files changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23770/head:pull/23770 PR: https://git.openjdk.org/jdk/pull/23770 From stuefe at openjdk.org Fri Mar 7 11:15:31 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 7 Mar 2025 11:15:31 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use Message-ID: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). 
In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. 
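As an illustration of the AIX fallback described above, here is a minimal C++ sketch. It is not the code from the PR: the helper names `fill_protection_zone`, `load_word` and `looks_like_marker` are hypothetical, and only the marker byte 0x50 ('P') and the resulting 0x5050505050505050 pattern come from the description. The idea is that a page which cannot be protected is filled with a recognizable byte, so a value loaded through a zero nKlass shows up later as a tell-tale pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Marker byte taken from the description above: 'P' == 0x50.
constexpr unsigned char kMarkerByte = 0x50;

// Hypothetical helper: fill an unprotectable zone with the marker byte.
void fill_protection_zone(void* base, std::size_t size) {
  assert(base != nullptr);
  std::memset(base, kMarkerByte, size);
}

// Hypothetical helper: read one 64-bit word without alignment assumptions.
std::uint64_t load_word(const void* p) {
  std::uint64_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// A register or memory value consisting only of marker bytes hints at an
// accidental read from the zone.
bool looks_like_marker(std::uint64_t v) {
  return v == 0x5050505050505050ULL;
}
```

A crash-dump reader (human or tool) could apply a check like `looks_like_marker` to register values; the sketch only demonstrates why the pattern is recognizable, not how HotSpot reports it.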
Tests: - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths - SAP reports all tests green (they had reported errors with the previous version) - Oracle Tests ongoing - GHAs green ------------- Commit messages: - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use - aix fix - test and aix exclusion - Fix windows when ArchiveRelocationMode=0 or 2 - original Changes: https://git.openjdk.org/jdk/pull/23912/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23912&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351040 Stats: 456 lines in 16 files changed: 361 ins; 29 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/23912.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23912/head:pull/23912 PR: https://git.openjdk.org/jdk/pull/23912 From jkern at openjdk.org Fri Mar 7 11:15:31 2025 From: jkern at openjdk.org (Joachim Kern) Date: Fri, 7 Mar 2025 11:15:31 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use In-Reply-To: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: On Wed, 5 Mar 2025 06:34:14 GMT, Thomas Stuefe wrote: > Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. > > JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). 
Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). > Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). > > The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . > > The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. > > Tests: > - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths > - SAP reports all tests green (they had reported errors with the previous version) > - Oracle Tests ongoing > - GHAs green Hi Thomas, mprotect supports System V shared memory, but only if running in an environment where the MPROTECT_SHM=ON environmental variable is defined, which is not the case in the jdk. So we can fairly say System V shared memory cannot be mprotected by us. The documentation says: _The mprotect subroutine can only be used on shared memory regions backed with 4 KB or 64 KB pages;_ So we can mprotect 64K pages and mmap supports 64K pages beginning with AIX 7.3 TL1. With JDK-8334371 we favor the use of mmap 64K pages over System V shared memory if running on a system with AIX 7.3 TL1 or higher. 
But as long as we allow lower os versions the system V shared memory is still in place, and the mprotect restriction stays valid. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23912#issuecomment-2700692743 From stuefe at openjdk.org Fri Mar 7 11:15:31 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 7 Mar 2025 11:15:31 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use In-Reply-To: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: <2TgQM90DP5z8VbI62DnMs6Ef_5U0Hp55C4QAm3jsL-k=.06fbeb2f-cb6d-49a3-9279-57cc9bf50806@github.com> On Wed, 5 Mar 2025 06:34:14 GMT, Thomas Stuefe wrote: > Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. > > JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). > Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). 
> > The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . > > The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. > > Tests: > - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths > - SAP reports all tests green (they had reported errors with the previous version) > - Oracle Tests ongoing > - GHAs green Ping @iklam @ashu-mehra ------------- PR Comment: https://git.openjdk.org/jdk/pull/23912#issuecomment-2706180836 From stuefe at openjdk.org Fri Mar 7 11:15:31 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 7 Mar 2025 11:15:31 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: <2IEafHVhVF8hz_gsqaWS22ltt0z3pmkei36dO2qVnVE=.8709418c-122b-4387-bb8a-0d6ded90d753@github.com> On Wed, 5 Mar 2025 11:43:15 GMT, Joachim Kern wrote: > Hi Thomas, mprotect supports System V shared memory, but only if running in an environment where the MPROTECT_SHM=ON environmental variable is defined, which is not the case in the jdk. So we can fairly say System V shared memory cannot be mprotected by us. 
> > The documentation says: _The mprotect subroutine can only be used on shared memory regions backed with 4 KB or 64 KB pages;_ So we can mprotect 64K pages and mmap supports 64K pages beginning with AIX 7.3 TL1. With JDK-8334371 we favor the use of mmap 64K pages over System V shared memory if running on a system with AIX 7.3 TL1 or higher. But as long as we allow lower os versions the system V shared memory is still in place, and the mprotect restriction stays valid. Thank you, @JoKern65 . I remember this differently, but your knowledge is certainly more recent. That is even better, we can long term get rid of System V shm altogether, use mmap like normal modern Unices, and remove all that crud surrounding System V shared memory handling. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23912#issuecomment-2706179797 From ihse at openjdk.org Fri Mar 7 11:29:58 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 7 Mar 2025 11:29:58 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 16:52:16 GMT, Aleksey Shipilev wrote: > This PR implements JEP 503: Remove the 32-bit x86 Port. > > The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. > > This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. > > The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. 
The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek at the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. > > Additional testing: > - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) > - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) I agree with David here. Yes, implementing this in multiple PRs is the correct approach (I think we all agree on this). However, it seems strange to mark just this single PR as implementing the JEP. Instead, that honor should fall on an umbrella JBS issue, which is dependent on this PR, but also the other planned updates. Before these are done, we can't really say that the JEP is implemented. In practical terms it does not mean much, but the bookkeeping seems better aligned with reality in that way. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2706212136 From mli at openjdk.org Fri Mar 7 11:42:34 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 7 Mar 2025 11:42:34 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v5] In-Reply-To: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: > Hi, > Can you help to review this patch? > It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv. 
> > ## Performance > > data > > Benchmark | (vectorDim) | Mode | Cnt | Score -master | Error | Score - patch | Error | Units | Improvement (master/patch) > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1 > Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011 > Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001 > Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001 > Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909 > Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91 > Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129 > Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001 > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928 > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51 > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898 > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | 
avgt | 10 | 91565.818 | 250.761 | 5191.12 | 64.598 | ns/op | 17.639 > Fl... Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: - clean - renaming ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23844/files - new: https://git.openjdk.org/jdk/pull/23844/files/a6d36051..143869ff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23844&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23844&range=03-04 Stats: 28 lines in 6 files changed: 6 ins; 16 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23844/head:pull/23844 PR: https://git.openjdk.org/jdk/pull/23844 From mli at openjdk.org Fri Mar 7 11:42:34 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 7 Mar 2025 11:42:34 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v4] In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Fri, 7 Mar 2025 01:23:24 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> remove TestFloat16VectorConvChain test for riscv temporariely > > src/hotspot/cpu/riscv/assembler_riscv.hpp line 339: > >> 337: single_precision, >> 338: double_precision >> 339: }; > > Seems better to move this enum to file `macroAssembler_riscv.hpp`? It's only used in the macro assember routines. OK. > src/hotspot/cpu/riscv/riscv.ad line 8254: > >> 8252: // half precision operations >> 8253: >> 8254: instruct reinterpretS2HF(fRegF dst, iRegINoSp src) > > Suggestion: `instruct reinterpretS2HF(fRegF dst, iRegI src)` OK. > src/hotspot/cpu/riscv/riscv.ad line 8257: > >> 8255: %{ >> 8256: match(Set dst (ReinterpretS2HF src)); >> 8257: effect(TEMP_DEF dst); > > I don't see why this `TEMP_DEF dst` effect is needed. Maybe we should remove it from all the newly-added instructs. 
OK. > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 474: > >> 472: case vmIntrinsics::_floatToFloat16: >> 473: case vmIntrinsics::_float16ToFloat: >> 474: if (!supports_float16_to_float()) { > > As the `supports_float16_to_float` function name doesn't reflect the `_floatToFloat16` case, I suggest to directly inline the code here: `return UseZfh || UseZfhmin;` Seems to me it's better to keep it in a method, as it's used in several places. I'll change it to `supports_float16_float_conversion` for naming improvement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1984917624 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1984917699 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1984917833 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1984917519 From mli at openjdk.org Fri Mar 7 11:45:57 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 7 Mar 2025 11:45:57 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v4] In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Fri, 7 Mar 2025 01:59:06 GMT, Fei Yang wrote: > Maybe we should also update the `@requires` of the test at the same time? Currently, it says `| (os.arch == "riscv64" & vm.cpu.features ~= ".*zvfh.*")`. Maybe we change `zvfh` into `zfh`? No, as this test is for "vector conversion chain", support for `zfh` alone should not trigger the test. BTW, scalar tests are in other test files. In the future, when we support vectorization the IR verification test should be enabled again, but it still depends on `zvfh` rather than `zfh`. Hope this answers your question? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23844#issuecomment-2706242048 From galder at openjdk.org Fri Mar 7 12:28:58 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 7 Mar 2025 12:28:58 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: <63F-0aHgMthexL0b2DFmkW8_QrJeo8OOlCaIyZApfpY=.4744070d-9d56-4031-8684-be14cf66d1e5@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <63F-0aHgMthexL0b2DFmkW8_QrJeo8OOlCaIyZApfpY=.4744070d-9d56-4031-8684-be14cf66d1e5@github.com> Message-ID: On Thu, 27 Feb 2025 06:54:30 GMT, Emanuel Peter wrote: > As for possible solutions. In all Regression 1-3 cases, it seems the issue is scalar cmove. So actually in all cases a possible solution is using branching code (i.e. `cmp+mov`). So to me, these are the follow-up RFE's: > > * Detect "extreme" probability scalar cmove, and replace them with branching code. This should take care of all regressions here. This one has high priority, as it fixes the regression caused by this patch here. But it would also help to improve performance for the `Integer.min/max` cases, which have the same issue. I've created [JDK-8351409](https://bugs.openjdk.org/browse/JDK-8351409) to address this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2706324225 From ayang at openjdk.org Fri Mar 7 13:16:59 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 7 Mar 2025 13:16:59 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v14] In-Reply-To: References: Message-ID: <5w6qUwzDQadxseocRl6rRF0AllyeukWTpYl2XjAfiTE=.fb62a50e-e308-4d08-8057-67e70e13ccbb@github.com> On Thu, 6 Mar 2025 16:26:31 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
>>
>> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>>
>>
>> // Filtering
>> if (region(@x.a) == region(y)) goto done; // same region check
>> if (y == null) goto done;                 // null value check
>> if (card(@x.a) == young_card) goto done;  // write to young gen check
>> StoreLoad;                                // synchronize
>> if (card(@x.a) == dirty_card) goto done;
>>
>> *card(@x.a) = dirty
>>
>> // Card tracking
>> enqueue(card-address(@x.a)) into thread-local-dcq;
>> if (thread-local-dcq is not full) goto done;
>>
>> call runtime to move thread-local-dcq into dcqs
>>
>> done:
>>
>>
>> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>>
>> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>>
>> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
>>
>> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c...
>
> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:
>
>   * iwalulya review
>   * renaming
>   * fix some includes, forward declaration

src/hotspot/share/gc/g1/g1CardTable.hpp line 76:

> 74:     g1_card_already_scanned = 0x1,
> 75:     g1_to_cset_card = 0x2,
> 76:     g1_from_remset_card = 0x4

Could you outline the motivation for this more precise info? Is it for optimization or essentially for correctness?
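
As an aside for readers of the quoted barrier pseudocode earlier in this message, the filtering and card-tracking steps can be modelled in a few lines of Java. This is a toy model under stated assumptions, not HotSpot code: the class name, the card and region sizes, the card values and the address arithmetic are all hypothetical, and the final "queue is full, call the runtime" step is left out.

```java
import java.lang.invoke.VarHandle;
import java.util.ArrayDeque;

// Toy model of the quoted post-write barrier; all constants are assumptions.
final class G1BarrierSketch {
    static final int CARD_SHIFT = 9;     // assumed 512-byte cards
    static final int REGION_SHIFT = 20;  // assumed 1 MiB regions
    static final byte CLEAN = 0;
    static final byte DIRTY = 1;
    static final byte YOUNG = 2;

    final byte[] cards;
    // Stands in for the thread-local dirty card queue (dcq).
    final ArrayDeque<Integer> threadLocalQueue = new ArrayDeque<>();

    G1BarrierSketch(long heapBytes) {
        cards = new byte[(int) (heapBytes >>> CARD_SHIFT)];
    }

    // Models the barrier for `x.a = y`; returns true if a card was enqueued.
    boolean postWrite(long fieldAddress, long newValueAddress) {
        // Filtering
        if ((fieldAddress >>> REGION_SHIFT) == (newValueAddress >>> REGION_SHIFT)) {
            return false;                // same region check
        }
        if (newValueAddress == 0) {
            return false;                // null value check
        }
        int card = (int) (fieldAddress >>> CARD_SHIFT);
        if (cards[card] == YOUNG) {
            return false;                // write to young gen check
        }
        VarHandle.fullFence();           // stands in for the StoreLoad
        if (cards[card] == DIRTY) {
            return false;                // card is already dirty
        }
        cards[card] = DIRTY;

        // Card tracking (moving a full queue to the global set is omitted)
        threadLocalQueue.add(card);
        return true;
    }
}
```

Even in this simplified form, every reference store pays for up to four checks, a fence and a queue insertion, which is the cost the proposed change aims to reduce.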
src/hotspot/share/gc/g1/g1ConcurrentRefineSweepTask.cpp line 54: > 52: assert(refinement_r == card_r, "not same region source %u (%zu) dest %u (%zu) ", refinement_r->hrm_index(), refinement_i, card_r->hrm_index(), card_i); > 53: assert(refinement_i == card_i, "indexes are not same %zu %zu", refinement_i, card_i); > 54: #endif I feel this assert logic can be extracted to a method, something like `verify_card_pair`. src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 64: > 62: report_inactive("Paused"); > 63: sts_join.yield(); > 64: // Reset after yield rather than accumulating across yields, else a The comment seems obsolete after the removal of stats. src/hotspot/share/gc/g1/g1OopClosures.inline.hpp line 158: > 156: if (_has_ref_to_cset) { > 157: return; > 158: } Is it really necessary to write `false` to `_has_ref_to_cset`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1985041202 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1983846649 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1983842440 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1983857348 From epeter at openjdk.org Fri Mar 7 13:19:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 13:19:59 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <63F-0aHgMthexL0b2DFmkW8_QrJeo8OOlCaIyZApfpY=.4744070d-9d56-4031-8684-be14cf66d1e5@github.com> Message-ID: On Fri, 7 Mar 2025 12:25:51 GMT, Galder Zamarreño wrote: >> @galderz Thanks for the summary of regressions! Yes, there are plenty of speedups, I assume primarily because of `Long.min/max` vectorization, but possibly also because the operation can now "float" out of a loop for example.
>> >> All your Regressions 1-3 are cases with "extreme" probability (close to 100% / 0%); you listed no others. That matches my intuition that branching code is usually better than cmove in extreme probability cases. >> >> As for possible solutions. In all Regression 1-3 cases, it seems the issue is scalar cmove. So actually in all cases a possible solution is using branching code (i.e. `cmp+mov`). So to me, these are the follow-up RFEs: >> - Detect "extreme" probability scalar cmove, and replace them with branching code. This should take care of all regressions here. This one has high priority, as it fixes the regression caused by this patch here. But it would also help to improve performance for the `Integer.min/max` cases, which have the same issue. >> - Additional performance improvement: make SuperWord recognize more cases as profitable (see Regression 1). Optional. >> - Additional performance improvement: extend backend capabilities for vectorization (see Regression 2 + 3). Optional. >> >> Does that make sense, or am I missing something? > >> As for possible solutions. In all Regression 1-3 cases, it seems the issue is scalar cmove. So actually in all cases a possible solution is using branching code (i.e. `cmp+mov`). So to me, these are the follow-up RFEs: >> >> * Detect "extreme" probability scalar cmove, and replace them with branching code. This should take care of all regressions here. This one has high priority, as it fixes the regression caused by this patch here. But it would also help to improve performance for the `Integer.min/max` cases, which have the same issue. > > I've created [JDK-8351409](https://bugs.openjdk.org/browse/JDK-8351409) to address this. @galderz Excellent. Testing all looks good on our side. Yes, I think what you saw was unrelated.
@rwestrel Could give this a last quick scan and then I think you can integrate :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2706434983 From egahlin at openjdk.org Fri Mar 7 13:30:54 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Fri, 7 Mar 2025 13:30:54 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v3] In-Reply-To: References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: On Thu, 6 Mar 2025 12:24:10 GMT, Aleksey Shipilev wrote: >> We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. >> >> This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). >> >> Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. We still only enable the statistics event by default to be on a safer side. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: > > - Touch up descriptions > - Fix test in release builds > - Merge branch 'master' into JDK-8351142-jfr-deflate-event > - Merge branch 'master' into JDK-8351142-jfr-deflate-event > - Test updates > - Rework statistics event to be actually statistics > - Filter JFR HiddenWait consistently > - Event metadata touchups > - Separate statistics event as well > - Fix Looks good overall, but I'm not sure we should add maxCount. I'm hesitant because the peak value can easily be calculated, which we already do for other events (CPULoad, NetworkUtilization, NativeMemoryUsage etc) in "jfr view". ------------- PR Comment: https://git.openjdk.org/jdk/pull/23900#issuecomment-2706457566 From ihse at openjdk.org Fri Mar 7 15:10:57 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 7 Mar 2025 15:10:57 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 07:21:43 GMT, David Holmes wrote: >> It's not that important, no. I'm not sure if previous deprecated ports were handled exactly like this. >> >> And you can always do something like `git log | grep -i "remove .* port"` to find the change it was removed in, and look at what it did... > > I think leaving a comment describing how to deprecate a port is useful. To look it up in history you have to realise there is something to look up. > > "They who are not reminded of the past will invent a new way to do it in the future." The `--enable-deprecated-ports` option is still there. All that is removed is an if statement and a print line. I know the make syntax can seem intimidating, but just ask me or any other build team member if you need help to recreate such a thing. It is not like it is a complicated algorithm that can be written in many ways. This is just make's equivalent of: if (some_condition) { println("whatever"); } To me it is just utter nonsense to keep that commented out.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23906#discussion_r1985229429 From ihse at openjdk.org Fri Mar 7 15:10:59 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 7 Mar 2025 15:10:59 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 16:52:16 GMT, Aleksey Shipilev wrote: > This PR implements JEP 503: Remove the 32-bit x86 Port. > > The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. > > This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. > > The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. 
> > Additional testing: > - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) > - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) make/autoconf/platform.m4 line 669: > 667: AC_ARG_ENABLE(deprecated-ports, [AS_HELP_STRING([--enable-deprecated-ports@<:@=yes/no@:>@], > 668: [Suppress the error when configuring for a deprecated port @<:@no@:>@])]) > 669: # There are no deprecated ports. This option is left to be consistent with future deprecations. Also, to be clear, we need to keep the option to not break people's scripts. The alternative would be to deprecate the `--enable-deprecated-ports` arguments, and then remove it in a future release, but I think it is reasonable to keep it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23906#discussion_r1985232539 From duke at openjdk.org Fri Mar 7 15:15:01 2025 From: duke at openjdk.org (duke) Date: Fri, 7 Mar 2025 15:15:01 GMT Subject: Withdrawn: 8345289: RISC-V: enable some extensions with hwprobe In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 09:55:22 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? > Currently, some extensions are not enable automatically with hwprobe, this is to enable them with hwprobe result. > > Thanks! > > Tests running so far so good. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/22474 From jbechberger at openjdk.org Fri Mar 7 15:24:26 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Fri, 7 Mar 2025 15:24:26 GMT Subject: RFR: 8342818: Implement CPU Time Profiling for JFR [v38] In-Reply-To: References: Message-ID: > This is the code for the [JEP draft: CPU Time based profiling for JFR]. 
Johannes Bechberger has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 179 commits: - Only renew buffer if needed - Tiny improvements - Fix unlocking - Fix deadlock - Increase queue size - Improve lock placement - Don't bitpack for now - Bit packing - Fix compile - Fix for non-linux build - ... and 169 more: https://git.openjdk.org/jdk/compare/efc597bf...53a2560d ------------- Changes: https://git.openjdk.org/jdk/pull/20752/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=37 Stats: 2943 lines in 63 files changed: 2674 ins; 198 del; 71 mod Patch: https://git.openjdk.org/jdk/pull/20752.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20752/head:pull/20752 PR: https://git.openjdk.org/jdk/pull/20752 From azafari at openjdk.org Fri Mar 7 16:06:32 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 7 Mar 2025 16:06:32 GMT Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag [v5] In-Reply-To: References: Message-ID: <0SlK7ixxGv5N7-LQnC7SwgpcK4Oz_9_H24qnrGPrTpc=.9bfd6434-6a48-4563-9dd6-66cff70dafe7@github.com> > With the `size` parameter there will be no need to traverse/go through the nodes between the base and end of the region. > Tests: > linux-x64-debug, gtest:NMT* and runtime/NMT* Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge remote-tracking branch 'origin/master' into _8350566_size_par_set_tag - new fix. - fixed build problem. - ReservedSpace is accepted as param. - applied also to VMT. 
- 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag ------------- Changes: https://git.openjdk.org/jdk/pull/23770/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23770&range=04 Stats: 27 lines in 14 files changed: 6 ins; 1 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/23770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23770/head:pull/23770 PR: https://git.openjdk.org/jdk/pull/23770 From sroy at openjdk.org Fri Mar 7 17:02:01 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 7 Mar 2025 17:02:01 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: <7rVbCbWDqrib9Jyj7_hkD-r9rkaAOIXuwOGAqImrxoY=.a55e9572-b4e6-4cc2-aa0e-c23deb9961ce@github.com> On Mon, 3 Mar 2025 10:47:59 GMT, Martin Doerr wrote: >> @TheRealMDoerr can you explain how it can be equivalent to these 4 instructions ? >> we are extracting the different parts of midProduct here ,64 bits each, for the cross product. >> I,e Xl * Hh +Hl*Xh , so the below 2 are required >> masm->vsldoi(vTmp8, vMidProduct, vZero, 8); >> masm->vsldoi(vTmp9, vZero, vMidProduct, 8); >> >> >> >> >> ? > > Your version extracts 2 8 Byte parts and feeds them into separate xor instructions. My proposal performs both 8 Byte xor operations with one vxor instruction by selecting the input bits accordingly. It furthermore avoids swapping halves forth and back (I swap the halves of vReducedLow instead). > Have you tried? @TheRealMDoerr Yes. The tests do not pass with this. Trying to find a scope to reduce instructions. masm->vsldoi(vLowProduct, vLowProduct, vLowProduct, 8); // Swap masm->vxor(vLowProduct, vLowProduct, vReducedLow); // Reduction using constant masm->vsldoi(vCombinedResult, vLowProduct, vLowProduct, 8); // Swap can be brought down to 2 instructions. Still looking for scope to reduce. 
Let me know your inputs ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1985402217 From ryan at iernst.net Fri Mar 7 17:28:52 2025 From: ryan at iernst.net (Ryan Ernst) Date: Fri, 7 Mar 2025 09:28:52 -0800 Subject: Verification in agent transformers Message-ID: <3AE3C86F-D811-4788-844A-CF3F13013444@iernst.net> Hi folks, In Elasticsearch we use an agent to instrument sensitive methods (ie a Security Manager replacement). Recently we found a VerifyError during instrumentation. The specific problem was an incompatible argument type to one of the methods we call from instrumented classes. The reason for this mail is to understand the context of why we only got the VerifyError in certain circumstances. The VerifyError tripped only on Java 24, and only when we call retransformClasses. When the transformer runs outside of retransformClasses, there is no VerifyError, yet the incompatible type existed (but it was unused, so did not trip a runtime problem, it was just a bad type sitting on the stack). What's the reason for only running verification when retransforming class, not on all transforms? I should note that this is for a JDK class, which as I understand are not verified upon loading normally? Thanks! Ryan From coleen.phillimore at oracle.com Fri Mar 7 17:43:48 2025 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 7 Mar 2025 12:43:48 -0500 Subject: Verification in agent transformers In-Reply-To: <3AE3C86F-D811-4788-844A-CF3F13013444@iernst.net> References: <3AE3C86F-D811-4788-844A-CF3F13013444@iernst.net> Message-ID: <4d1ccef8-c837-482b-abb3-72f28593d08a@oracle.com> On 3/7/25 12:28 PM, Ryan Ernst wrote: > Hi folks, > > In Elasticsearch we use an agent to instrument sensitive methods (ie a Security Manager replacement). Recently we found a VerifyError during instrumentation. The specific problem was an incompatible argument type to one of the methods we call from instrumented classes. 
> > The reason for this mail is to understand the context of why we only got the VerifyError in certain circumstances. The VerifyError tripped only on Java 24, and only when we call retransformClasses. When the transformer runs outside of retransformClasses, there is no VerifyError, yet the incompatible type existed (but it was unused, so did not trip a runtime problem, it was just a bad type sitting on the stack). > > What's the reason for only running verification when retransforming class, not on all transforms? I should note that this is for a JDK class, which as I understand are not verified upon loading normally? Hi, We don't verify JDK classes because we provide and trust the implementation of these classes, but when you retransform these classes, we do not control what the redefinition will provide so verify them to maintain the security of the running application. This is a recent change in JDK 24, because the code intended to do this all along but there was a bug where it didn't. You can run -Xlog:verification to see the details of the VerifyError.? If it is bytecodes in the JDK and not ones provided by you, please report this to us so we can fix it. Thank you, Coleen > > Thanks! > Ryan From ryan at iernst.net Fri Mar 7 18:00:34 2025 From: ryan at iernst.net (Ryan Ernst) Date: Fri, 7 Mar 2025 10:00:34 -0800 Subject: Verification in agent transformers In-Reply-To: <4d1ccef8-c837-482b-abb3-72f28593d08a@oracle.com> References: <3AE3C86F-D811-4788-844A-CF3F13013444@iernst.net> <4d1ccef8-c837-482b-abb3-72f28593d08a@oracle.com> Message-ID: Hi Coleen, Thanks for the reply. The error was not in JDK code, it was in our transformation of the JDK code. We inserted a bad function call, passing an incompatible argument. Running verification on retransformClasses makes complete sense. But my question was about why the result of transformers are not verified when _not_ triggered by retransformClasses. That is, the same transformer that had the bug existed. 
When it is run via retransformClasses, it causes a VerifyError. But if it is run later in the program, then no verification error occurs, yet the transformer still produced broken bytecode. Additionally, we noticed that the VerifyError had no message. > [2025-03-06T20:03:17,159][WARN ][stderr ] [instance-0000000010] Caused by: java.lang.VerifyError > [2025-03-06T20:03:17,159][WARN ][stderr ] [instance-0000000010] at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method) > [2025-03-06T20:03:17,160][WARN ][stderr ] [instance-0000000010] at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:221) However, when we captured the broken class bytes, if we run with `javap -verify` we get a clear error: > Bad type on operand stack in sun/net/www/protocol/https/AbstractDelegateHttpsURLConnection::connect() @13 (javax/net/ssl/HttpsURLConnection is not assignable from sun/net/www/protocol/https/AbstractDelegateHttpsURLConnection) Is there some difference in how verify is run between javap and at runtime that would account for an empty message? Thanks Ryan > On Mar 7, 2025, at 9:43 AM, coleen.phillimore at oracle.com wrote: > > > > On 3/7/25 12:28 PM, Ryan Ernst wrote: >> Hi folks, >> >> In Elasticsearch we use an agent to instrument sensitive methods (i.e. a Security Manager replacement). Recently we found a VerifyError during instrumentation. The specific problem was an incompatible argument type to one of the methods we call from instrumented classes. >> >> The reason for this mail is to understand the context of why we only got the VerifyError in certain circumstances. The VerifyError tripped only on Java 24, and only when we call retransformClasses. When the transformer runs outside of retransformClasses, there is no VerifyError, yet the incompatible type existed (but it was unused, so did not trip a runtime problem, it was just a bad type sitting on the stack).
>> >> What's the reason for only running verification when retransforming class, not on all transforms? I should note that this is for a JDK class, which as I understand are not verified upon loading normally? > > Hi, > > We don't verify JDK classes because we provide and trust the implementation of these classes, but when you retransform these classes, we do not control what the redefinition will provide so verify them to maintain the security of the running application. This is a recent change in JDK 24, because the code intended to do this all along but there was a bug where it didn't. > > You can run -Xlog:verification to see the details of the VerifyError. If it is bytecodes in the JDK and not ones provided by you, please report this to us so we can fix it. > > Thank you, > Coleen > >> >> Thanks! >> Ryan > From alan.bateman at oracle.com Fri Mar 7 18:42:57 2025 From: alan.bateman at oracle.com (Alan Bateman) Date: Fri, 7 Mar 2025 18:42:57 +0000 Subject: Verification in agent transformers In-Reply-To: <3AE3C86F-D811-4788-844A-CF3F13013444@iernst.net> References: <3AE3C86F-D811-4788-844A-CF3F13013444@iernst.net> Message-ID: <87de9ca1-bf71-4464-8386-4549395ff99e@oracle.com> On 07/03/2025 17:28, Ryan Ernst wrote: > When the transformer runs outside of retransformClasses, there is no VerifyError, yet the incompatible type existed (but it was unused, so did not trip a runtime problem, it was just a bad type sitting on the stack). > What does "outside of retransformClasses" mean? Is this static instrumentation where classes in modules mapped to the boot loader are instrumented and the jimage re-created with the modified classes, or is this load time instrumentation? -Alan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From azafari at openjdk.org Fri Mar 7 18:43:54 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 7 Mar 2025 18:43:54 GMT Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag [v2] In-Reply-To: References: Message-ID: <6-tmSINEwkIMphxPbnP92QmD_-i3Ui7pU9aLpeQ_PmY=.1760c755-2e70-4152-a273-1ad036c46e2e@github.com> On Thu, 6 Mar 2025 15:24:49 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> ReservedSpace is accepted as param. > > Need to fix the build errors: > > /home/runner/work/jdk/jdk/src/hotspot/share/nmt/memTracker.hpp:224:31: error: invalid use of incomplete type ‘const class ReservedSpace’ > 224 | record_virtual_memory_tag(rs.base(), rs.size(), mem_tag); > | ^~ > In file included from /home/runner/work/jdk/jdk/src/hotspot/share/memory/allocation.cpp:28: > /home/runner/work/jdk/jdk/src/hotspot/share/memory/metaspace.hpp:38:7: note: forward declaration of ‘class ReservedSpace’ > 38 | class ReservedSpace; > | ^~~~~~~~~~~~~ > In file included from /home/runner/work/jdk/jdk/src/hotspot/share/memory/allocation.cpp:30: > /home/runner/work/jdk/jdk/src/hotspot/share/nmt/memTracker.hpp:224:42: error: invalid use of incomplete type ‘const class ReservedSpace’ > 224 | record_virtual_memory_tag(rs.base(), rs.size(), mem_tag); > | ^~ > In file included from /home/runner/work/jdk/jdk/src/hotspot/share/memory/allocation.cpp:28: > /home/runner/work/jdk/jdk/src/hotspot/share/memory/metaspace.hpp:38:7: note: forward declaration of ‘class ReservedSpace’ > ... (rest of output omitted) Thank you @gerard-ziemski and @jdksjolen for your reviews. The build failure and merge problems are fixed. The GHA test failures are due to timeouts.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23770#issuecomment-2707162561 From ryan at iernst.net Fri Mar 7 18:48:43 2025 From: ryan at iernst.net (Ryan Ernst) Date: Fri, 7 Mar 2025 10:48:43 -0800 Subject: Verification in agent transformers In-Reply-To: <87de9ca1-bf71-4464-8386-4549395ff99e@oracle.com> References: <3AE3C86F-D811-4788-844A-CF3F13013444@iernst.net> <87de9ca1-bf71-4464-8386-4549395ff99e@oracle.com> Message-ID: <558ED898-3C7E-495D-8D27-EC0E78681B72@iernst.net> This is load time instrumentation. Once we've registered our transformer, we look at what classes have already been loaded, and force retransformation of those. The error only occurred when the class in question was loaded before our (dynamic) agent ran. > On Mar 7, 2025, at 10:42 AM, Alan Bateman wrote: > > > > On 07/03/2025 17:28, Ryan Ernst wrote: >> When the transformer runs outside of retransformClasses, there is no VerifyError, yet the incompatible type existed (but it was unused, so did not trip a runtime problem, it was just a bad type sitting on the stack). >> > > What does "outside of retransformClasses" mean? Is this static instrumentation where classes in modules mapped to the boot loader are instrumented and the jimage re-created with the modified classes, or is this load time instrumentation? > > -Alan From coleen.phillimore at oracle.com Fri Mar 7 19:36:17 2025 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 7 Mar 2025 14:36:17 -0500 Subject: [External] : Re: Verification in agent transformers In-Reply-To: References: <3AE3C86F-D811-4788-844A-CF3F13013444@iernst.net> <4d1ccef8-c837-482b-abb3-72f28593d08a@oracle.com> Message-ID: On 3/7/25 1:00 PM, Ryan Ernst wrote: > Hi Coleen, > > Thanks for the reply. The error was not in JDK code, it was in our transformation of the JDK code. We inserted a bad function call, passing an incompatible argument. > > Running verification on retransformClasses makes complete sense.
But my question was about why the results of transformers are not verified when _not_ triggered by retransformClasses. That is, the same transformer that had the bug existed. When it is run via retransformClasses, it causes a VerifyError. But if it is run later in the program, then no verification error occurs, yet the transformer still produced broken bytecode. I don't know what this last sentence means. If the class is loaded and then linked, and is not on the bootstrap class loader, then it should be verified. (?) > > Additionally, we noticed that the VerifyError had no message. Yes, this is an unfortunate feature of errors during retransform/redefinition. They lose their messages since JVMTI returns only JVMTI_ERROR_FAILS_VERIFICATION to the agent. -Xlog:verification is the way to see what the specific error is. > >> [2025-03-06T20:03:17,159][WARN ][stderr ] [instance-0000000010] Caused by: java.lang.VerifyError >> [2025-03-06T20:03:17,159][WARN ][stderr ] [instance-0000000010] at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method) >> [2025-03-06T20:03:17,160][WARN ][stderr ] [instance-0000000010] at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:221) > However, when we captured the broken class bytes, if we run with `javap -verify` we get a clear error: > >> Bad type on operand stack in sun/net/www/protocol/https/AbstractDelegateHttpsURLConnection::connect() @13 (javax/net/ssl/HttpsURLConnection is not assignable from sun/net/www/protocol/https/AbstractDelegateHttpsURLConnection) > Is there some difference in how verify is run between javap and at runtime that would account for an empty message? It looks like javap can run the verifier directly and report the error.
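For context, the agent pattern described in this thread (register a retransform-capable transformer, then force retransformation of matching classes that were loaded before the agent attached) might look roughly like the sketch below. It is a hypothetical outline, not the Elasticsearch agent: the class name, the target set and the `rewrite` placeholder are assumptions made for illustration. Only the `java.lang.instrument` calls are real API.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a dynamically attached agent.
final class RetransformSketch {

    // Assumed target list, in the internal (slash-separated) name form.
    static final Set<String> TARGETS =
            Set.of("sun/net/www/protocol/https/AbstractDelegateHttpsURLConnection");

    static boolean isTarget(String internalName) {
        return internalName != null && TARGETS.contains(internalName);
    }

    // Placeholder for the real bytecode rewriting; returning null means "no change".
    static byte[] rewrite(byte[] originalClassBytes) {
        return null;
    }

    public static void agentmain(String args, Instrumentation inst)
            throws UnmodifiableClassException {
        ClassFileTransformer transformer = new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                return isTarget(className) ? rewrite(classfileBuffer) : null;
            }
        };
        // canRetransform = true so the transformer also runs for retransformClasses.
        inst.addTransformer(transformer, true);

        // Classes loaded before the agent attached only see the transformer
        // if they are retransformed explicitly.
        List<Class<?>> alreadyLoaded = new ArrayList<>();
        for (Class<?> c : inst.getAllLoadedClasses()) {
            if (isTarget(c.getName().replace('.', '/')) && inst.isModifiableClass(c)) {
                alreadyLoaded.add(c);
            }
        }
        if (!alreadyLoaded.isEmpty()) {
            inst.retransformClasses(alreadyLoaded.toArray(new Class<?>[0]));
        }
    }
}
```

In this shape the same `transform` method serves both paths discussed in the thread: it runs at load time for classes loaded after attach, and through `retransformClasses` for classes that were already loaded.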
Coleen > > Thanks > Ryan > >> On Mar 7, 2025, at 9:43?AM, coleen.phillimore at oracle.com wrote: >> >> >> >> On 3/7/25 12:28 PM, Ryan Ernst wrote: >>> Hi folks, >>> >>> In Elasticsearch we use an agent to instrument sensitive methods (ie a Security Manager replacement). Recently we found a VerifyError during instrumentation. The specific problem was an incompatible argument type to one of the methods we call from instrumented classes. >>> >>> The reason for this mail is to understand the context of why we only got the VerifyError in certain circumstances. The VerifyError tripped only on Java 24, and only when we call retransformClasses. When the transformer runs outside of retransformClasses, there is no VerifyError, yet the incompatible type existed (but it was unused, so did not trip a runtime problem, it was just a bad type sitting on the stack). >>> >>> What's the reason for only running verification when retransforming class, not on all transforms? I should note that this is for a JDK class, which as I understand are not verified upon loading normally? >> Hi, >> >> We don't verify JDK classes because we provide and trust the implementation of these classes, but when you retransform these classes, we do not control what the redefinition will provide so verify them to maintain the security of the running application. This is a recent change in JDK 24, because the code intended to do this all along but there was a bug where it didn't. >> >> You can run -Xlog:verification to see the details of the VerifyError. If it is bytecodes in the JDK and not ones provided by you, please report this to us so we can fix it. >> >> Thank you, >> Coleen >> >>> Thanks! 
>>> Ryan From azafari at openjdk.org Fri Mar 7 20:22:53 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 7 Mar 2025 20:22:53 GMT Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag [v2] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 15:24:49 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> ReservedSpace is accepted as param. > > Need to fix the build errors: > > /home/runner/work/jdk/jdk/src/hotspot/share/nmt/memTracker.hpp:224:31: error: invalid use of incomplete type 'const class ReservedSpace' > 224 | record_virtual_memory_tag(rs.base(), rs.size(), mem_tag); > | ^~ > In file included from /home/runner/work/jdk/jdk/src/hotspot/share/memory/allocation.cpp:28: > /home/runner/work/jdk/jdk/src/hotspot/share/memory/metaspace.hpp:38:7: note: forward declaration of 'class ReservedSpace' > 38 | class ReservedSpace; > | ^~~~~~~~~~~~~ > In file included from /home/runner/work/jdk/jdk/src/hotspot/share/memory/allocation.cpp:30: > /home/runner/work/jdk/jdk/src/hotspot/share/nmt/memTracker.hpp:224:42: error: invalid use of incomplete type 'const class ReservedSpace' > 224 | record_virtual_memory_tag(rs.base(), rs.size(), mem_tag); > | ^~ > In file included from /home/runner/work/jdk/jdk/src/hotspot/share/memory/allocation.cpp:28: > /home/runner/work/jdk/jdk/src/hotspot/share/memory/metaspace.hpp:38:7: note: forward declaration of 'class ReservedSpace' > ... (rest of output omitted) @gerard-ziemski and @jdksjolen, new reviews are needed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23770#issuecomment-2707346222 From eastig at amazon.co.uk Fri Mar 7 23:47:53 2025 From: eastig at amazon.co.uk (Astigeevich, Evgeny) Date: Fri, 7 Mar 2025 23:47:53 +0000 Subject: [External] : RFD: Grouping hot code in CodeCache In-Reply-To: <2623f909-bb91-4450-bc05-d9181ba3abcb@oracle.com> References: <1B0C3138-761B-4DB0-8A98-977C6FC40178@amazon.co.uk> <2623f909-bb91-4450-bc05-d9181ba3abcb@oracle.com> Message-ID: Hi Vladimir, Thank you for the feedback. > My concern is that it will complicate VM existing code for not > significant benefits in real production environment. I think it won't complicate the existing code: - Adding a code heap is ~50 lines of code, mostly in CodeCache::initialize_heaps. - Relocating nmethods, according to PR[1], is ~300 lines of code. - A grouping thread is simple and isolated. It will go through Java threads checking their last frame(s) and recording seen nmethods. It should have less code than the sweeper which was ~700 lines. I think it's better to wait for PoC to see the complexity. > What improvements your experiments in real production runs shows? And > which JDK version you used for that? In production we are using internally 17 (static lists of methods) and 21 (dynamic lists of methods). Improvements are in a range of 5% - 15%. They depend on how big CPU load is: the more CPU load the bigger improvement. > As you know most of nmethod's metadata is moved from CodeCache. > ... > After that the code will be a lot more compact in CodeCache. Code sparsity > should be less issue then. Yes, removing non-code from nmethod will improve code density. This means in a code region we will have more code vs non-code. CPU instruction caches will like this. As I wrote in a comment to benchmark PR [2], Neoverse operates in code regions. For Neoverse it's more important to have as less code regions with active nmethods as possible. 
We are aware of cases when CodeCache usage is between 512M - 1G. The mentioned changes won't help in those cases. If I remember, no public benchmarks have demonstrated improvements from non-code moved away from nmethod. Since the removal of the Sweeper, GC is in charge of cleaning CodeCache. We've seen cases when GC was triggered often because of allocation pressure on CodeCache. For such cases, a recommended workaround is to increase the size of CodeCache from default 240M up to 512M. In such cases actively used nmethods will more likely be sparse. > It would be nice if you redo your production experiments after that. Due to the complexity of customer's application we cannot run it on OpenJDKTip. It has thousand dependencies. We will need to move them on OpenJDKTip. I think it would be difficult to backport the mentioned changes to 21 > I understand that we can still have sparsity due to "warm" nmethods and > C1 compiled code mixed with "hot" C2 nmethods. Customers having issues with big CodeCache on Graviton usually turn off tiered compilation to reduce far jumps/calls. BTW, this is another argument for identifying active nmethods and grouping them together: it should reduce/eliminate far jumps/calls. With small CodeCache mix of C1 and C2 nmethods is not an issue. > Can we simply use a separate CodeCache's segment for all > C2 "hot" (we can specify frequency flag to determine what "hot" means) > methods regardless when they are compiled. I did not get the idea. We already have the non-profiled segment where C2 code is put. Do you mean that at the compilation time some methods are put in the regular non-profile segment and some in the specific non-profile segment? What we've seen that methods profiles keep changing. There are the following cases: 1. C2 methods used most of the time: their profile can stay the same or can get hotter. 2. C2 methods used periodically: actively used, not used, actively used and so on 3. 
C2 methods used actively during some time and never used after Currently GC identifies cases #3 and some cases #2, aka cold code. The percentage of methods case #1 is ~10% - 20%. If we have 100M of C2 code, only 10M - 20M will be actively used. If we get unlucky, those 10M-20M could be spread across CodeCache and cause CPU stalls. How can we identify those 10%-20% of methods at compilation time? BTW, I think the separate hot code heap might simplify flushing cold code. Everything not in the hot code heap can automatically assumed cold. Thanks, Evgeny [1]: https://github.com/openjdk/jdk/pull/23573 [2]: https://github.com/openjdk/jdk/pull/23831#issuecomment-2705085399 On 06/03/2025, 22:41, "Vladimir Kozlov" wrote: Hi Evgeny, My concern is that it will complicate VM existing code for not significant benefits in real production environment. What improvements your experiments in real production runs shows? And which JDK version you used for that? As you know most of nmethod's metadata is moved from CodeCache. And Boris Ulasevich will move the final part (relocation info) soon. After that the code will be a lot more compact in CodeCache. Code sparsity should be less issue then. It would be nice if you redo your production experiments after that. I understand that we can still have sparsity due to "warm" nmethods and C1 compiled code mixed with "hot" C2 nmethods. I think compilation policy has heuristic to detect "warm" method (time intervals between invocations). Can we simply use a separate CodeCache's segment for all C2 "hot" (we can specify frequency flag to determine what "hot" means) methods regardless when they are compiled. Then you don't need to create list or do anything special for them.
Most likely we will waste more space in CodeCache but it could be conditional under flag which you already proposed in separate segment RFE. Thanks, Vladimir K On 3/5/25 10:41 AM, Astigeevich, Evgeny wrote: > Hi Vladimir, > > This is JDK-8326205: Implement grouping hot nmethods in CodeCache. > > As I managed to synthesize a benchmark > (https://github.com/openjdk/jdk/pull/23831) to demonstrate performance impact of sparse code, I'd like > to discuss a possible solution of the sparse code. > > High level, a solution is: > > * Detect hot code. > * Group hot code. > * Maintain grouped code. > > Downstream we tried two approaches: > > * *Static lists of methods (compile command):* Identify frequently > used (hot) methods using test runs and provide static method lists > to JVM in production. When JVM compiles a Java method and the method > is on the list, JVM puts the code into a designated code heap > (HotCodeHeap). > * *Dynamic lists of methods (compiler directives):* Profile an > application in production and dynamically relocate identified hot > methods to HotCodeHeap. Relocation was implemented with recompilation. > > The main advantage of static lists is zero profiling overhead in > production. We do all profiling and analysis in test runs. Its problems are: > > * *Training Run Accuracy*: We need training runs to have execution > paths closely mimicking production environments. Otherwise we put > wrong methods into HotCodeHeap. > * *Method List Maintenance:* We need to rerun training to regenerate > lists when application code changes. Training runs are expensive and > time-consuming. They require long runs to guarantee we see all major > execution paths.
Updating lists in production can be as complex as > application deployment > * *Method Placement Limitations:* Methods marked for HotCodeHeap are > permanently placed into HotCodeHeap. No mechanism to remove methods > that become less frequently used. > > We addressed these problems with dynamic lists of methods. We > implemented a Java agent that runs within the same JVM to dynamically > detect and manage hot Java methods without prior method identification. > The agent detects hot methods using JFR. The agent manages hot Java > methods in HotCodeHeap with compiler directives. A new compiler > directive marks methods with dynamic states ("hot" or "cold"). Methods > marked by the "hot" state are recompiled and placed in HotCodeHeap. > Methods marked by the "cold" state are eventually removed from HotCodeHeap. > > Problems of this approach are: > > * It requires specific, complex modifications to compiler directive > support: recompilation of Java methods affected by compiler > directives changes. This functionality is unique to Java agent > implementation and has limited potential for broader use. > * The agent cannot guarantee Java methods are moved to/removed from > the HotCodeHeap because updates of compiler directives can fail. > * The agent knows nothing about compiled code, e.g. whether it's C1 or > C2 compiled, code size, profile. This data can useful for deciding > to move or not to move to HotCodeHeap. > * Recompilations, especially C2, are expensive. Having many of them > can cause performance issues. Also recompiled code might differ from > the code we have detected as "hot". > > Running these two approaches in production we learned: > > * We detect 95% of actively used methods within the first 30 minutes > of an application run. This is with JFR profiling configured: 90 > seconds session duration, sampling each 11 ms, 8 minutes between > profiling sessions.
We can find actively used methods faster if we > reduce a pause between profiling sessions and sampling period. > However it will increase the profiling overhead and affect > application performance. With the current configuration, the > profiling overhead is between 1% - 2%. > * A set of actively used methods gets into the steady state (no new > methods added to, no methods removed from) within the first 60 minutes. > * Static lists, when created from runs close to production, have 80% - > 90% methods always in use. This does not change over time. > * Predicting the size of HotCodeHeap is difficult, especially with > dynamic lists. > > We want to have grouping of hot method functionality as a part Hotspot > JVM. We will group only C2 compiled methods. We can group JVMCI compiled > methods, e.g. Graal, if needed. We need profiling precise enough to > detect major Java methods. Low overhead is more important than precision. > > We think we can have a solution which does not require a lot of code: > > * Detect hot code: we can an implementation based on the Sweeper: > https://github.com/openjdk/jdk17u/blob/master/src/hotspot/share/runtime/sweeper.hpp. We will use the handshakes mechanism, what the Sweeper > used, to detect nmethods on the top of thread stacks. > * Group hot code: we have a draft PR https://github.com/openjdk/jdk/pull/23573. It implements relocation of nmethods within CodeCache. > * Maintain grouped code: we will add an additional code heap where hot > nmethods will be relocated to. > > What do you think about this approach? Are there other possible solutions? > > Thanks, > > Evgeny A.
> > > > > Amazon Development Centre (London) Ltd. Registered in England and Wales > with registration number 04543232 with its registered office at 1 > Principal Place, Worship Street, London EC2A 2FA, United Kingdom. > > Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom. From ryan at iernst.net Fri Mar 7 23:50:15 2025 From: ryan at iernst.net (Ryan Ernst) Date: Fri, 7 Mar 2025 15:50:15 -0800 Subject: [External] : Re: Verification in agent transformers In-Reply-To: References: <3AE3C86F-D811-4788-844A-CF3F13013444@iernst.net> <4d1ccef8-c837-482b-abb3-72f28593d08a@oracle.com> Message-ID: <37F06AA7-E883-4BCF-8E0E-6B2CF1A81FBD@iernst.net> > If the class is loaded and then linked, and is not on the bootstrap class loader, then it should be verified. I agree, but that's not what we observed. We only see the VerifyError if retransformClasses is called. If instead the class is loaded after the transformer has been registered, no VerifyError occurs, even though the transformer produced technically broken bytecode (I say "technically" because although the parameter type was wrong, we never used the parameter in this method, so it didn't actually cause any problems at runtime, it was just a bad reference on the stack that was ignored). > On Mar 7, 2025, at 11:36 AM, coleen.phillimore at oracle.com wrote: > > > > On 3/7/25 1:00 PM, Ryan Ernst wrote: >> Hi Coleen, >> >> Thanks for the reply. The error was not in JDK code, it was in our transformation of the JDK code. We inserted a bad function call, passing an incompatible argument. >> >> Running verification on retransformClasses makes complete sense. But my question was about why the result of transformers are not verified when _not_ triggered by retransformClasses. That is, the same transformer that had the bug existed.
When it is run via retransformClasses, it causes a VerifyError. But if it is run later in the program, the no verification error occurs, yet the transformer still produced broken bytecode. > > I don't know what this last sentence means. If the class is loaded and then linked, and is not on the bootstrap class loader, then it should be verified. (?) >> >> Additionally, we noticed that the VerifyError had no message. > > Yes, this is an unfortunate feature of errors during retransform/redefinition. They lose their messages since JVMTI returns only JVMTI_ERROR_FAILS_VERIFICATION to the agent. > > -Xlog:verification is the way to see what the specific error is. > >> >>> [2025-03-06T20:03:17,159][WARN ][stderr ] [instance-0000000010] Caused by: java.lang.VerifyError >>> [2025-03-06T20:03:17,159][WARN ][stderr ] [instance-0000000010] at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method) >>> [2025-03-06T20:03:17,160][WARN ][stderr ] [instance-0000000010] at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:221) >> However, when we captured the broken class bytes, if we run with `javap -verify` we get a clear error: >> >>> Bad type on operand stack in sun/net/www/protocol/https/AbstractDelegateHttpsURLConnection::connect() @13 (javax/net/ssl/HttpsURLConnection is not assignable from sun/net/www/protocol/https/AbstractDelegateHttpsURLConnection) >> Is there some difference in how verify is run between javap and at runtime that would account for an empty message? > > It looks like javap can run the verifier directly and report the error. > > Coleen > >> >> Thanks >> Ryan >> >>> On Mar 7, 2025, at 9:43 AM, coleen.phillimore at oracle.com wrote: >>> >>> >>> >>> On 3/7/25 12:28 PM, Ryan Ernst wrote: >>>> Hi folks, >>>> >>>> In Elasticsearch we use an agent to instrument sensitive methods (ie a Security Manager replacement). Recently we found a VerifyError during instrumentation.
The specific problem was an incompatible argument type to one of the methods we call from instrumented classes. >>>> >>>> The reason for this mail is to understand the context of why we only got the VerifyError in certain circumstances. The VerifyError tripped only on Java 24, and only when we call retransformClasses. When the transformer runs outside of retransformClasses, there is no VerifyError, yet the incompatible type existed (but it was unused, so did not trip a runtime problem, it was just a bad type sitting on the stack). >>>> >>>> What's the reason for only running verification when retransforming class, not on all transforms? I should note that this is for a JDK class, which as I understand are not verified upon loading normally? >>> Hi, >>> >>> We don't verify JDK classes because we provide and trust the implementation of these classes, but when you retransform these classes, we do not control what the redefinition will provide so verify them to maintain the security of the running application. This is a recent change in JDK 24, because the code intended to do this all along but there was a bug where it didn't. >>> >>> You can run -Xlog:verification to see the details of the VerifyError. If it is bytecodes in the JDK and not ones provided by you, please report this to us so we can fix it. >>> >>> Thank you, >>> Coleen >>> >>>> Thanks! >>>> Ryan -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Sat Mar 8 02:02:33 2025 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 7 Mar 2025 18:02:33 -0800 Subject: [External] : RFD: Grouping hot code in CodeCache In-Reply-To: References: <1B0C3138-761B-4DB0-8A98-977C6FC40178@amazon.co.uk> <2623f909-bb91-4450-bc05-d9181ba3abcb@oracle.com> Message-ID: <681195aa-50a9-45ad-abe5-6e6e2d164b01@oracle.com> On 3/7/25 3:47 PM, Astigeevich, Evgeny wrote: > Hi Vladimir, > > Thank you for the feedback. 
> >> My concern is that it will complicate VM existing code for not >> significant benefits in real production environment. To clarify. I don't like manual part of this - providing list of hot methods which should be collocated. I am fine to have special segment for special C2 compiled code. We will have one for some AOT code in Leyden. Move code in CodeCache to make it more dense is also fine. > > I think it won't complicate the existing code: > - Adding a code heap is ~50 lines of code, mostly in CodeCache::initialize_heaps. > - Relocating nmethods, according to PR[1], is ~300 lines of code. > - A grouping thread is simple and isolated. It will go through Java threads checking their last frame(s) and recording seen nmethods. It should have less code than the sweeper which was ~700 lines. > > I think it's better to wait for PoC to see the complexity. > >> What improvements your experiments in real production runs shows? And >> which JDK version you used for that? > > In production we are using internally 17 (static lists of methods) and 21 (dynamic lists of methods). > Improvements are in a range of 5% - 15%. They depend on how big CPU load is: the more CPU load the bigger improvement. Good. > >> As you know most of nmethod's metadata is moved from CodeCache. >> ... >> After that the code will be a lot more compact in CodeCache. Code sparsity >> should be less issue then. > > Yes, removing non-code from nmethod will improve code density. This means in a code region we will have more code vs non-code. > CPU instruction caches will like this. > > As I wrote in a comment to benchmark PR [2], Neoverse operates in code regions. For Neoverse it's more important to have as less code regions with active nmethods as possible. > > We are aware of cases when CodeCache usage is between 512M - 1G. The mentioned changes won't help in those cases. > If I remember, no public benchmarks have demonstrated improvements from non-code moved away from nmethod. 
> > Since the removal of the Sweeper, GC is in charge of cleaning CodeCache. We've seen cases when GC was triggered often because of allocation pressure on CodeCache. > For such cases, a recommended workaround is to increase the size of CodeCache from default 240M up to 512M. In such cases actively used nmethods will more likely be sparse. Hmm, may be we should restore counters decay for this case to prevent warm methods from compiling and polluting CodeCache and keep it small. > >> It would be nice if you redo your production experiments after that. > > Due to the complexity of customer's application we cannot run it on OpenJDKTip. It has thousand dependencies. We will need to move them on OpenJDKTip. > I think it would be difficult to backport the mentioned changes to 21 Understood. > >> I understand that we can still have sparsity due to "warm" nmethods and >> C1 compiled code mixed with "hot" C2 nmethods. > > Customers having issues with big CodeCache on Graviton usually turn off tiered compilation to reduce far jumps/calls. BTW, this is another argument for identifying active nmethods and grouping them together: it should reduce/eliminate far jumps/calls. > With small CodeCache mix of C1 and C2 nmethods is not an issue. > >> Can we simply use a separate CodeCache's segment for all >> C2 "hot" (we can specify frequency flag to determine what "hot" means) >> methods regardless when they are compiled. > > I did not get the idea. We already have the non-profiled segment where C2 code is put. Do you mean that at the compilation time some methods are put in the regular non-profile segment and some in the specific non-profile segment? Yes, I meant separate segments for hot and warm methods, both are c2 compiled code. It would still mix all 3 cases you pointed because compilation policy based mostly on what happened during startup. So it may be not good idea. > What we've seen that methods profiles keep changing. > There are the following cases: > 1. 
C2 methods used most of the time: their profile can stay the same or can get hotter. > 2. C2 methods used periodically: actively used, not used, actively used and so on > 3. C2 methods used actively during some time and never used after > > Currently GC identifies cases #3 and some cases #2, aka cold code. The percentage of methods case #1 is ~10% - 20%. > If we have 100M of C2 code, only 10M - 20M will be actively used. If we get unlucky, those 10M-20M could be spread across CodeCache and cause CPU stalls. > How can we identify those 10%-20% of methods at compilation time? I agree that it will be hard to determine that during compilation. We need some statistic after we compiled to find such methods. Sometime ago we had concept of Code Aging (removed after Sweeper was removed): https://github.com/vnkozlov/jdk17u-dev/commit/54db2c2d612c573f91f69b7b387b43a8e1c9d563 It added counter on nmethod entry to keep track if it is alive. We can use something similar to track how frequently nmethod is used. Erik Osterlund also had prototype in Leyden for call stack profiling by VM itself to find most used hot methods during training run. Thanks, Vladimir. > > BTW, I think the separate hot code heap might simplify flushing cold code. Everything not in the hot code heap can automatically assumed cold. > > Thanks, > Evgeny > > [1]: https://github.com/openjdk/jdk/pull/23573 > [2]: https://github.com/openjdk/jdk/pull/23831#issuecomment-2705085399 > > On 06/03/2025, 22:41, "Vladimir Kozlov" wrote:
> Hi Evgeny, > > > My concern is that it will complicate VM existing code for not > significant benefits in real production environment. > > > What improvements your experiments in real production runs shows? And > which JDK version you used for that? > > > As you know most of nmethod's metadata is moved from CodeCache. And > Boris Ulasevich will move the final part (relocation info) soon. After > that the code will be a lot more compact in CodeCache. Code sparsity > should be less issue then. > > > It would be nice if you redo your production experiments after that. > > > I understand that we can still have sparsity due to "warm" nmethods and > C1 compiled code mixed with "hot" C2 nmethods. I think compilation > policy has heuristic to detect "warm" method (time intervals between > invocations). Can we simply use a separate CodeCache's segment for all > C2 "hot" (we can specify frequency flag to determine what "hot" means) > methods regardless when they are compiled. Then you don't need to create > list or do anything special for them. Most likely we will waste more > space in CodeCache but it could be conditional under flag which you > already proposed in separate segment RFE. > > > Thanks, > Vladimir K > > > On 3/5/25 10:41 AM, Astigeevich, Evgeny wrote: >> Hi Vladimir, >> >> This is JDK-8326205: Implement grouping hot nmethods in CodeCache. >> >> As I managed to synthesize a benchmark >> (https://github.com/openjdk/jdk/pull/23831) to demonstrate performance impact of sparse code, I'd like >> to discuss a possible solution of the sparse code. >> >> High level, a solution is: >> >> * Detect hot code. >> * Group hot code. >> * Maintain grouped code.
>> >> Downstream we tried two approaches: >> >> * *Static lists of methods (compile command):* Identify frequently >> used (hot) methods using test runs and provide static method lists >> to JVM in production. When JVM compiles a Java method and the method >> is on the list, JVM puts the code into a designated code heap >> (HotCodeHeap). >> * *Dynamic lists of methods (compiler directives):* Profile an >> application in production and dynamically relocate identified hot >> methods to HotCodeHeap. Relocation was implemented with recompilation. >> >> The main advantage of static lists is zero profiling overhead in >> production. We do all profiling and analysis in test runs. Its problems are: >> >> * *Training Run Accuracy*: We need training runs to have execution >> paths closely mimicking production environments. Otherwise we put >> wrong methods into HotCodeHeap. >> * *Method List Maintenance:* We need to rerun training to regenerate >> lists when application code changes. Training runs are expensive and >> time-consuming. They require long runs to guarantee we see all major >> execution paths. Updating lists in production can be as complex as >> application deployment >> * *Method Placement Limitations:* Methods marked for HotCodeHeap are >> permanently placed into HotCodeHeap. No mechanism to remove methods >> that become less frequently used. >> >> We addressed these problems with dynamic lists of methods. We >> implemented a Java agent that runs within the same JVM to dynamically >> detect and manage hot Java methods without prior method identification. >> The agent detects hot methods using JFR. The agent manages hot Java >> methods in HotCodeHeap with compiler directives. A new compiler >> directive marks methods with dynamic states ("hot" or "cold"). Methods >> marked by the "hot" state are recompiled and placed in HotCodeHeap. >> Methods marked by the "cold" state are eventually removed from HotCodeHeap.
>> >> Problems of this approach are: >> >> * It requires specific, complex modifications to compiler directive >> support: recompilation of Java methods affected by compiler >> directives changes. This functionality is unique to Java agent >> implementation and has limited potential for broader use. >> * The agent cannot guarantee Java methods are moved to/removed from >> the HotCodeHeap because updates of compiler directives can fail. >> * The agent knows nothing about compiled code, e.g. whether it's C1 or >> C2 compiled, code size, profile. This data can useful for deciding >> to move or not to move to HotCodeHeap. >> * Recompilations, especially C2, are expensive. Having many of them >> can cause performance issues. Also recompiled code might differ from >> the code we have detected as "hot". >> >> Running these two approaches in production we learned: >> >> * We detect 95% of actively used methods within the first 30 minutes >> of an application run. This is with JFR profiling configured: 90 >> seconds session duration, sampling each 11 ms, 8 minutes between >> profiling sessions. We can find actively used methods faster if we >> reduce a pause between profiling sessions and sampling period. >> However it will increase the profiling overhead and affect >> application performance. With the current configuration, the >> profiling overhead is between 1% - 2%. >> * A set of actively used methods gets into the steady state (no new >> methods added to, no methods removed from) within the first 60 minutes. >> * Static lists, when created from runs close to production, have 80% - >> 90% methods always in use. This does not change over time. >> * Predicting the size of HotCodeHeap is difficult, especially with >> dynamic lists. >> >> We want to have grouping of hot method functionality as a part Hotspot >> JVM. We will group only C2 compiled methods. We can group JVMCI compiled >> methods, e.g. Graal, if needed.
We need profiling precise enough to >> detect major Java methods. Low overhead is more important than precision. >> >> We think we can have a solution which does not require a lot of code: >> >> * Detect hot code: we can an implementation based on the Sweeper: >> https://github.com/openjdk/jdk17u/blob/master/src/hotspot/share/runtime/sweeper.hpp. We will use the handshakes mechanism, what the Sweeper >> used, to detect nmethods on the top of thread stacks. >> * Group hot code: we have a draft PR https://github.com/openjdk/jdk/pull/23573. It implements relocation of nmethods within CodeCache. >> * Maintain grouped code: we will add an additional code heap where hot >> nmethods will be relocated to. >> >> What do you think about this approach? Are there other possible solutions? >> >> Thanks, >> >> Evgeny A. >> >> Amazon Development Centre (London) Ltd. Registered in England and Wales >> with registration number 04543232 with its registered office at 1 >> Principal Place, Worship Street, London EC2A 2FA, United Kingdom. >> > Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.
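[Editorial illustration, not part of the thread.] The detection step discussed above (look at the top frame of each thread, count what is seen, treat the most frequently seen entries as "hot") can be sketched at the Java level. This is only an analogy under stated assumptions: the class `TopFrameSamplerSketch` and its methods are hypothetical names, it counts Java methods rather than nmethods, and it uses `Thread.getAllStackTraces()` instead of the VM handshake mechanism mentioned in the proposal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy top-of-stack sampler; all names here are hypothetical, not HotSpot code. */
public class TopFrameSamplerSketch {
    // method name -> number of samples in which it was the top frame
    private final Map<String, Integer> hits = new HashMap<>();

    /** Record one sample: only the top frame of the given stack is counted. */
    public void record(StackTraceElement[] stack) {
        if (stack == null || stack.length == 0) {
            return; // thread had no Java frames at sampling time
        }
        StackTraceElement top = stack[0];
        hits.merge(top.getClassName() + "::" + top.getMethodName(), 1, Integer::sum);
    }

    /** Take one sample of every live thread (a Java-level stand-in for a VM handshake). */
    public void sampleAllThreads() {
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            record(stack);
        }
    }

    /** Return up to n method names, most frequently sampled first. */
    public List<String> hottest(int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(hits.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            out.add(entries.get(i).getKey());
        }
        return out;
    }
}
```

A periodic task would call `sampleAllThreads()` and then hand `hottest(n)` to whatever does the grouping. A real in-VM version would additionally need to age or decay the counts so that methods from case #3 (used for a while, then never again) drop out of the hot set.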

From alan.bateman at oracle.com Sat Mar 8 08:55:30 2025
From: alan.bateman at oracle.com (Alan Bateman)
Date: Sat, 8 Mar 2025 08:55:30 +0000
Subject: [External] : Re: Verification in agent transformers
In-Reply-To: <37F06AA7-E883-4BCF-8E0E-6B2CF1A81FBD@iernst.net>
References: <3AE3C86F-D811-4788-844A-CF3F13013444@iernst.net> <4d1ccef8-c837-482b-abb3-72f28593d08a@oracle.com> <37F06AA7-E883-4BCF-8E0E-6B2CF1A81FBD@iernst.net>
Message-ID:

On 07/03/2025 23:50, Ryan Ernst wrote:
>
> If the class is loaded and then linked, and is not on the bootstrap
> class loader, then it should be verified.
>
> I agree, but that's not what we observed. We only see the VerifyError
> if retransformClasses is called. If instead the class is loaded after
> the transformer has been registered, no VerifyError occurs, even
> though the transformer produced technically broken bytecode (I say
> "technically" because although the parameter type was wrong, we never
> used the parameter in this method, so it didn't actually cause any
> problems at runtime, it was just a bad reference on the stack that was
> ignored).

I think the question you are asking is whether there is verification of classes modified at class load time with a ClassFileTransformer (ClassFileLoadHook in JVMTI speak) when those classes are in modules mapped to the boot class loader.

-Alan

From cslucas at openjdk.org Sat Mar 8 14:04:03 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Sat, 8 Mar 2025 14:04:03 GMT
Subject: Integrated: 8343468: GenShen: Enable relocation of remembered set card tables
In-Reply-To:
References:
Message-ID:

On Fri, 17 Jan 2025 05:18:39 GMT, Cesar Soares Lucas wrote:

> In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution.
> Because they are considered constants, they are embedded as such in JIT-compiled code.
>
> The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase.
>
> The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a version of the card table totally clear. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that just turned the new _read_ table.
>
> Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address.
>
> The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged.
>
> Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code.
>
> Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions.
> I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them.

This pull request has now been integrated.

Changeset: 4e1367e3
Author: Cesar Soares Lucas
URL: https://git.openjdk.org/jdk/commit/4e1367e34be724a0f84069100854c38333610714
Stats: 271 lines in 25 files changed: 132 ins; 87 del; 52 mod

8343468: GenShen: Enable relocation of remembered set card tables

Reviewed-by: shade, kdnilsen, wkemper

-------------

PR: https://git.openjdk.org/jdk/pull/23170

From alanb at openjdk.org Sat Mar 8 18:15:06 2025
From: alanb at openjdk.org (Alan Bateman)
Date: Sat, 8 Mar 2025 18:15:06 GMT
Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v9]
In-Reply-To:
References:
Message-ID:

On Thu, 6 Mar 2025 19:45:21 GMT, Cesar Soares Lucas wrote:

>> In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code.
>>
>> The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase.
>>
>> The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle.
>> In the `reset` phase we concurrently clean the cards in the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a version of the card table totally clear. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that just turned the new _read_ table.
>>
>> Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address.
>>
>> The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged.
>>
>> Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code.
>>
>> Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them.
>
> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix build: no shenandoah on arm32.

This seems to break Oracle --disable-jvm-feature-shenandoahgc builds on aarch64.
[2025-03-08T14:16:00,338Z] src/hotspot/cpu/aarch64/aarch64.ad:4544:58: error: no member named 'ShenandoahBarrierSet' in 'BarrierSet' [2025-03-08T14:16:00,338Z] !BarrierSet::barrier_set()->is_a(BarrierSet::ShenandoahBarrierSet) && ------------- PR Comment: https://git.openjdk.org/jdk/pull/23170#issuecomment-2708425850 From mdoerr at openjdk.org Sat Mar 8 18:18:00 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 8 Mar 2025 18:18:00 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28] In-Reply-To: <7rVbCbWDqrib9Jyj7_hkD-r9rkaAOIXuwOGAqImrxoY=.a55e9572-b4e6-4cc2-aa0e-c23deb9961ce@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> <7rVbCbWDqrib9Jyj7_hkD-r9rkaAOIXuwOGAqImrxoY=.a55e9572-b4e6-4cc2-aa0e-c23deb9961ce@github.com> Message-ID: <1wMuCBIwYPaPM-bbsnFHi8hnkq-IL5Q_kCmaa1AdDpM=.1240fd83-db6d-489a-bbb3-48891daac064@github.com> On Fri, 7 Mar 2025 16:59:30 GMT, Suchismith Roy wrote: >> Your version extracts 2 8 Byte parts and feeds them into separate xor instructions. My proposal performs both 8 Byte xor operations with one vxor instruction by selecting the input bits accordingly. It furthermore avoids swapping halves forth and back (I swap the halves of vReducedLow instead). >> Have you tried? > > @TheRealMDoerr Yes. The tests do not pass with this. > Trying to find a scope to reduce instructions. > masm->vsldoi(vLowProduct, vLowProduct, vLowProduct, 8); // Swap > masm->vxor(vLowProduct, vLowProduct, vReducedLow); // Reduction using constant > masm->vsldoi(vCombinedResult, vLowProduct, vLowProduct, 8); // Swap > > > can be brought down to 2 instructions. > Still looking for scope to reduce. Let me know your inputs I still find it hard to read. Can you describe the algorithm in pseudo code or mathematical equations? We can try to map it to a shorter instruction sequence. Btw. 
the comment looks wrong here: vxor(vLowProduct, vLowProduct, vReducedLow); // Reduction using constant ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1986127977 From tschatzl at openjdk.org Sat Mar 8 19:32:54 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Sat, 8 Mar 2025 19:32:54 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v15] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done;                 // null value check
> if (card(@x.a) == young_card) goto done;  // write to young gen check
> StoreLoad;                                // synchronize
> if (card(@x.a) == dirty_card) goto done;
>
> *card(@x.a) = dirty
>
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
>
> call runtime to move thread-local-dcq into dcqs
>
> done:
>
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>
> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
>
> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se...

Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision:

 - * fix card table verification crashes: in the first refinement phase, when switching the global card tables, we need to re-check whether we are still in the same sweep epoch or not. It might have changed due to a GC interrupting acquiring the Heap_lock. Otherwise new threads will scribble on the refinement table.
     Cause are last-minute changes before making the PR ready to review.

     Testing: without the patch, occurs fairly frequently when continuously (1 in 20) starting refinement. Does not afterward.
 - * ayang review 3
     * comments
     * minor refactorings

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23739/files
  - new: https://git.openjdk.org/jdk/pull/23739/files/350a4fa3..93b884f1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=13-14

  Stats: 35 lines in 5 files changed: 30 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/23739.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739

PR: https://git.openjdk.org/jdk/pull/23739

From tschatzl at openjdk.org Sat Mar 8 19:32:54 2025
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Sat, 8 Mar 2025 19:32:54 GMT
Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v9]
In-Reply-To:
References:
Message-ID:

On Tue, 4 Mar 2025 10:46:13 GMT, Thomas Schatzl wrote:

> I got an error while testing java/foreign/TestUpcallStress.java on linuxaarch64 with this PR:

Fixed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2708458459

From lmesnik at openjdk.org Mon Mar 10 03:03:00 2025
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Mon, 10 Mar 2025 03:03:00 GMT
Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v5]
In-Reply-To: <3bphXKLpIpxAZP-FEOeob6AaHbv0BAoEceJka64vMW8=.3e4f74e0-9479-4926-b365-b08d8d702692@github.com>
References: <3bphXKLpIpxAZP-FEOeob6AaHbv0BAoEceJka64vMW8=.3e4f74e0-9479-4926-b365-b08d8d702692@github.com>
Message-ID:

On Thu, 6 Mar 2025 17:37:33 GMT, Ferenc Rakoczi wrote:

>> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision:
>
>   Accepted review comments.

There aren't any new tests in the PR. How has the fix been tested by the OpenJDK tests?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23860#issuecomment-2709309387

From fyang at openjdk.org Mon Mar 10 03:40:01 2025
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 10 Mar 2025 03:40:01 GMT
Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v5]
In-Reply-To:
References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com>
Message-ID:

On Fri, 7 Mar 2025 11:42:34 GMT, Hamlin Li wrote:

>> Hi,
>> Can you help to review this patch?
>> It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv.
>>
>> ## Performance
>>
>> data
>>
>> Benchmark | (vectorDim) | Mode | Cnt | Score - master | Error | Score - patch | Error | Units | Improvement (master/patch)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1
>> Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011
>> Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001
>> Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001
>> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909
>> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91
>> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129
>> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 91565.81...
>
> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision:
>
>  - clean
>  - renaming

Thanks for the update. A few minor comments remain. Seems fine to me otherwise.

src/hotspot/cpu/riscv/riscv.ad line 4975:

> 4973:
> 4974:   ins_pipe(fp_load_constant_s);
> 4975: %}

Can we put these two after `loadConD0`? Then we have a more consistent order for the three variants (F->D->FH) at each place.

src/hotspot/cpu/riscv/riscv.ad line 8324:

> 8322: %}
> 8323:
> 8324: instruct min_max_HF_reg(fRegF dst, fRegF src1, fRegF src2)

It will be easier for people to map to the existing F/D variants if we use similar names like `sqrtHF_reg`, `minHF_reg_reg`, `maxHF_reg_reg`, `maddHF_reg_reg`, and `binOpsHF_reg_reg` for the HF variants.

src/hotspot/cpu/riscv/vm_version_riscv.hpp line 301:

> 299:   static bool supports_fencei_barrier() { return ext_Zifencei.enabled(); }
> 300:
> 301:   static bool supports_float16_float_conversion() {

Is it safe to simply rename this as `supports_float16` like other CPUs? I suppose it will still work functionally even if we only have the `Zfhmin` extension, right?
------------- PR Review: https://git.openjdk.org/jdk/pull/23844#pullrequestreview-2669705559 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1986554531 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1986550295 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1986557022 From fyang at openjdk.org Mon Mar 10 03:40:02 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 10 Mar 2025 03:40:02 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v4] In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Fri, 7 Mar 2025 11:42:53 GMT, Hamlin Li wrote: > > Maybe we should also update the `@requires` of the test at the same time? Currently, it says `| (os.arch == "riscv64" & vm.cpu.features ~= ".*zvfh.*")`. Maybe we change `zvfh` into `zfh`? > > No, as this test is for "vector conversion chain", only support of `zfh` should not trigger the test. BTW, scalar tests are in other test files. In the future, when we support vectorization the IR verification test should be enabled again, but it still depends on `zvfh` rather than `zfh`. Hope this answer your question? That makes sense to me. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23844#issuecomment-2709347076 From stefank at openjdk.org Mon Mar 10 07:28:51 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 10 Mar 2025 07:28:51 GMT Subject: RFR: 8350905: Releasing a WeakHandle's referent may extend its lifetime In-Reply-To: References: Message-ID: <2rdNxa8sWH0qCHRDCtZgM27hh839433UT_KXGhjK7s4=.856e615e-e9a9-469a-b3d5-fb8b5d6181a2@github.com> On Thu, 6 Mar 2025 18:57:18 GMT, William Kemper wrote: > When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. 
> For concurrent collectors using a SATB barrier, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. We are proposing that native weak references are cleared with an additional `AS_NO_KEEPALIVE` decorator. This is similar to what was done for j.l.r.WeakReference in [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696).
>
> # Testing
>
> GHA, `hotspot_gc_shenandoah`. Additionally, for G1, ZGC, and Shenandoah we've run Extremem, Dacapo, SpecJVM2008, SpecJBB2015, Heapothesys and Diluvian. All executions completed without errors.

@fisk gave an offline comment that he would prefer if this could be handled by the GC Barrier backend instead of having to change the runtime code to understand how SATB and weak handles work.

Take a look at how ZGC deals with this:

    template
    inline void ZBarrierSet::AccessBarrier::oop_store_not_in_heap(zpointer* p, oop value) {
      verify_decorators_absent();

      if (!is_store_barrier_no_keep_alive()) {
        store_barrier_native_without_healing(p);
      }

      Raw::store(p, store_good(value));
    }

and then how `is_store_barrier_no_keep_alive` ensures that ON_PHANTOM_OOP_REF stores are treated as no-keepalive stores.
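
The suggestion above (let the barrier backend decide, from the reference strength, whether a store keeps the previous value alive) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HotSpot code: the class, the constant values, the `satbQueue` and the `store` helper are all hypothetical, and the rule used for strong and weak references is an assumption made for illustration. Only the phantom case is taken from the message above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: names mirror HotSpot decorators, but the values and the
// whole class are hypothetical and exist purely for illustration.
final class NoKeepAliveSketch {
    static final int ON_STRONG_OOP_REF  = 1;
    static final int ON_WEAK_OOP_REF    = 1 << 1;
    static final int ON_PHANTOM_OOP_REF = 1 << 2;
    static final int AS_NO_KEEPALIVE    = 1 << 3;

    // Stand-in for a SATB mark queue: values added here would be marked live.
    static final Deque<Object> satbQueue = new ArrayDeque<>();

    // Stand-in for the native slot a weak handle points at.
    static final Object[] slot = new Object[1];

    // Assumed rule: strong stores keep the previous value alive unless the
    // caller asks otherwise; weak and phantom stores never keep it alive.
    static boolean isStoreNoKeepAlive(int decorators) {
        if ((decorators & ON_STRONG_OOP_REF) != 0) {
            return (decorators & AS_NO_KEEPALIVE) != 0;
        }
        return (decorators & (ON_WEAK_OOP_REF | ON_PHANTOM_OOP_REF)) != 0;
    }

    // The "backend" applies the rule, so callers do not need to know about SATB.
    static void store(Object newValue, int decorators) {
        Object previous = slot[0];
        if (previous != null && !isStoreNoKeepAlive(decorators)) {
            satbQueue.add(previous); // previous value is kept alive
        }
        slot[0] = newValue;
    }

    public static void main(String[] args) {
        slot[0] = "referent";
        store(null, ON_PHANTOM_OOP_REF); // clearing a weak handle
        System.out.println("enqueued after phantom clear: " + satbQueue.size());

        slot[0] = "referent";
        store(null, ON_STRONG_OOP_REF);  // ordinary strong store
        System.out.println("enqueued after strong clear: " + satbQueue.size());
    }
}
```

With this shape, runtime code that clears a weak handle passes only the reference strength, and the decision not to enqueue the old referent stays inside the barrier code.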
------------- PR Comment: https://git.openjdk.org/jdk/pull/23935#issuecomment-2709653170 From jsjolen at openjdk.org Mon Mar 10 09:00:00 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 10 Mar 2025 09:00:00 GMT Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag [v5] In-Reply-To: <0SlK7ixxGv5N7-LQnC7SwgpcK4Oz_9_H24qnrGPrTpc=.9bfd6434-6a48-4563-9dd6-66cff70dafe7@github.com> References: <0SlK7ixxGv5N7-LQnC7SwgpcK4Oz_9_H24qnrGPrTpc=.9bfd6434-6a48-4563-9dd6-66cff70dafe7@github.com> Message-ID: On Fri, 7 Mar 2025 16:06:32 GMT, Afshin Zafari wrote: >> With the `size` parameter there will be no need to traverse/go through the nodes between the base and end of the region. >> Tests: >> linux-x64-debug, gtest:NMT* and runtime/NMT* > > Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge remote-tracking branch 'origin/master' into _8350566_size_par_set_tag > - new fix. > - fixed build problem. > - ReservedSpace is accepted as param. > - applied also to VMT. > - 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag Please check if this PR is responsible for the test failures before integrating. ------------- Marked as reviewed by jsjolen (Reviewer). 

PR Review: https://git.openjdk.org/jdk/pull/23770#pullrequestreview-2670204517

From roland at openjdk.org Mon Mar 10 09:02:15 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 10 Mar 2025 09:02:15 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v14]
In-Reply-To: <9c34YjVjK0BMclNqFWMSitBV2YTcu_jmgWVitjRgvF0=.0f225af6-5888-4160-9a54-09baa696da1c@github.com>
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <9c34YjVjK0BMclNqFWMSitBV2YTcu_jmgWVitjRgvF0=.0f225af6-5888-4160-9a54-09baa696da1c@github.com>
Message-ID:

On Fri, 7 Mar 2025 06:19:03 GMT, Galder Zamarreño wrote:

>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>>
>> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>>
>>
>> VLoop::check_preconditions: failed: control flow in loop not allowed
>>
>>
>> The control flow is due to the Java implementation for these methods, e.g.
>>
>>
>> public static long max(long a, long b) {
>>     return (a >= b) ? a : b;
>> }
>>
>>
>> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively.
>> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
>> E.g.
>>
>>
>> SuperWord::transform_loop:
>> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
>> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>>
>>
>> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after.
>> Before the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PASS FAIL ERROR
>> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>> 1 1 0 0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1155
>> long max 1173
>>
>>
>> After the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PASS FAIL ERROR
>> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>> 1 1 0 0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1042
>> long max 1042
>>
>>
>> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
>> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>>
>> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PA...
>
> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 47 additional commits since the last revision:
>
>  - Merge branch 'master' into topic.intrinsify-max-min-long
>  - Add assertion comments
>  - Add simple reduction benchmarks on top of multiply ones
>  - Merge branch 'master' into topic.intrinsify-max-min-long
>  - Fix typo
>  - Renaming methods and variables and add docu on algorithms
>  - Fix copyright years
>  - Make sure it runs with cpus with either avx512 or asimd
>  - Test can only run with 256 bit registers or bigger
>
>    * Remove platform dependant check
>    and use platform independent configuration instead.
>  - Fix license header
>  - ... and 37 more: https://git.openjdk.org/jdk/compare/07ef652d...1aa690d3

Marked as reviewed by roland (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20098#pullrequestreview-2670211951

From chagedorn at openjdk.org Mon Mar 10 09:19:10 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 10 Mar 2025 09:19:10 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v14]
In-Reply-To: <9c34YjVjK0BMclNqFWMSitBV2YTcu_jmgWVitjRgvF0=.0f225af6-5888-4160-9a54-09baa696da1c@github.com>
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <9c34YjVjK0BMclNqFWMSitBV2YTcu_jmgWVitjRgvF0=.0f225af6-5888-4160-9a54-09baa696da1c@github.com>
Message-ID:

On Fri, 7 Mar 2025 06:19:03 GMT, Galder Zamarreño wrote:

>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>>
>> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>>
>>
>> VLoop::check_preconditions: failed: control flow in loop not allowed
>>
>>
>> The control flow is due to the Java implementation for these methods, e.g.
>>
>>
>> public static long max(long a, long b) {
>>     return (a >= b) ? a : b;
>> }
>>
>>
>> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively.
>> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
>> E.g.
>>
>>
>> SuperWord::transform_loop:
>> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
>> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>>
>>
>> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after.
>> Before the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PASS FAIL ERROR
>> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>> 1 1 0 0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1155
>> long max 1173
>>
>>
>> After the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PASS FAIL ERROR
>> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>> 1 1 0 0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1042
>> long max 1042
>>
>>
>> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
>> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>>
>> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PA...
>
> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 47 additional commits since the last revision:
>
>  - Merge branch 'master' into topic.intrinsify-max-min-long
>  - Add assertion comments
>  - Add simple reduction benchmarks on top of multiply ones
>  - Merge branch 'master' into topic.intrinsify-max-min-long
>  - Fix typo
>  - Renaming methods and variables and add docu on algorithms
>  - Fix copyright years
>  - Make sure it runs with cpus with either avx512 or asimd
>  - Test can only run with 256 bit registers or bigger
>
>    * Remove platform dependant check
>    and use platform independent configuration instead.
>  - Fix license header
>  - ... and 37 more: https://git.openjdk.org/jdk/compare/fd78e706...1aa690d3

Good work and collection of all the data!
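
For readers following the thread, the loop shape that the quoted description is concerned with can be sketched as below. This is a minimal illustration, not code from the PR or from `ReductionPerf`: the class and method names are hypothetical. Whether C2 actually vectorizes either loop depends on the JDK build and the platform, so the sketch makes no performance claim. It only shows a `long` reduction written with `Math.max` next to the equivalent explicit compare-and-select form.

```java
// Hypothetical example class; not part of the PR under review.
final class LongMaxReductionSketch {

    // Reduction through Math.max(long, long): the call site that the patch
    // described above turns into a MaxL node instead of compare-and-branch.
    static long maxOf(long[] values) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = Math.max(result, values[i]);
        }
        return result;
    }

    // The same reduction with the comparison written out, matching the Java
    // source of Math.max quoted in the description.
    static long maxOfExplicit(long[] values) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            long v = values[i];
            result = (result >= v) ? result : v;
        }
        return result;
    }

    public static void main(String[] args) {
        long[] data = new long[1025];
        for (int i = 0; i < data.length; i++) {
            data[i] = (i * 31L) % 1000L - 500L; // arbitrary deterministic input
        }
        System.out.println("Math.max reduction: " + maxOf(data));
        System.out.println("explicit reduction: " + maxOfExplicit(data));
    }
}
```

Both methods return the same value for any input; the difference the thread discusses is only in how the compiler sees the loop body.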
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20098#pullrequestreview-2670256931 From shade at openjdk.org Mon Mar 10 09:41:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Mar 2025 09:41:01 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: <5nkWE-TpdoNk-k_5JE7MopX5_KJf6DjjLWMADxWr29k=.ee34fa19-882c-4731-86f6-bdaed2a6e276@github.com> Message-ID: <5PZgChiJciTkkZIUnXtTWZMB4ZxN8DmZHUWBFt9ptBw=.77216c80-a470-4e20-908e-7e419404e607@github.com> On Fri, 7 Mar 2025 07:18:27 GMT, David Holmes wrote: > You could add a couple of lines to the build code and it would not be possible to build 32-bit, so that is a necessary but not sufficient condition to claim to implement the JEP IMO. Agreed. This is why this PR removes the actual implementation of the port as well. Even if you can coerce build system to pass the arch checks, x86_32 would not build, because there is no x86_32 port in the sources anymore. There are only assorted, heavily-intertwined-with-x86-64 leftovers around Hotspot subsystems that were needed to support the port. We will deal with those leftovers at leisurely pace after the port is gone. > @dholmes-ora: I'm not looking for one big PR, I'm looking for multiple PR's as proposed but which all fall under the JEP umbrella. Until the JEP is targeted then nothing can be integrated anyway. This is what, I thought, dependent PR's were designed for. > @magicus Instead, that honor should fall on an umbrella JBS issue, which is dependent on this PR, but also the other planned updates. Before these are done, we can't really say that the JEP is implemented. I believe we are in agreement that we do not want to cobble all removals/cleanups into a singular PR/changeset. We _can_ convert the umbrella RFE for post-JEP cleanups as the implementation task subtasks. I.e. 
do:

 - JDK-XXXXX: Implement JEP 503: Remove the 32-bit x86-port (<---- this would be an umbrella, without a changeset)
 - JDK-XXXXX: JEP 503: Remove the x86_32 files and builds support (<---- this would be this PR)
 - JDK-XXXXX: JEP 503: Remove code blocks that handle UseSSE < 2
 - JDK-XXXXX: JEP 503: Remove dead IA32 code blocks
 ...

Then we manually close umbrella issue as "implemented" when subtasks are done.

What I dislike about this approach is that we are committing to doing free-standing post-x86-32 cleanups under the JEP umbrella. This runs into several problems: a) some cleanups are very deep, intertwined with x86-64, connected to x86-32-zero, and might even be rejected, like deep cleaning in `MacroAssembler` ([JDK-8351162](https://bugs.openjdk.org/browse/JDK-8351162)); b) some cleanups would only be discovered later, and would require yet another umbrella task for post-JEP work anyway.

Are you agreeing to this, @dholmes-ora, @magicus? This would create more work for ourselves and our fellow engineers in JDK 25 timeframe. If you are insisting we need to do it this way, can I count on your prompt reviews in these new JEP subtasks?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2709964337

From shade at openjdk.org Mon Mar 10 09:49:44 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 10 Mar 2025 09:49:44 GMT
Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port [v2]
In-Reply-To:
References:
Message-ID:

> This PR implements JEP 503: Remove the 32-bit x86 Port.
>
> The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile.
>
> This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64.
The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. > > The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. > > Additional testing: > - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) > - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision:

 - Drop commented out block from deprecations
 - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone
 - Generic 32-bit x86 configure error supercedes Windows 32-bit x86
 - 8345169: Implement JEP 503: Remove the 32-bit x86 Port

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23906/files
  - new: https://git.openjdk.org/jdk/pull/23906/files/b76816cb..0fef97b8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23906&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23906&range=00-01

  Stats: 7320 lines in 306 files changed: 3971 ins; 1797 del; 1552 mod
  Patch: https://git.openjdk.org/jdk/pull/23906.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23906/head:pull/23906

PR: https://git.openjdk.org/jdk/pull/23906

From shade at openjdk.org Mon Mar 10 09:49:44 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 10 Mar 2025 09:49:44 GMT
Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port [v2]
In-Reply-To:
References:
Message-ID: <3jbKFXHYH2mgyYOQjn2rfGm0IpIwH377DuDrZAY4X7w=.8d0767f3-7190-4396-824a-d55e6a61f479@github.com>

On Fri, 7 Mar 2025 15:06:18 GMT, Magnus Ihse Bursie wrote:

>> I think leaving a comment describing how to deprecate a port is useful. To look it up in history you have to realise there is something to look up.
>>
>> "They who are not reminded of the past will invent a new way to do it in the future."
>
> The `--enable-deprecated-ports` is still there. All that is removed is an if statement and a print line. I know the make syntax can seem intimidating, but just ask me or any other build team member if you need help to recreate such a thing. It is not like it is a complicated algorithm that can be written in many ways. This is just make's equivalent of:
>
>
> if (some_condition) {
>   println("whatever");
> }
>
>
> To me this is just utter nonsense to keep that commented out.

"Utter nonsense" might be a bit harsh.
We do code samples around OpenJDK all the time to leave breadcrumbs for future use. As I said, I don't mind removing it, done so in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23906#discussion_r1986943830 From alanb at openjdk.org Mon Mar 10 09:52:54 2025 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 10 Mar 2025 09:52:54 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port [v2] In-Reply-To: References: Message-ID: <8GSZRPDK4WLn6bHC2D2Ow47a-xd9NzCN6azXs2aDp_g=.47762983-f579-4ea1-b22e-abbd1740e6d3@github.com> On Mon, 10 Mar 2025 09:49:44 GMT, Aleksey Shipilev wrote: >> This PR implements JEP 503: Remove the 32-bit x86 Port. >> >> The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. >> >> This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. >> >> The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. 
>> >> Additional testing: >> - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) >> - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Drop commented out block from deprecations > - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone > - Generic 32-bit x86 configure error supercedes Windows 32-bit x86 > - 8345169: Implement JEP 503: Remove the 32-bit x86 Port JEP 486 (Permanently Disable the Security Manager) updated the API and removed the ability to set a SecurityManager in a first big commit. The JBS issue for that commit was associated with the JEP. There were 150+ follow on issues, some removed essentially dead code, others fixed or removed tests that were excluded by the first commit. It wasn't initially clear if all cleanups and code removal could be done in the same release (JDK 24) but almost all did happen as only a few remaining cleanups to APIs docs spilled over into JDK 25. Anyway, just pointing out this JEP as an example that may be useful to look at when considering the approach for the 32-bit x86 port removal. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2709998166 From duke at openjdk.org Mon Mar 10 10:03:55 2025 From: duke at openjdk.org (duke) Date: Mon, 10 Mar 2025 10:03:55 GMT Subject: RFR: 8350266: [PPC64] Interpreter: intrinsify Thread.currentThread() In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 13:35:51 GMT, David Linus Briemann wrote: > Implementation of intrinsic Thread.currentThread() for PPC64. 
@dbriemann Your change (at version 49dbc98b80f936974e521b6d34ce74d2418d3dd3) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23677#issuecomment-2710027773 From shade at openjdk.org Mon Mar 10 10:08:08 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Mar 2025 10:08:08 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v3] In-Reply-To: References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: On Fri, 7 Mar 2025 13:28:43 GMT, Erik Gahlin wrote: > I'm hesitant because the peak value can easily be calculated, which we already do for other events (CPULoad, NetworkUtilization, NativeMemoryUsage etc) in "jfr view". True, I thought about that, and still ended up adding `max`, because it is tracked internally by locking subsystem, and thus does not run into sampling bias. I.e. JFR sampling thread may not see the spike in monitor counts if bulk inflation/deflation happens between two samples. Looking around other stats in the `metadata.xml`, maybe a better name for it is `peakCount`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23900#issuecomment-2710038641 From duke at openjdk.org Mon Mar 10 10:24:44 2025 From: duke at openjdk.org (David Linus Briemann) Date: Mon, 10 Mar 2025 10:24:44 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms [v2] In-Reply-To: References: Message-ID: > 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms David Linus Briemann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 15 additional commits since the last revision: - remove CountBytecodesTest from tier1 - Merge branch 'master' into dlb/bytecode_counter_overflow - remove auto included header - fix x86 asm - address review comment, add back comma to copyright header - formatting - remove bad header - add missing comma to copyright header - speed up runtime by running less bytecodes, add explanation - add copyright header and @bug number - ... and 5 more: https://git.openjdk.org/jdk/compare/dd67a552...31a52156 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23766/files - new: https://git.openjdk.org/jdk/pull/23766/files/45699ec5..31a52156 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23766&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23766&range=00-01 Stats: 43114 lines in 1277 files changed: 19836 ins; 16962 del; 6316 mod Patch: https://git.openjdk.org/jdk/pull/23766.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23766/head:pull/23766 PR: https://git.openjdk.org/jdk/pull/23766 From duke at openjdk.org Mon Mar 10 10:26:02 2025 From: duke at openjdk.org (David Linus Briemann) Date: Mon, 10 Mar 2025 10:26:02 GMT Subject: Integrated: 8350266: [PPC64] Interpreter: intrinsify Thread.currentThread() In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 13:35:51 GMT, David Linus Briemann wrote: > Implementation of intrinsic Thread.currentThread() for PPC64. This pull request has now been integrated. 
Changeset: 783eda9f Author: David Linus Briemann Committer: Martin Doerr URL: https://git.openjdk.org/jdk/commit/783eda9f54a6e17771c637ff5cac5e30d1facde9 Stats: 14 lines in 1 file changed: 13 ins; 1 del; 0 mod 8350266: [PPC64] Interpreter: intrinsify Thread.currentThread() Reviewed-by: mdoerr, rrich ------------- PR: https://git.openjdk.org/jdk/pull/23677 From ihse at openjdk.org Mon Mar 10 10:46:55 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 10 Mar 2025 10:46:55 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 09:49:44 GMT, Aleksey Shipilev wrote: >> This PR implements JEP 503: Remove the 32-bit x86 Port. >> >> The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. >> >> This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. >> >> The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. 
>> >> Additional testing: >> - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) >> - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Drop commented out block from deprecations > - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone > - Generic 32-bit x86 configure error supercedes Windows 32-bit x86 > - 8345169: Implement JEP 503: Remove the 32-bit x86 Port I don't have a super strong opinion on this. If you want to call this the implementation of JEP 503, I'm fine with that. I guess it all depends a bit on where you want to draw the line between "removal" and "subsequent cleanups that have now been possible". The latter part almost never ends in a codebase as large as the JDK; I still find Solaris remnants in the code to this day, so getting rid of *all* code that is no longer necessary cannot reasonably be a criterion for finishing a removal. I guess I just viewed the intertwined ifdef:ed code as more part of the actual removal, but then again, it's Hotspot code and that's strictly really not my business. 
:-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2710148614 From rehn at openjdk.org Mon Mar 10 11:12:52 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 10 Mar 2025 11:12:52 GMT Subject: RFR: 8351145: RISC-V: only enable some crypto intrinsic when AvoidUnalignedAccess == false In-Reply-To: <8-3lLYr9jtNOQJhRLRyAA2xxfxG2aVm27HIGcbNsCfY=.8e43a66a-5602-4345-b5a0-cfdaab7e0d8f@github.com> References: <8-3lLYr9jtNOQJhRLRyAA2xxfxG2aVm27HIGcbNsCfY=.8e43a66a-5602-4345-b5a0-cfdaab7e0d8f@github.com> Message-ID: On Tue, 4 Mar 2025 16:11:41 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? > > Depending whether a cpu supports fast misaligned access or not, the misaligned access can impact the performance a lot. > Some crypto intrinsic implementation on riscv do not consider data alignment and just use `ld` to load input byte array, and seems there is no way to do it, the main reason is that at java API level, the input byte array to these JVM intrinsic could be part of a real java array, so the input byte array could be 1/2...7 byte aligned. > And with the introduction of COH, it would be even complicated to do the input data alignment. > > So, for the consistency of performance, seems it's better to disable these intrinsics when AvoidUnalignedAccess == true. > And the user can still enable the intrinsics explicitly on a CPU with AvoidUnalignedAccess == true if they want so. > > Thanks! Seems fine, thank you! ------------- Marked as reviewed by rehn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23903#pullrequestreview-2670586910 From mli at openjdk.org Mon Mar 10 11:42:51 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 10 Mar 2025 11:42:51 GMT Subject: RFR: 8351145: RISC-V: only enable some crypto intrinsic when AvoidUnalignedAccess == false In-Reply-To: References: <8-3lLYr9jtNOQJhRLRyAA2xxfxG2aVm27HIGcbNsCfY=.8e43a66a-5602-4345-b5a0-cfdaab7e0d8f@github.com> Message-ID: On Mon, 10 Mar 2025 11:10:41 GMT, Robbin Ehn wrote: > Seems fine, thank you! Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23903#issuecomment-2710296113 From coleenp at openjdk.org Mon Mar 10 12:30:59 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 10 Mar 2025 12:30:59 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 09:49:44 GMT, Aleksey Shipilev wrote: >> This PR implements JEP 503: Remove the 32-bit x86 Port. >> >> The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. >> >> This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. >> >> The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. 
One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. >> >> Additional testing: >> - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) >> - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Drop commented out block from deprecations > - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone > - Generic 32-bit x86 configure error supercedes Windows 32-bit x86 > - 8345169: Implement JEP 503: Remove the 32-bit x86 Port Marked as reviewed by coleenp (Reviewer). I do have a strong opinion on this. The security manager removal is a good model to follow. Since this change removes the capability and 50K LOC, I think it's sufficient to say it implements the JEP. The other removals are cleanups and don't need to have to be tied up in the process, and can happen when they're ready and reviewed. There's no technical or practical reason to make this more difficult. ------------- PR Review: https://git.openjdk.org/jdk/pull/23906#pullrequestreview-2670782075 PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2710418458 From ihse at openjdk.org Mon Mar 10 12:57:57 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 10 Mar 2025 12:57:57 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 09:49:44 GMT, Aleksey Shipilev wrote: >> This PR implements JEP 503: Remove the 32-bit x86 Port. 
>> >> The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. >> >> This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. >> >> The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. >> >> Additional testing: >> - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) >> - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Drop commented out block from deprecations > - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone > - Generic 32-bit x86 configure error supercedes Windows 32-bit x86 > - 8345169: Implement JEP 503: Remove the 32-bit x86 Port Marked as reviewed by ihse (Reviewer). 
-------------

PR Review: https://git.openjdk.org/jdk/pull/23906#pullrequestreview-2670852740

From ihse at openjdk.org Mon Mar 10 13:52:56 2025
From: ihse at openjdk.org (Magnus Ihse Bursie)
Date: Mon, 10 Mar 2025 13:52:56 GMT
Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port [v2]
In-Reply-To: <3jbKFXHYH2mgyYOQjn2rfGm0IpIwH377DuDrZAY4X7w=.8d0767f3-7190-4396-824a-d55e6a61f479@github.com>
References: <3jbKFXHYH2mgyYOQjn2rfGm0IpIwH377DuDrZAY4X7w=.8d0767f3-7190-4396-824a-d55e6a61f479@github.com>
Message-ID:

On Mon, 10 Mar 2025 09:46:38 GMT, Aleksey Shipilev wrote:

>> The `--enable-deprecated-ports` is still there. All that is removed is an if statement and a print line. I know the make syntax can seem intimidating, but just ask me or any other build team member if you need help to recreate such a thing. It is not like it is a complicated algorithm that can be written in many ways. This is just make's equivalent of:
>>
>>
>> if (some_condition) {
>>   println("whatever");
>> }
>>
>>
>> To me this is just utter nonsense to keep that commented out.
>
> "Utter nonsense" might be a bit harsh. We do code samples around OpenJDK all the time to leave breadcrumbs for future use. As I said, I don't mind removing it, done so in new commit.

Yes, you are right. That did not sound good. I apologize.

(And thanks for removing it!)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23906#discussion_r1987327814

From mli at openjdk.org Mon Mar 10 14:32:14 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 10 Mar 2025 14:32:14 GMT
Subject: RFR: 8318220: RISC-V: C2 ReverseI
Message-ID:

Hi,
Can you help to review this patch to add ReverseI and ReverseIL intrinsic on riscv?

Thanks!

-------------

Commit messages:
 - clean test
 - clean test
 - clean test
 - clean
 - add tests
 - clean
 - merge master
 - rename rev_b to brev8
 - fix UseZbkb flag
 - merge master
 - ...
and 1 more: https://git.openjdk.org/jdk/compare/18931d05...9bce9054 Changes: https://git.openjdk.org/jdk/pull/23963/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23963&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318220 Stats: 292 lines in 10 files changed: 292 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23963.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23963/head:pull/23963 PR: https://git.openjdk.org/jdk/pull/23963 From sroy at openjdk.org Mon Mar 10 15:24:04 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Mon, 10 Mar 2025 15:24:04 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28] In-Reply-To: <1wMuCBIwYPaPM-bbsnFHi8hnkq-IL5Q_kCmaa1AdDpM=.1240fd83-db6d-489a-bbb3-48891daac064@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> <7rVbCbWDqrib9Jyj7_hkD-r9rkaAOIXuwOGAqImrxoY=.a55e9572-b4e6-4cc2-aa0e-c23deb9961ce@github.com> <1wMuCBIwYPaPM-bbsnFHi8hnkq-IL5Q_kCmaa1AdDpM=.1240fd83-db6d-489a-bbb3-48891daac064@github.com> Message-ID: On Sat, 8 Mar 2025 18:14:48 GMT, Martin Doerr wrote: >> @TheRealMDoerr Yes. The tests do not pass with this. >> Trying to find a scope to reduce instructions. >> masm->vsldoi(vLowProduct, vLowProduct, vLowProduct, 8); // Swap >> masm->vxor(vLowProduct, vLowProduct, vReducedLow); // Reduction using constant >> masm->vsldoi(vCombinedResult, vLowProduct, vLowProduct, 8); // Swap >> >> >> can be brought down to 2 instructions. >> Still looking for scope to reduce. Let me know your inputs > > I still find it hard to read. Can you describe the algorithm in pseudo code or mathematical equations? We can try to map it to a shorter instruction sequence. > Btw. the comment looks wrong here: vxor(vLowProduct, vLowProduct, vReducedLow); // Reduction using constant Yes I need to modify the comments. 
masm->vsldoi(vTmp8, vMidProduct, vZero, 8);       // mL : Extract the lower 64 bits of M
masm->vsldoi(vTmp9, vZero, vMidProduct, 8);       // mH : Extract the higher 64 bits of M
masm->vxor(vLowProduct, vLowProduct, vTmp8);      // LL + mL : Partial result for lower half
masm->vxor(vHighProduct, vHighProduct, vTmp9);    // HH + mH : Partial result for upper half

The above 4 are solving the parts where we multiply the lower and higher halves with the middle product.
http://web.archive.org/web/20130609111954/http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/communications-ia-galois-counter-mode-paper.pdf
Page 11 explains it.
I am figuring out how to write the maths equation for the reduction part using vConstC2.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1987519084

From wkemper at openjdk.org Mon Mar 10 17:25:57 2025
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 10 Mar 2025 17:25:57 GMT
Subject: RFR: 8350905: Releasing a WeakHandle's referent may extend its lifetime
In-Reply-To: <2rdNxa8sWH0qCHRDCtZgM27hh839433UT_KXGhjK7s4=.856e615e-e9a9-469a-b3d5-fb8b5d6181a2@github.com>
References: <2rdNxa8sWH0qCHRDCtZgM27hh839433UT_KXGhjK7s4=.856e615e-e9a9-469a-b3d5-fb8b5d6181a2@github.com>
Message-ID:

On Mon, 10 Mar 2025 07:26:02 GMT, Stefan Karlsson wrote:

>> When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. We are proposing that native weak references are cleared with an additional `AS_NO_KEEPALIVE` decorator. This is similar to what was done for j.l.r.WeakReference in [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696).
>>
>> # Testing
>>
>> GHA, `hotspot_gc_shenandoah`. Additionally, for G1, ZGC, and Shenandoah we've run Extremem, Dacapo, SpecJVM2008, SpecJBB2015, Heapothesys and Diluvian. All executions completed without errors.
>
> @fisk gave an offline comment that he would prefer if this could be handled by the GC Barrier backend instead of having to change the runtime code to understand how SATB and weak handles work.
>
> Take a look at how ZGC deals with this:
>
> template
> inline void ZBarrierSet::AccessBarrier::oop_store_not_in_heap(zpointer* p, oop value) {
>   verify_decorators_absent();
>
>   if (!is_store_barrier_no_keep_alive()) {
>     store_barrier_native_without_healing(p);
>   }
>
>   Raw::store(p, store_good(value));
> }
>
>
> and then how `is_store_barrier_no_keep_alive` ensures that ON_PHANTOM_OOP_REF stores are treated as no-keepalive stores.

Thank you @stefank, will take a look at this today.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23935#issuecomment-2711314050

From mli at openjdk.org Mon Mar 10 17:26:02 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 10 Mar 2025 17:26:02 GMT
Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v5]
In-Reply-To:
References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com>
Message-ID:

On Mon, 10 Mar 2025 03:29:56 GMT, Fei Yang wrote:

>> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - clean
>>  - renaming
>
> src/hotspot/cpu/riscv/riscv.ad line 4975:
>
>> 4973:
>> 4974:   ins_pipe(fp_load_constant_s);
>> 4975: %}
>
> Can we put these two after `loadConD0`? Then we have a more consistent order for the three variants (F->D->FH) at each place.

In fact, in my mind the order should be FH->F->D, considering the width changing from 16->32->64.
> src/hotspot/cpu/riscv/riscv.ad line 8324:
>
>> 8322: %}
>> 8323:
>> 8324: instruct min_max_HF_reg(fRegF dst, fRegF src1, fRegF src2)
>
> It will be easier for people to map to existing F/D variants if we use similar names like `sqrtHF_reg`, `minHF_reg_reg`, `maxHF_reg_reg`, `maddHF_reg_reg`, and `binOpsHF_reg_reg` for the HF variant.

I think there is no convention of instruct naming, as long as there are no duplicate names, and normally we don't care about the instruct name too much. What really matters is the match rule, e.g. MinHF/MaxHF and so on, which indicates what the instructs do.

> src/hotspot/cpu/riscv/vm_version_riscv.hpp line 301:
>
>> 299:   static bool supports_fencei_barrier() { return ext_Zifencei.enabled(); }
>> 300:
>> 301:   static bool supports_float16_float_conversion() {
>
> Is it safe to simply rename this as `supports_float16` like other CPUs? I suppose it will still work in functionality even if we only have the `Zfhmin` extension, right?

In fact, I think `supports_float16` is confusing, in particular on riscv, as e.g. `AddHF` requires `UseZfh`, but `ConvF2HF` requires `UseZfh || UseZfhmin`, so I think it's better to keep `supports_float16_float_conversion`, as its naming is much more explicit and clearer.
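
The naming argument in the reply above can be sketched as two separate predicates. This is only an illustration under stated assumptions, not code from the PR: `RiscvFlags` and `supports_float16_arithmetic` are hypothetical names introduced here, and the two flags are modelled as plain booleans rather than real VM options. Only the two conditions themselves (`UseZfh` for `AddHF`, `UseZfh || UseZfhmin` for `ConvF2HF`) come from the message.

```cpp
// Hypothetical stand-in for the two RISC-V flags named in the reply above.
// In HotSpot these are VM options; here they are plain fields for illustration.
struct RiscvFlags {
  bool UseZfh;     // full half-precision extension
  bool UseZfhmin;  // minimal half-precision subset
};

// Conversions such as ConvF2HF are stated to need UseZfh || UseZfhmin.
constexpr bool supports_float16_float_conversion(const RiscvFlags& f) {
  return f.UseZfh || f.UseZfhmin;
}

// Arithmetic such as AddHF is stated to need UseZfh.
// Hypothetical name: the PR does not define a predicate with this name.
constexpr bool supports_float16_arithmetic(const RiscvFlags& f) {
  return f.UseZfh;
}

// With only Zfhmin enabled the two answers differ, which is why a single
// predicate called supports_float16 would be ambiguous on this platform.
static_assert(supports_float16_float_conversion(RiscvFlags{false, true}),
              "Zfhmin alone is enough for conversion");
static_assert(!supports_float16_arithmetic(RiscvFlags{false, true}),
              "Zfhmin alone is not enough for arithmetic");
```

The `static_assert` lines check the Zfhmin-only case at compile time, so the snippet needs no `main` to demonstrate the distinction.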
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1987723823 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1987724274 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1987725909 From ryan at iernst.net Mon Mar 10 18:25:16 2025 From: ryan at iernst.net (Ryan Ernst) Date: Mon, 10 Mar 2025 11:25:16 -0700 Subject: [External] : Re: Verification in agent transformers In-Reply-To: References: <3AE3C86F-D811-4788-844A-CF3F13013444@iernst.net> <4d1ccef8-c837-482b-abb3-72f28593d08a@oracle.com> <37F06AA7-E883-4BCF-8E0E-6B2CF1A81FBD@iernst.net> Message-ID: <12AA447F-DFEC-4863-8E16-8D0265EC3CAE@iernst.net> I created a reproduction: https://github.com/rjernst/verify-error-repro? rjernst/verify-error-repro github.com Again, the VerifyError is correct, it?s what we expect (we created bad bytecode in a transform), but it doesn?t always occur. > On Mar 8, 2025, at 12:55?AM, Alan Bateman wrote: > > > > On 07/03/2025 23:50, Ryan Ernst wrote: >> > If the class is loaded and then linked, and is not on the bootstrap class loader, then it should be verified. >> >> I agree, but that?s not what we observed. We only see the VerifyError if retransformClasses is called. If instead the class is loaded after the transformer has been registered, no VerifyError occurs, even though the transformer produced technically broken bytecode (I say ?technically? because although the parameter type was wrong, we never used the parameter in this method, so it didn?t actually cause any problems at runtime, it was just a bad reference on the stack that was ignored). > > I think the question you are asking is whether there is verification of classes modified at class load time with a ClassFileTransformer (ClassFileLoadHook in JVMTI speak) when those classes are in modules mapped to the boot class loader. > > -Alan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: verify-error-repro.png Type: image/png Size: 120561 bytes Desc: not available URL: From alan.bateman at oracle.com Mon Mar 10 20:14:47 2025 From: alan.bateman at oracle.com (Alan Bateman) Date: Mon, 10 Mar 2025 20:14:47 +0000 Subject: [External] : Re: Verification in agent transformers In-Reply-To: <12AA447F-DFEC-4863-8E16-8D0265EC3CAE@iernst.net> References: <3AE3C86F-D811-4788-844A-CF3F13013444@iernst.net> <4d1ccef8-c837-482b-abb3-72f28593d08a@oracle.com> <37F06AA7-E883-4BCF-8E0E-6B2CF1A81FBD@iernst.net> <12AA447F-DFEC-4863-8E16-8D0265EC3CAE@iernst.net> Message-ID: On 10/03/2025 18:25, Ryan Ernst wrote: > I created a reproduction: > > verify-error-repro.png > rjernst/verify-error-repro > > github.com > > > > > Again, the VerifyError is correct, it's what we expect (we created bad > bytecode in a transform), but it doesn't always occur. > Classes loaded from modules mapped to the boot loader, or classes on the boot loader's class path, are not verified if modified at class load time. They are verified if redefined at runtime. Developers of agents are not infallible so there may be an argument to enable BytecodeVerificationLocal when an agent enables one of the can_generate_XXX_class_hook_events capabilities. -Alan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: verify-error-repro.png Type: image/png Size: 120561 bytes Desc: not available URL: From coleen.phillimore at oracle.com Mon Mar 10 20:44:42 2025 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 10 Mar 2025 16:44:42 -0400 Subject: [External] : Re: Verification in agent transformers In-Reply-To: References: <3AE3C86F-D811-4788-844A-CF3F13013444@iernst.net> <4d1ccef8-c837-482b-abb3-72f28593d08a@oracle.com> <37F06AA7-E883-4BCF-8E0E-6B2CF1A81FBD@iernst.net> <12AA447F-DFEC-4863-8E16-8D0265EC3CAE@iernst.net> Message-ID: On 3/10/25 4:14 PM, Alan Bateman wrote: > On 10/03/2025 18:25, Ryan Ernst wrote: >> I created a reproduction: >> >> verify-error-repro.png >> rjernst/verify-error-repro >> >> github.com >> >> >> >> >> Again, the VerifyError is correct, it's what we expect (we created >> bad bytecode in a transform), but it doesn't always occur. >> > Classes loaded from modules mapped to the boot loader, or classes on > the boot loader's class path, are not verified if modified at class > load time. They are verified if redefined at runtime. Developers of > agents are not infallible so there may be an argument to enable > BytecodeVerificationLocal when an agent enables one of the > can_generate_XXX_class_hook_events capabilities. Yes, I just checked the code and we don't verify classes loaded via CFLH and we should fix that. Coleen > > -Alan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: verify-error-repro.png Type: image/png Size: 120561 bytes Desc: not available URL: From ascarpino at openjdk.org Mon Mar 10 22:51:58 2025 From: ascarpino at openjdk.org (Anthony Scarpino) Date: Mon, 10 Mar 2025 22:51:58 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 23:03:23 GMT, Volodymyr Paprotski wrote:
>> Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain)
>>
>> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain.
>>
>> Before (no AVX512)
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ± 17.879 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ± 15.807 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ± 4.190 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ± 2.484 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ± 2.285 ops/s
>>
>> After (with AVX2)
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ± 39.923 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ± 34.838 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ± 12.179 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ± 8.992 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ±
6.238 ops/s >> >> >> Before (with AVX512): >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ? 27.260 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ? 26.707 o... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > more comment improvements test/jdk/com/sun/security/util/math/intpoly/MontgomeryPolynomialFuzzTest.java line 30: > 28: import sun.security.util.math.intpoly.*; > 29: > 30: /* It is strange that there are two copies of the `@test` block. Can you please remove one of them, unless you are seeing a difference that I do not ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1988122873 From eastig at amazon.co.uk Mon Mar 10 22:55:35 2025 From: eastig at amazon.co.uk (Astigeevich, Evgeny) Date: Mon, 10 Mar 2025 22:55:35 +0000 Subject: [External] : RFD: Grouping hot code in CodeCache In-Reply-To: <681195aa-50a9-45ad-abe5-6e6e2d164b01@oracle.com> References: <1B0C3138-761B-4DB0-8A98-977C6FC40178@amazon.co.uk> <2623f909-bb91-4450-bc05-d9181ba3abcb@oracle.com> <681195aa-50a9-45ad-abe5-6e6e2d164b01@oracle.com> Message-ID: <921F549A-921F-4F3D-A481-43D0F2F25183@amazon.co.uk> Hi Vladimir, > I don't like manual part of this - providing list of hot methods which > should be collocated. It looks like I was not clear in my first email and miscommunication happened. I am sorry. I provided it to share what we tried and what lessons we learned, especially how it is complicated. We have no intent to upstream list-based solutions. > Sometime ago we had concept of Code Thank for sharing. If I remember correctly it uses deoptimization to remove aging code which means recompilation. BTW, I found https://openjdk.org/jeps/8350338 " Cooperative JFR Sampling". I see it has things we want in our implementation. 
Thanks, Evgeny ?On 08/03/2025, 02:03, "Vladimir Kozlov" > wrote: CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. On 3/7/25 3:47 PM, Astigeevich, Evgeny wrote: > Hi Vladimir, > > Thank you for the feedback. > >> My concern is that it will complicate VM existing code for not >> significant benefits in real production environment. To clarify. I don't like manual part of this - providing list of hot methods which should be collocated. I am fine to have special segment for special C2 compiled code. We will have one for some AOT code in Leyden. Move code in CodeCache to make it more dense is also fine. > > I think it won't complicate the existing code: > - Adding a code heap is ~50 lines of code, mostly in CodeCache::initialize_heaps. > - Relocating nmethods, according to PR[1], is ~300 lines of code. > - A grouping thread is simple and isolated. It will go through Java threads checking their last frame(s) and recording seen nmethods. It should have less code than the sweeper which was ~700 lines. > > I think it's better to wait for PoC to see the complexity. > >> What improvements your experiments in real production runs shows? And >> which JDK version you used for that? > > In production we are using internally 17 (static lists of methods) and 21 (dynamic lists of methods). > Improvements are in a range of 5% - 15%. They depend on how big CPU load is: the more CPU load the bigger improvement. Good. > >> As you know most of nmethod's metadata is moved from CodeCache. >> ... >> After that the code will be a lot more compact in CodeCache. Code sparsity >> should be less issue then. > > Yes, removing non-code from nmethod will improve code density. This means in a code region we will have more code vs non-code. > CPU instruction caches will like this. > > As I wrote in a comment to benchmark PR [2], Neoverse operates in code regions. 
For Neoverse it's more important to have as less code regions with active nmethods as possible. > > We are aware of cases when CodeCache usage is between 512M - 1G. The mentioned changes won't help in those cases. > If I remember, no public benchmarks have demonstrated improvements from non-code moved away from nmethod. > > Since the removal of the Sweeper, GC is in charge of cleaning CodeCache. We've seen cases when GC was triggered often because of allocation pressure on CodeCache. > For such cases, a recommended workaround is to increase the size of CodeCache from default 240M up to 512M. In such cases actively used nmethods will more likely be sparse. Hmm, may be we should restore counters decay for this case to prevent warm methods from compiling and polluting CodeCache and keep it small. > >> It would be nice if you redo your production experiments after that. > > Due to the complexity of customer's application we cannot run it on OpenJDKTip. It has thousand dependencies. We will need to move them on OpenJDKTip. > I think it would be difficult to backport the mentioned changes to 21 Understood. > >> I understand that we can still have sparsity due to "warm" nmethods and >> C1 compiled code mixed with "hot" C2 nmethods. > > Customers having issues with big CodeCache on Graviton usually turn off tiered compilation to reduce far jumps/calls. BTW, this is another argument for identifying active nmethods and grouping them together: it should reduce/eliminate far jumps/calls. > With small CodeCache mix of C1 and C2 nmethods is not an issue. > >> Can we simply use a separate CodeCache's segment for all >> C2 "hot" (we can specify frequency flag to determine what "hot" means) >> methods regardless when they are compiled. > > I did not get the idea. We already have the non-profiled segment where C2 code is put. Do you mean that at the compilation time some methods are put in the regular non-profile segment and some in the specific non-profile segment? 
Yes, I meant separate segments for hot and warm methods, both are c2 compiled code. It would still mix all 3 cases you pointed because compilation policy based mostly on what happened during startup. So it may be not good idea. > What we've seen that methods profiles keep changing. > There are the following cases: > 1. C2 methods used most of the time: their profile can stay the same or can get hotter. > 2. C2 methods used periodically: actively used, not used, actively used and so on > 3. C2 methods used actively during some time and never used after > > Currently GC identifies cases #3 and some cases #2, aka cold code. The percentage of methods case #1 is ~10% - 20%. > If we have 100M of C2 code, only 10M - 20M will be actively used. If we get unlucky, those 10M-20M could be spread across CodeCache and cause CPU stalls. > How can we identify those 10%-20% of methods at compilation time? I agree that it will be hard to determine that during compilation. We need some statistic after we compiled to find such methods. Sometime ago we had concept of Code Aging (removed after Sweeper was removed): https://github.com/vnkozlov/jdk17u-dev/commit/54db2c2d612c573f91f69b7b387b43a8e1c9d563 It added counter on nmethod entry to keep track if it is alive. We can use something similar to track how frequently nmethod is used. Erik Osterlund also had prototype in Leyden for call stack profiling by VM itself to find most used hot methods during training run. Thanks, Vladimir. > > BTW, I think the separate hot code heap might simplify flushing cold code. Everything not in the hot code heap can automatically assumed cold. 
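The sampling idea discussed above (a grouping thread records which nmethods it sees in the last frames of Java threads, and frequently seen ones are treated as hot) might be sketched roughly as follows. This is only an illustration under stated assumptions: `HotSampler`, the plain integer nmethod ids, and the `min_samples` threshold are all hypothetical and do not correspond to HotSpot code or to the PoC mentioned in the thread.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sampler: nmethods are represented by plain integer ids.
class HotSampler {
 public:
  // Record that an nmethod was seen on top of a thread stack in one sample.
  void record(int nmethod_id) { ++samples_[nmethod_id]; }

  // Return the ids seen at least min_samples times, sorted ascending.
  std::vector<int> hot(std::uint32_t min_samples) const {
    std::vector<int> result;
    for (const auto& entry : samples_) {
      if (entry.second >= min_samples) {
        result.push_back(entry.first);
      }
    }
    std::sort(result.begin(), result.end());
    return result;
  }

 private:
  std::unordered_map<int, std::uint32_t> samples_;  // id -> sample count
};
```

In such a scheme, anything not returned by `hot()` would be left in the regular code heap, which matches the remark that everything outside the hot code heap can be assumed cold.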
> > Thanks, > Evgeny > > [1]: https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/23573__;!!ACWV5N9M2RV99hQ!NJUmKWpP8JXmMyj5OF0BoB4BguL0Vn4exg56Cep3V0gmpKW5-YDHnVzmnlL3dDpdxeCRooZpOzi67dBh_vDxrlg$ > [2]: https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/23831 *issuecomment-2705085399__;Iw!!ACWV5N9M2RV99hQ!NJUmKWpP8JXmMyj5OF0BoB4BguL0Vn4exg56Cep3V0gmpKW5-YDHnVzmnlL3dDpdxeCRooZpOzi67dBhJyHApiM$ > > On 06/03/2025, 22:41, "Vladimir Kozlov" >> wrote: > > > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. > > > > > > > Hi Evgeny, > > > My concern is that it will complicate VM existing code for not > significant benefits in real production environment. > > > What improvements your experiments in real production runs shows? And > which JDK version you used for that? > > > As you know most of nmethod's metadata is moved from CodeCache. And > Boris Ulasevich will move the final part (relocation info) soon. After > that the code will be a lot more compact in CodeCache. Code sparsity > should be less issue then. > > > It would be nice if you redo your production experiments after that. > > > I understand that we can still have sparsity due to "warm" nmethods and > C1 compiled code mixed with "hot" C2 nmethods. I think compilation > policy has heuristic to detect "warm" method (time intervals between > invocations). Can we simply use a separate CodeCache's segment for all > C2 "hot" (we can specify frequency flag to determine what "hot" means) > methods regardless when they are compiled. Then you don't need to create > list or do anything special for them. Most likely we will waste more > space in CodeCache but it could be conditional under flag which you > already proposed in separate segment RFE. 
> > > Thanks, > Vladimir K > > > On 3/5/25 10:41 AM, Astigeevich, Evgeny wrote: >> Hi Vladimir, >> >> This is JDK-8326205: Implement grouping hot nmethods in CodeCache. >>> As I managed to synthesize a benchmark > (https://urldefense.com/v3/__https://github.com/openjdk/jdk/__;!!ACWV5N9M2RV99hQ!NJUmKWpP8JXmMyj5OF0BoB4BguL0Vn4exg56Cep3V0gmpKW5-YDHnVzmnlL3dDpdxeCRooZpOzi67dBhX6EVoYM$ > >> pull/23831 >> pull/23831__;!!ACWV5N9M2RV99hQ!OwHez5zoUshzI- >> baNlMChYzivbqU97PyvY08f_b1wH7Vd1hrqnwarTHE0Ha9IwOIOFw9jwE6gthfb- >> imnfmmpfw$>) to demonstrate performance impact of sparse code, I?d like >> to discuss a possible solution of the sparse code. >> >> High level, a solution is: >> >> * Detect hot code. >> * Group hot code. >> * Maintain grouped code. >> >> Downstream we tried two approaches: >> >> * *Static lists of methods (compile command):* Identify frequently >> used (hot) methods using test runs and provide static method lists >> to JVM in production. When JVM compiles a Java method and the method >> is on the list, JVM puts the code into to a designated code heap >> (HotCodeHeap). >> * *Dynamic lists of methods (compiler directives):* Profile an >> application in production and dynamically relocate identified hot >> methods to HotCodeHeap. Relocation was implemented with recompilation. >> >> The main advantage of static lists is zero profiling overhead in >> production. We do all profiling and analysis in test runs. Its problems are: >> >> * *Training Run Accuracy*: We need training runs to have execution >> paths closely mimicking production environments. Otherwise we put >> wrong methods into HotCodeHeap. >> * *Method List Maintenance:* We need to rerun training to regenerate >> lists when application code changes. Training runs are expensive and >> time-consuming. They require long runs to guarantee we see all major >> execution paths. 
Updating lists in production can be as complex as >> application deployment. >> * *Method Placement Limitations:* Methods marked for HotCodeHeap are >> permanently placed into HotCodeHeap. No mechanism to remove methods >> that become less frequently used. >> >> We addressed these problems with dynamic lists of methods. We >> implemented a Java agent that runs within the same JVM to dynamically >> detect and manage hot Java methods without prior method identification. >> The agent detects hot methods using JFR. The agent manages hot Java >> methods in HotCodeHeap with compiler directives. A new compiler >> directive marks methods with dynamic states ("hot" or "cold"). Methods >> marked by the "hot" state are recompiled and placed in HotCodeHeap. >> Methods marked by the "cold" state are eventually removed from HotCodeHeap. >> >> Problems of this approach are: >> >> * It requires specific, complex modifications to compiler directive >> support: recompilation of Java methods affected by compiler >> directive changes. This functionality is unique to Java agent >> implementation and has limited potential for broader use. >> * The agent cannot guarantee Java methods are moved to/removed from >> the HotCodeHeap because updates of compiler directives can fail. >> * The agent knows nothing about compiled code, e.g. whether it's C1 or >> C2 compiled, code size, profile. This data can be useful for deciding >> to move or not to move to HotCodeHeap. >> * Recompilations, especially C2, are expensive. Having many of them >> can cause performance issues. Also recompiled code might differ from >> the code we have detected as "hot". >> >> Running these two approaches in production we learned: >> >> * We detect 95% of actively used methods within the first 30 minutes >> of an application run. This is with JFR profiling configured: 90 >> seconds session duration, sampling each 11 ms, 8 minutes between >> profiling sessions.
We can find actively used methods faster if we >> reduce the pause between profiling sessions and the sampling period. >> However, it will increase the profiling overhead and affect >> application performance. With the current configuration, the >> profiling overhead is between 1% - 2%. >> * A set of actively used methods gets into the steady state (no new >> methods added to, no methods removed from) within the first 60 minutes. >> * Static lists, when created from runs close to production, have 80% - >> 90% methods always in use. This does not change over time. >> * Predicting the size of HotCodeHeap is difficult, especially with >> dynamic lists. >> >> We want to have hot method grouping functionality as a part of the HotSpot >> JVM. We will group only C2 compiled methods. We can group JVMCI compiled >> methods, e.g. Graal, if needed. We need profiling precise enough to >> detect major Java methods. Low overhead is more important than precision. >> >> We think we can have a solution which does not require a lot of code: >> >> * Detect hot code: we can use an implementation based on the Sweeper: >> https://github.com/openjdk/jdk17u/blob/master/src/hotspot/share/runtime/sweeper.hpp. >> We will use the handshake mechanism, which the Sweeper >> used, to detect nmethods on the top of thread stacks. >> * Group hot code: we have a draft PR https://github.com/openjdk/jdk/pull/23573.
It implements relocation of nmethods within CodeCache. >> * Maintain grouped code: we will add an additional code heap where hot >> nmethods will be relocated to. >> >> What do you think about this approach? Are there other possible solutions? >> >> Thanks, >> >> Evgeny A. >> >> >> >> >> Amazon Development Centre (London) Ltd.Registered in England and Wales >> with registration number 04543232 with its registered office at 1 >> Principal Place, Worship Street, London EC2A 2FA, United Kingdom. >> >> > > > > > > > > > Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom. > > Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom. From vpaprotski at openjdk.org Mon Mar 10 23:10:54 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 10 Mar 2025 23:10:54 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 22:49:06 GMT, Anthony Scarpino wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> more comment improvements > > test/jdk/com/sun/security/util/math/intpoly/MontgomeryPolynomialFuzzTest.java line 30: > >> 28: import sun.security.util.math.intpoly.*; >> 29: >> 30: /* > > It is strange that there are two copies of the `@test` block. Can you please remove one of them, unless you are seeing a difference that I do not -XX:+/-UseIntPolyIntrinsics (test Java vs BigInt and intrinsic vs BigInt) Though I think I did this before I knew much about junit.. I think I can just have two @run commands (to make it clearer)? 
Will give that a try > test/jdk/com/sun/security/util/math/intpoly/MontgomeryPolynomialFuzzTest.java line 123: > >> 121: } >> 122: >> 123: if (rnd.nextBoolean()) { > > Why is this done randomly? Wouldn't we want to check these situations every time? I was mostly attempting to test 'random paths' through the code, and this was a way to pseudo-randomly accomplish that. (i.e. a product of a difference, a product of a product.. and so on..) Since this is looping, we got 50% chance of getting both, without me having to write/think-through all the many permutations of what input/outputs to each operations can be. (Extend the loop count to run for several hours during development.. and it does wonders to testing corner cases. Have been following this 'template' in most my PRs) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1988136095 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1988134465 From fyang at openjdk.org Tue Mar 11 00:51:03 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 11 Mar 2025 00:51:03 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v5] In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Mon, 10 Mar 2025 17:21:47 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/riscv.ad line 4975: >> >>> 4973: >>> 4974: ins_pipe(fp_load_constant_s); >>> 4975: %} >> >> Can we put this two after `loadConD0`? Then we have a more consistent order for the three variants (F->D->FH) at each place. > > In fact, in my mind the order should be FH->F->D, considering the width changing from 16->32->64. That's also fine to me. Just expecting a consistent order for other definitions as well like `immHF0` -> `immF0` -> `immD0`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1988200318 From fyang at openjdk.org Tue Mar 11 01:03:04 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 11 Mar 2025 01:03:04 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v5] In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Mon, 10 Mar 2025 17:23:15 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/vm_version_riscv.hpp line 301: >> >>> 299: static bool supports_fencei_barrier() { return ext_Zifencei.enabled(); } >>> 300: >>> 301: static bool supports_float16_float_conversion() { >> >> Is it safe to simply rename this as `supports_float16` like other CPUs? I suppose it will still work in functionality even if we only have the `Zfhmin` extension, right? > > In fact, I think `supports_float16` is confusing, in particular on riscv, as e.g. `AddHF` requires `UseZfh`, but `ConvF2HF` requires `UseZfh || UseZfhmin`, so I think it's better to keep supports_float16_float_conversion, as its naming is much more explicit and clearer. OK! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1988207520 From fyang at openjdk.org Tue Mar 11 01:14:52 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 11 Mar 2025 01:14:52 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v5] In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Mon, 10 Mar 2025 17:22:06 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/riscv.ad line 8324: >> >>> 8322: %} >>> 8323: >>> 8324: instruct min_max_HF_reg(fRegF dst, fRegF src1, fRegF src2) >> >> It will be easier for people to map to existing F/D variants if we use similar names like `sqrtHF_reg`, `minHF_reg_reg`, `maxHF_reg_reg`, `maddHF_reg_reg`, and `binOpsHF_reg_reg` for the HF variant.
> > I think there is no convention of instruct naming, as long as there is no duplicate names, and normally we don't care about the instruct name too much. What really matters is the match rule, e.g. MinHF/ManHF and so on, which indicates what the instructs do. Yes, it won't affect functionality here. The instruct naming matters when we want some fine-grained tunnings (minI_reg_reg/minI_reg_zero). That said, it doesn't seem that it could happen for this HF part? So OK! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1988215397 From fyang at openjdk.org Tue Mar 11 03:25:55 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 11 Mar 2025 03:25:55 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v15] In-Reply-To: References: Message-ID: On Sat, 8 Mar 2025 19:32:54 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. 
>> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - * fix card table verification crashes: in the first refinement phase, when switching the global card tables, we need to re-check whether we are still in the same sweep epoch or not. It might have changed due to a GC interrupting acquiring the Heap_lock. 
Otherwise new threads will scribble on the refinement table. > Cause are last-minute changes before making the PR ready to review. > > Testing: without the patch, occurs fairly frequently when continuously > (1 in 20) starting refinement. Does not afterward. > - * ayang review 3 > * comments > * minor refactorings Tier1-3 test good on linux-riscv64 platform. And I have prepared an add-on change which implements the barrier method to write cards for a reference array for this platform. Do you want to have it in this PR? Thanks. [23739-riscv-addon.txt](https://github.com/user-attachments/files/19174898/23739-riscv-addon.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2712469306 From ayang at openjdk.org Tue Mar 11 09:23:01 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 11 Mar 2025 09:23:01 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v9] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 21:26:08 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > removed template paramter and moved ptr can_align_up > Do we really want to allow passing nullptr to can_align_up(void* ptr, A alignment)? I don't see any problem with allowing or passing `nullptr` to `can_align_up`, `align_up`, or `align_down`. 
The result should be `true` and have no effect, as if the argument were the integer `0`, right? Disallowing `nullptr` would introduce extra code in these functions, which would clutter the flow, in my opinion. src/hotspot/share/utilities/align.hpp line 83: > 81: constexpr T align_up(T size, A alignment) { > 82: T mask = checked_cast(alignment_mask(alignment)); > 83: assert(size <= std::numeric_limits::max() - mask, "overflow"); Just curious, if `can_align_up` is precondition, why not `assert(can_align_up(...), "precondition")`? Then, the comment can even be dropped. ------------- PR Review: https://git.openjdk.org/jdk/pull/23711#pullrequestreview-2673529828 PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1988763609 From mli at openjdk.org Tue Mar 11 09:33:57 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Mar 2025 09:33:57 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v5] In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Tue, 11 Mar 2025 00:48:04 GMT, Fei Yang wrote: >> In fact, in my mind the order should be FH->F->D, considering the width changing from 16->32->64. > > That's also fine to me. Just expecting a consistent order for other definitions as well like `immHF0` -> `immF0` -> `immD0` (and `immHF` -> `immF` -> `immD`). The orders in original code are inconsistent, e.g. for `operand` it's immD -> immF, for `instruct` it's loadConF -> loadConD, this patch just follow the previous local order. And I think this inconsistency is not a problem, because when we read the code, as long as it maintains a certain order locally, then there is no problem, we don't have to enforce global consistency because it is difficult to do so and there doesn't seem to be much benefit. > That said, it doesn't seem that it could happen for this HF part? No, seems not for now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1988795670 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1988795542 From tschatzl at openjdk.org Tue Mar 11 09:51:53 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 11 Mar 2025 09:51:53 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v16] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done; // null value check
> if (card(@x.a) == young_card) goto done; // write to young gen check
> StoreLoad; // synchronize
> if (card(@x.a) == dirty_card) goto done;
>
> *card(@x.a) = dirty
>
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
>
> call runtime to move thread-local-dcq into dcqs
>
> done:
>
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>
> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
>
> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se...
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * optimized RISCV gen_write_ref_array_post_barrier() implementation contributed by @RealFYang ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/93b884f1..758fac01 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=14-15 Stats: 36 lines in 1 file changed: 28 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Tue Mar 11 09:54:05 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 11 Mar 2025 09:54:05 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v15] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 03:22:52 GMT, Fei Yang wrote: > Tier1-3 test good on linux-riscv64 platform. And I have prepared an add-on change which implements the barrier method to write cards for a reference array for this platform. Do you want to have it in this PR? Thanks. I added your changes, thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2713415911 From fyang at openjdk.org Tue Mar 11 11:05:59 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 11 Mar 2025 11:05:59 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v5] In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Fri, 7 Mar 2025 11:42:34 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv. 
>>
>> ## Performance
>>
>> data
>>
>> Benchmark | (vectorDim) | Mode | Cnt | Score - master | Error | Score - patch | Error | Units | Improvement (master/patch)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1
>> Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011
>> Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001
>> Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001
>> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909
>> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91
>> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129
>> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 91565.81...
>
> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision:
>
> - clean
> - renaming

Overall LGTM. Thanks for the updates.

src/hotspot/cpu/riscv/riscv.ad line 8346:

> 8344: instruct fma_HF_reg(fRegF dst, fRegF src1, fRegF src2)
> 8345: %{
> 8346: match(Set dst (FmaHF src2 (Binary dst src1)));

Question: Why is `dst` used as one of the source operands at the same time? There doesn't seem to be such a constraint at the instruction level. I didn't see similar constraint for F/D variants `maddF_reg_reg` and `maddD_reg_reg`. So it's likely to be relaxed.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23844#pullrequestreview-2673944224
PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1988982034

From shade at openjdk.org Tue Mar 11 11:41:28 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 11 Mar 2025 11:41:28 GMT
Subject: RFR: 8351640: Print reason for making method not entrant
Message-ID: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com>

A simple quality of life improvement. We are studying compiler dynamics in Leyden, and it would be convenient to know why the particular methods are marked as not entrant. We just need to pass the extra string argument to `nmethod::make_not_entrant` and print it out.
Sample log excerpt for mainline:

$ grep com.sun.tools.javac.util.IntHashTable::lookup print-compilation.log
987 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes)
1019 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes)
1024 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used
4995 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: uncommon trap
5287 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes)
6615 5472 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes)
6626 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used

You can now clearly see the method lifecycle. About 1 second into the app lifetime, the method was initially compiled at level 3. Shortly after, it got compiled at level 4, leaving the level 3 method unused. 4 seconds later, the level 4 method encountered an uncommon trap, so we are back to level 3. After 1.3 seconds more, the final compilation at level 4 completed, and the second level 3 compilation was removed as unused.
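
The idea of threading a reason string through to the log line can be sketched roughly as below. This is a toy model, not HotSpot code: `ToyMethod`, its fields, and this `make_not_entrant` are hypothetical stand-ins, and the output format only imitates the excerpt above. The real `nmethod::make_not_entrant` signature is whatever the PR defines.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for a compiled method; NOT HotSpot's nmethod.
struct ToyMethod {
    int compile_id;
    int level;
    std::string name;
    bool entrant;
    std::string log_line;  // last line "printed" for this method

    ToyMethod(int id, int lvl, const std::string& n)
        : compile_id(id), level(lvl), name(n), entrant(true) {}

    // Marks the method not entrant and records why.
    // Returns false if it already was not entrant; nothing is logged again.
    bool make_not_entrant(const char* reason) {
        assert(reason != nullptr && "a reason is required");
        if (!entrant) {
            return false;
        }
        entrant = false;
        std::ostringstream out;
        out << compile_id << " " << level << " " << name
            << " made not entrant: " << reason;
        log_line = out.str();
        return true;
    }
};
```

Requiring the reason at every call site is the point of the sketch: a caller cannot invalidate a method without saying why, so the log never has an unexplained transition.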
Additional testing: - [x] Linux x86_64 server fastdebug, `hotspot:tier1` ------------- Commit messages: - Use resource allocation for temp buffer - Base version Changes: https://git.openjdk.org/jdk/pull/23980/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23980&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351640 Stats: 36 lines in 14 files changed: 8 ins; 0 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/23980.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23980/head:pull/23980 PR: https://git.openjdk.org/jdk/pull/23980 From ayang at openjdk.org Tue Mar 11 11:59:00 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 11 Mar 2025 11:59:00 GMT Subject: RFR: 8321529: log_on_large_pages_failure reports log_debug(gc, heap, coops) for ReservedCodeSpace failures [v2] In-Reply-To: <3rvoe3fw-Qv2tyGwomaDxy2hNzfoaZuj4wUdgQzi5hM=.f8522d94-df2b-44a0-8a7f-503e7b19ce71@github.com> References: <3rvoe3fw-Qv2tyGwomaDxy2hNzfoaZuj4wUdgQzi5hM=.f8522d94-df2b-44a0-8a7f-503e7b19ce71@github.com> Message-ID: On Mon, 27 Jan 2025 18:19:04 GMT, Stefan Karlsson wrote: >> The code path that we use to reserve memory is generic and used by various paths in the JVM, but we log messages about failures to reserve large pages on the 'gc, heap, coops' tag set. This is confusing, so I propose to log this on 'os, map' instead. We already use that tag set to log memory reservation, so I think that's a decent tag set to use. >> >> While doing this change I also added some extra info about the area that we tried to reserve and commit. >> >> A couple of G1 tests had to be tweaked. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Update memoryReserver.cpp The more generic tag makes sense -- `log_on_large_pages_failure` is too low-level to contain the component-specific info, IMO. ------------- Marked as reviewed by ayang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23297#pullrequestreview-2674153208

From mli at openjdk.org Tue Mar 11 12:43:30 2025
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 11 Mar 2025 12:43:30 GMT
Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v6]
In-Reply-To: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com>
References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com>
Message-ID: <_YhLkn3fphUOBhl2tyMmEEnk282U5nzSJZeez0sKtXc=.5c0fdac0-7c49-4567-8861-6b5b03de226f@github.com>

> Hi,
> Can you help to review this patch?
> It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv.
>
> ## Performance
>
> data
>
> Benchmark | (vectorDim) | Mode | Cnt | Score - master | Error | Score - patch | Error | Units | Improvement (master/patch)
> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1
> Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011
> Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001
> Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001
> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909
> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91
> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129
> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 91565.818 | 250.761 | 5191.12 | 64.598 | ns/op | 17.639
> Fl...

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  refine

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23844/files
  - new: https://git.openjdk.org/jdk/pull/23844/files/143869ff..e63061d2

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=23844&range=05
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23844&range=04-05

Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/23844.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23844/head:pull/23844

PR: https://git.openjdk.org/jdk/pull/23844

From mli at openjdk.org Tue Mar 11 12:43:30 2025
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 11 Mar 2025 12:43:30 GMT
Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v5]
In-Reply-To:
References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com>
Message-ID: <7GblVf7CFMmI2B0kt_P23mUR_5backyLTEy6SKvDjE8=.5a39d343-1223-41a0-a8dc-2a67ddf856e2@github.com>

On Tue, 11 Mar 2025 11:00:55 GMT, Fei Yang wrote:

>> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - clean
>> - renaming
>
> src/hotspot/cpu/riscv/riscv.ad line 8346:
>
>> 8344: instruct fma_HF_reg(fRegF dst, fRegF src1, fRegF src2)
>> 8345: %{
>> 8346: match(Set dst (FmaHF src2 (Binary dst src1)));
>
> Question: Why is `dst` used as one of the source operands at the same time? There doesn't seem to be such a constraint at the instruction level. I didn't see similar constraint for F/D variants `maddF_reg_reg` and `maddD_reg_reg`. So it's likely to be relaxed.

Yes, that makes sense to me. Relaxed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1989169152

From stefank at openjdk.org Tue Mar 11 12:48:00 2025
From: stefank at openjdk.org (Stefan Karlsson)
Date: Tue, 11 Mar 2025 12:48:00 GMT
Subject: RFR: 8321529: log_on_large_pages_failure reports log_debug(gc, heap, coops) for ReservedCodeSpace failures [v2]
In-Reply-To:
References: <3rvoe3fw-Qv2tyGwomaDxy2hNzfoaZuj4wUdgQzi5hM=.f8522d94-df2b-44a0-8a7f-503e7b19ce71@github.com>
Message-ID:

On Tue, 11 Mar 2025 11:56:20 GMT, Albert Mingkun Yang wrote:

>> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update memoryReserver.cpp
>
> The more generic tag makes sense -- `log_on_large_pages_failure` is too low-level to contain the component-specific info, IMO.

Thanks @albertnetymk. I agree.

Now that this PR has been out for a while, is anyone opposed to the suggested change, or should I go ahead and push it?
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23297#issuecomment-2714059879

From mli at openjdk.org Tue Mar 11 12:53:57 2025
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 11 Mar 2025 12:53:57 GMT
Subject: Integrated: 8351145: RISC-V: only enable some crypto intrinsic when AvoidUnalignedAccess == false
In-Reply-To: <8-3lLYr9jtNOQJhRLRyAA2xxfxG2aVm27HIGcbNsCfY=.8e43a66a-5602-4345-b5a0-cfdaab7e0d8f@github.com>
References: <8-3lLYr9jtNOQJhRLRyAA2xxfxG2aVm27HIGcbNsCfY=.8e43a66a-5602-4345-b5a0-cfdaab7e0d8f@github.com>
Message-ID:

On Tue, 4 Mar 2025 16:11:41 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review the patch?
>
> Depending on whether a CPU supports fast misaligned access or not, misaligned access can impact performance a lot.
> Some crypto intrinsic implementations on riscv do not consider data alignment and just use `ld` to load the input byte array, and there seems to be no way to handle it. The main reason is that at the Java API level, the input byte array to these JVM intrinsics could be part of a real Java array, so the input byte array could be 1/2...7 byte aligned.
> And with the introduction of COH, it would be even more complicated to align the input data.
>
> So, for consistent performance, it seems better to disable these intrinsics when AvoidUnalignedAccess == true.
> The user can still enable the intrinsics explicitly on a CPU with AvoidUnalignedAccess == true if they want to.
>
> Thanks!

This pull request has now been integrated.
Changeset: af9af7e9 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/af9af7e90f7dab5adc7b89b76eb978d269e863de Stats: 10 lines in 1 file changed: 4 ins; 4 del; 2 mod 8351145: RISC-V: only enable some crypto intrinsic when AvoidUnalignedAccess == false Reviewed-by: fyang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/23903 From fyang at openjdk.org Tue Mar 11 12:59:59 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 11 Mar 2025 12:59:59 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v6] In-Reply-To: <_YhLkn3fphUOBhl2tyMmEEnk282U5nzSJZeez0sKtXc=.5c0fdac0-7c49-4567-8861-6b5b03de226f@github.com> References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> <_YhLkn3fphUOBhl2tyMmEEnk282U5nzSJZeez0sKtXc=.5c0fdac0-7c49-4567-8861-6b5b03de226f@github.com> Message-ID: On Tue, 11 Mar 2025 12:43:30 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv. 
>>
>> ## Performance
>>
>> data
>>
>> Benchmark | (vectorDim) | Mode | Cnt | Score - master | Error | Score - patch | Error | Units | Improvement (master/patch)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1
>> Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011
>> Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001
>> Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001
>> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909
>> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91
>> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129
>> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 91565.81...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> refine

Still good to me.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23844#pullrequestreview-2674395141

From mli at openjdk.org Tue Mar 11 13:03:04 2025
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 11 Mar 2025 13:03:04 GMT
Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v6]
In-Reply-To:
References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> <_YhLkn3fphUOBhl2tyMmEEnk282U5nzSJZeez0sKtXc=.5c0fdac0-7c49-4567-8861-6b5b03de226f@github.com>
Message-ID:

On Tue, 11 Mar 2025 12:56:54 GMT, Fei Yang wrote:

> Still good to me.

Thank you!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23844#issuecomment-2714131266

From eastigeevich at openjdk.org Tue Mar 11 13:42:00 2025
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Tue, 11 Mar 2025 13:42:00 GMT
Subject: RFR: 8321529: log_on_large_pages_failure reports log_debug(gc, heap, coops) for ReservedCodeSpace failures [v2]
In-Reply-To: <3rvoe3fw-Qv2tyGwomaDxy2hNzfoaZuj4wUdgQzi5hM=.f8522d94-df2b-44a0-8a7f-503e7b19ce71@github.com>
References: <3rvoe3fw-Qv2tyGwomaDxy2hNzfoaZuj4wUdgQzi5hM=.f8522d94-df2b-44a0-8a7f-503e7b19ce71@github.com>
Message-ID:

On Mon, 27 Jan 2025 18:19:04 GMT, Stefan Karlsson wrote:

>> The code path that we use to reserve memory is generic and used by various paths in the JVM, but we log messages about failures to reserve large pages on the 'gc, heap, coops' tag set. This is confusing, so I propose to log this on 'os, map' instead. We already use that tag set to log memory reservation, so I think that's a decent tag set to use.
>>
>> While doing this change I also added some extra info about the area that we tried to reserve and commit.
>> >> A couple of G1 tests had to be tweaked. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Update memoryReserver.cpp lgtm ------------- PR Comment: https://git.openjdk.org/jdk/pull/23297#issuecomment-2714313533 From duke at openjdk.org Tue Mar 11 14:44:04 2025 From: duke at openjdk.org (David Linus Briemann) Date: Tue, 11 Mar 2025 14:44:04 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms [v2] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 19:39:23 GMT, Leonid Mesnik wrote: >> David Linus Briemann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: >> >> - remove CountBytecodesTest from tier1 >> - Merge branch 'master' into dlb/bytecode_counter_overflow >> - remove auto included header >> - fix x86 asm >> - address review comment, add back comma to copyright header >> - formatting >> - remove bad header >> - add missing comma to copyright header >> - speed up runtime by running less bytecodes, add explanation >> - add copyright header and @bug number >> - ... and 5 more: https://git.openjdk.org/jdk/compare/057e808f...31a52156 > > test/hotspot/jtreg/runtime/interpreter/CountBytecodesTest.java line 32: > >> 30: * does not overflow for more than 2^32 bytecodes counted. >> 31: * @library /test/lib >> 32: * @run main/othervm/timeout=300 CountBytecodesTest > > The long tests should be excluded from tier1. Please update TEST.groups. I excluded the test from the `tier1_runtime` tests. To my understanding it should now run in tier4. Could you please verify? Thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23766#discussion_r1989450498 From shade at openjdk.org Tue Mar 11 15:02:15 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 11 Mar 2025 15:02:15 GMT Subject: RFR: 8351656: Problemlist gc/TestAllocHumongousFragment#generational Message-ID: Causes noise in GHA testing, so we need to problemlist it. Additional testing: - [x] Checked the test is skipped locally ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/23982/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23982&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351656 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23982.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23982/head:pull/23982 PR: https://git.openjdk.org/jdk/pull/23982 From shade at openjdk.org Tue Mar 11 15:40:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 11 Mar 2025 15:40:56 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms [v2] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 14:41:42 GMT, David Linus Briemann wrote: >> test/hotspot/jtreg/runtime/interpreter/CountBytecodesTest.java line 32: >> >>> 30: * does not overflow for more than 2^32 bytecodes counted. >>> 31: * @library /test/lib >>> 32: * @run main/othervm/timeout=300 CountBytecodesTest >> >> The long tests should be excluded from tier1. Please update TEST.groups. > > I excluded the test from the `tier1_runtime` tests. To my understanding it should now run in tier4. Could you please verify? Thanks. Yes, this should work. In current definition, `tier4` is "catch-all" group that handles all the tests that are not explicitly in `tier{1,2,3}`. 
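
For context on why the test in this thread has to execute more than 2^32 bytecodes, and so runs too long for tier1, here is a small sketch of the wraparound a 32-bit counter suffers. The function names are made up for illustration; this is not the interpreter's counter code.

```cpp
#include <cstdint>

// Counting n events from zero with a 32-bit unsigned counter
// yields n modulo 2^32, because unsigned arithmetic wraps.
constexpr std::uint32_t count_with_32_bits(std::uint64_t n) {
    return static_cast<std::uint32_t>(n);
}

// The same count held in a 64-bit counter is exact for any n
// that fits in 64 bits.
constexpr std::uint64_t count_with_64_bits(std::uint64_t n) {
    return n;
}

constexpr std::uint64_t two_pow_32 = std::uint64_t{1} << 32;

// Five events past 2^32: the narrow counter reports 5, the wide one is exact.
static_assert(count_with_32_bits(two_pow_32 + 5) == 5,
              "a 32-bit counter is expected to wrap past 2^32");
static_assert(count_with_64_bits(two_pow_32 + 5) == two_pow_32 + 5,
              "a 64-bit counter is expected to be exact here");
```

A test that stops below 2^32 events cannot tell the two counters apart, which is why the run time cannot be trimmed much further.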
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23766#discussion_r1989577367 From mdoerr at openjdk.org Tue Mar 11 15:44:59 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 11 Mar 2025 15:44:59 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 10:24:44 GMT, David Linus Briemann wrote: >> 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms > > David Linus Briemann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - remove CountBytecodesTest from tier1 > - Merge branch 'master' into dlb/bytecode_counter_overflow > - remove auto included header > - fix x86 asm > - address review comment, add back comma to copyright header > - formatting > - remove bad header > - add missing comma to copyright header > - speed up runtime by running less bytecodes, add explanation > - add copyright header and @bug number > - ... and 5 more: https://git.openjdk.org/jdk/compare/d09a328b...31a52156 LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23766#pullrequestreview-2675120224 From shade at openjdk.org Tue Mar 11 15:45:00 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 11 Mar 2025 15:45:00 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms [v2] In-Reply-To: References: Message-ID: <6MRVgthMpv2SPYeh7wdCwzoirw0mhOMHKfL25kZGG_w=.2754f881-ab03-4733-9299-ccfa5fa5f44f@github.com> On Mon, 10 Mar 2025 10:24:44 GMT, David Linus Briemann wrote: >> 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms > > David Linus Briemann has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - remove CountBytecodesTest from tier1 > - Merge branch 'master' into dlb/bytecode_counter_overflow > - remove auto included header > - fix x86 asm > - address review comment, add back comma to copyright header > - formatting > - remove bad header > - add missing comma to copyright header > - speed up runtime by running less bytecodes, add explanation > - add copyright header and @bug number > - ... and 5 more: https://git.openjdk.org/jdk/compare/d09a328b...31a52156 This looks fine to me, with a few nits, thanks. x86_32 parts would go away as we cleanup after x86_32 removal, but they can stay here for completeness and backportability. src/hotspot/share/interpreter/bytecodeTracer.cpp line 132: > 130: st->print("[%zu] ", Thread::current()->osthread()->thread_id_for_printing()); > 131: if (Verbose) { > 132: st->print("%8zu %4d " INTPTR_FORMAT " " INTPTR_FORMAT " %s", Sounds like there are more than 8 digits now? ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23766#pullrequestreview-2675109785 PR Review Comment: https://git.openjdk.org/jdk/pull/23766#discussion_r1989582658 From cnorrbin at openjdk.org Tue Mar 11 15:55:00 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 11 Mar 2025 15:55:00 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v10] In-Reply-To: References: Message-ID: > Hi everyone, > > The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. > > The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. 
Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: changed assert in align_up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23711/files - new: https://git.openjdk.org/jdk/pull/23711/files/0933d3c9..5dc102ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=08-09 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23711.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23711/head:pull/23711 PR: https://git.openjdk.org/jdk/pull/23711 From cnorrbin at openjdk.org Tue Mar 11 15:55:02 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 11 Mar 2025 15:55:02 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v9] In-Reply-To: References: Message-ID: <5SPbr_ermCom_pIr1s96QKoFx9D-xzS4xLNpBSukBwE=.faa595ba-32bf-44fc-b338-289577e7d9f0@github.com> On Tue, 11 Mar 2025 09:14:35 GMT, Albert Mingkun Yang wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> removed template paramter and moved ptr can_align_up > > src/hotspot/share/utilities/align.hpp line 83: > >> 81: constexpr T align_up(T size, A alignment) { >> 82: T mask = checked_cast(alignment_mask(alignment)); >> 83: assert(size <= std::numeric_limits::max() - mask, "overflow"); > > Just curious, if `can_align_up` is precondition, why not `assert(can_align_up(...), "precondition")`? Then, the comment can even be dropped. That is probably a cleaner way to do it now that we have `can_align_up`. I swapped the assert, and changed the rest of the function to look like before. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1989615709 From cnorrbin at openjdk.org Tue Mar 11 15:56:58 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 11 Mar 2025 15:56:58 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v9] In-Reply-To: References: Message-ID: <92G7Ypw_wM0WQkJQbGJsEJ9eecHtL5rTy_6Fx2XoNhA=.e7da04dd-9b9d-4860-a53a-626715124417@github.com> On Wed, 5 Mar 2025 21:26:08 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > removed template paramter and moved ptr can_align_up I think it's fine to allow `nullptr`s for the alignment functions. As Albert said, the result should be `true` anyways. If the caller wants a `nullptr` check, they can add it before aligning. > > Do we really want to allow passing nullptr to can_align_up(void* ptr, A alignment)? > > I don't see any problem with allowing or passing `nullptr` to `can_align_up`, `align_up`, or `align_down`. The result should be `true` and have no effect, as if the argument were the integer `0`, right? Disallowing `nullptr` would introduce extra code in these functions, which would clutter the flow, in my opinion. 
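A small sketch of the point about `nullptr`, under the assumption that a null pointer converts to the integer address 0 (true on mainstream platforms, but implementation-defined in C++). The pointer overloads below are hypothetical illustrations, not the code under review.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical pointer overloads for illustration only.
// The alignment is assumed to be a power of two.
inline bool can_align_up(void* ptr, uintptr_t alignment) {
  uintptr_t addr = reinterpret_cast<uintptr_t>(ptr);
  return addr <= std::numeric_limits<uintptr_t>::max() - (alignment - 1);
}

inline void* align_up(void* ptr, uintptr_t alignment) {
  assert(can_align_up(ptr, alignment) && "precondition");
  uintptr_t addr = reinterpret_cast<uintptr_t>(ptr);
  // Address 0 is already aligned to every power of two, so a null
  // pointer passes through unchanged without any special-casing.
  return reinterpret_cast<void*>((addr + (alignment - 1)) & ~(alignment - 1));
}
```

No null check appears in either function; a caller that wants one can add it before aligning.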
------------- PR Comment: https://git.openjdk.org/jdk/pull/23711#issuecomment-2714865755 From lmesnik at openjdk.org Tue Mar 11 15:58:59 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 11 Mar 2025 15:58:59 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 10:24:44 GMT, David Linus Briemann wrote: >> 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms > > David Linus Briemann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - remove CountBytecodesTest from tier1 > - Merge branch 'master' into dlb/bytecode_counter_overflow > - remove auto included header > - fix x86 asm > - address review comment, add back comma to copyright header > - formatting > - remove bad header > - add missing comma to copyright header > - speed up runtime by running less bytecodes, add explanation > - add copyright header and @bug number > - ... and 5 more: https://git.openjdk.org/jdk/compare/f1603a4b...31a52156 Test changes looks good for me. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23766#pullrequestreview-2675200343 From xpeng at openjdk.org Tue Mar 11 16:20:58 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 11 Mar 2025 16:20:58 GMT Subject: RFR: 8351656: Problemlist gc/TestAllocHumongousFragment#generational In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 13:22:27 GMT, Aleksey Shipilev wrote: > Causes noise in GHA testing, so we need to problemlist it. > > Additional testing: > - [x] Checked the test is skipped locally Marked as reviewed by xpeng (Author). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23982#pullrequestreview-2675284703 From mdoerr at openjdk.org Tue Mar 11 16:25:01 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 11 Mar 2025 16:25:01 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms [v2] In-Reply-To: <6MRVgthMpv2SPYeh7wdCwzoirw0mhOMHKfL25kZGG_w=.2754f881-ab03-4733-9299-ccfa5fa5f44f@github.com> References: <6MRVgthMpv2SPYeh7wdCwzoirw0mhOMHKfL25kZGG_w=.2754f881-ab03-4733-9299-ccfa5fa5f44f@github.com> Message-ID: On Tue, 11 Mar 2025 15:39:17 GMT, Aleksey Shipilev wrote: >> David Linus Briemann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: >> >> - remove CountBytecodesTest from tier1 >> - Merge branch 'master' into dlb/bytecode_counter_overflow >> - remove auto included header >> - fix x86 asm >> - address review comment, add back comma to copyright header >> - formatting >> - remove bad header >> - add missing comma to copyright header >> - speed up runtime by running less bytecodes, add explanation >> - add copyright header and @bug number >> - ... and 5 more: https://git.openjdk.org/jdk/compare/954660f8...31a52156 > > src/hotspot/share/interpreter/bytecodeTracer.cpp line 132: > >> 130: st->print("[%zu] ", Thread::current()->osthread()->thread_id_for_printing()); >> 131: if (Verbose) { >> 132: st->print("%8zu %4d " INTPTR_FORMAT " " INTPTR_FORMAT " %s", > > Sounds like there are more than 8 digits now? I thought about this, too, but I don't think it's a problem because the width is specified like this: "Minimum number of characters to be printed. If the value to be printed is shorter than this number, the result is padded with blank spaces. The value is not truncated even if the result is larger." [https://cplusplus.com/reference/cstdio/printf/]. 
Do we want a larger fixed number of digits? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23766#discussion_r1989680811 From shade at openjdk.org Tue Mar 11 16:34:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 11 Mar 2025 16:34:01 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms [v2] In-Reply-To: References: <6MRVgthMpv2SPYeh7wdCwzoirw0mhOMHKfL25kZGG_w=.2754f881-ab03-4733-9299-ccfa5fa5f44f@github.com> Message-ID: On Tue, 11 Mar 2025 16:22:19 GMT, Martin Doerr wrote: >> src/hotspot/share/interpreter/bytecodeTracer.cpp line 132: >> >>> 130: st->print("[%zu] ", Thread::current()->osthread()->thread_id_for_printing()); >>> 131: if (Verbose) { >>> 132: st->print("%8zu %4d " INTPTR_FORMAT " " INTPTR_FORMAT " %s", >> >> Sounds like there are more than 8 digits now? > > I thought about this, too, but I don't think it's a problem because the width is specified like this: "Minimum number of characters to be printed. If the value to be printed is shorter than this number, the result is padded with blank spaces. The value is not truncated even if the result is larger." [https://cplusplus.com/reference/cstdio/printf/]. > Do we want a larger fixed number of digits? Yeah, it is not about the correctness. It is more about readability: if we expect more than 8 digits, then the "table" we are printing here would be a bit ragged. UINT64_MAX is about 20 digits. In practice we would probably never do this for longer than 1 hour, and with (ballparking) 100M/sec bytecodes, this gives us a practical upper limit of 12 digits or so? My math might be off a digit or two. 
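The ballpark above can be checked with a short sketch. It takes the figures from the message as assumptions (one hour of tracing at roughly 100M bytecodes per second) and uses two hypothetical helpers, `decimal_digits` and `format_width8`, that are not part of HotSpot. It also shows the minimum-width behaviour Martin quoted: a field width of 8 pads short values and never truncates long ones.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Number of decimal digits needed to print v.
inline int decimal_digits(uint64_t v) {
  int digits = 1;
  while (v >= 10) {
    v /= 10;
    ++digits;
  }
  return digits;
}

// Formats v with a minimum field width of 8, analogous to "%8zu" for a
// size_t counter. Returns the number of characters written.
inline int format_width8(char* buf, size_t len, uint64_t v) {
  return std::snprintf(buf, len, "%8" PRIu64, v);
}

// Assumed workload from the thread: 100M bytecodes/sec for one hour.
constexpr uint64_t kBytecodesPerHour = 100000000ULL * 3600ULL;  // 3.6e11
```

Here `decimal_digits(kBytecodesPerHour)` is 12 and `decimal_digits(UINT64_MAX)` is 20, which matches the estimates in the message; values wider than 8 digits are printed in full and simply push the following columns to the right.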
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23766#discussion_r1989697851 From shade at openjdk.org Tue Mar 11 16:37:58 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 11 Mar 2025 16:37:58 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms [v2] In-Reply-To: References: <6MRVgthMpv2SPYeh7wdCwzoirw0mhOMHKfL25kZGG_w=.2754f881-ab03-4733-9299-ccfa5fa5f44f@github.com> Message-ID: On Tue, 11 Mar 2025 16:31:20 GMT, Aleksey Shipilev wrote: >> I thought about this, too, but I don't think it's a problem because the width is specified like this: "Minimum number of characters to be printed. If the value to be printed is shorter than this number, the result is padded with blank spaces. The value is not truncated even if the result is larger." [https://cplusplus.com/reference/cstdio/printf/]. >> Do we want a larger fixed number of digits? > > Yeah, it is not about the correctness. It is more about readability: if we expect more than 8 digits, then the "table" we are printing here would be a bit ragged. UINT64_MAX is about 20 digits. In practice we would probably never do this for longer than 1 hour, and with (ballparking) 100M/sec bytecodes, this gives us a practical upper limit of 12 digits or so? My math might be off a digit or two. Actually, nevermind. I don't think this is useful to adjust. The bytecode counter is global, so it is not a per-bytecode printout like I initially thought. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23766#discussion_r1989704704 From kbarrett at openjdk.org Tue Mar 11 16:47:16 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 11 Mar 2025 16:47:16 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v10] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 15:55:00 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. 
Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > changed assert in align_up Marked as reviewed by kbarrett (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23711#pullrequestreview-2675368640 From kvn at openjdk.org Tue Mar 11 17:58:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 11 Mar 2025 17:58:54 GMT Subject: RFR: 8351640: Print reason for making method not entrant In-Reply-To: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com> References: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com> Message-ID: On Tue, 11 Mar 2025 11:36:59 GMT, Aleksey Shipilev wrote: > A simple quality of life improvement. We are studying compiler dynamics in Leyden, and it would be convenient to know why the particular methods are marked as not entrant. We just need to pass the extra string argument to `nmethod::make_not_entrant` and print it out. 
> > Sample log excerpt for mainline: > > > $ grep com.sun.tools.javac.util.IntHashTable::lookup print-compilation.log > 987 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 1019 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 1024 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used > 4995 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: uncommon trap > 5287 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 6615 5472 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 6626 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used > > > You can now clearly see the method lifecycle. 1 second in app lifetime, the method was initially compiled at level 3. Shortly after, it got compiled at level 4, turning level 3 method unused. 4 seconds later, level 4 method encountered uncommon trap, so we are back to level 3. After 1.3 seconds more, the final compilation at level 4 completed, and second level 3 compilation was removed as unused. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot:tier1` > - [x] Linux x86_64 server fastdebug, `all` Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23980#pullrequestreview-2675594015 From vlivanov at openjdk.org Tue Mar 11 18:52:56 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 11 Mar 2025 18:52:56 GMT Subject: RFR: 8351640: Print reason for making method not entrant In-Reply-To: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com> References: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com> Message-ID: On Tue, 11 Mar 2025 11:36:59 GMT, Aleksey Shipilev wrote: > A simple quality of life improvement. 
We are studying compiler dynamics in Leyden, and it would be convenient to know why the particular methods are marked as not entrant. We just need to pass the extra string argument to `nmethod::make_not_entrant` and print it out. > > Sample log excerpt for mainline: > > > $ grep com.sun.tools.javac.util.IntHashTable::lookup print-compilation.log > 987 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 1019 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 1024 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used > 4995 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: uncommon trap > 5287 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 6615 5472 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 6626 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used > > > You can now clearly see the method lifecycle. 1 second in app lifetime, the method was initially compiled at level 3. Shortly after, it got compiled at level 4, turning level 3 method unused. 4 seconds later, level 4 method encountered uncommon trap, so we are back to level 3. After 1.3 seconds more, the final compilation at level 4 completed, and second level 3 compilation was removed as unused. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot:tier1` > - [x] Linux x86_64 server fastdebug, `all` src/hotspot/share/code/nmethod.cpp line 1965: > 1963: if (LogCompilation) { > 1964: if (xtty != nullptr) { > 1965: ttyLocker ttyl; // keep the following output all in one block Please, include same info in `LogCompilation` log. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23980#discussion_r1989937760 From vladimir.kozlov at oracle.com Tue Mar 11 19:23:14 2025 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Mar 2025 12:23:14 -0700 Subject: [External] : RFD: Grouping hot code in CodeCache In-Reply-To: <921F549A-921F-4F3D-A481-43D0F2F25183@amazon.co.uk> References: <1B0C3138-761B-4DB0-8A98-977C6FC40178@amazon.co.uk> <2623f909-bb91-4450-bc05-d9181ba3abcb@oracle.com> <681195aa-50a9-45ad-abe5-6e6e2d164b01@oracle.com> <921F549A-921F-4F3D-A481-43D0F2F25183@amazon.co.uk> Message-ID: <894e661e-6532-463f-b96c-3f17c853bb89@oracle.com> On 3/10/25 3:55 PM, Astigeevich, Evgeny wrote: > Hi Vladimir, > >> I don't like manual part of this - providing list of hot methods which >> should be collocated. > > It looks like I was not clear in my first email and miscommunication happened. > I am sorry. I provided it to share what we tried and what lessons we learned, especially how it is complicated. > We have no intent to upstream list-based solutions. Okay, NP. But do you still want to continue work on next RFE and linked sub-RFEs?: https://bugs.openjdk.org/browse/JDK-8326205 Please, clarify which ones you want to upstream? > >> Sometime ago we had concept of Code > > Thanks for sharing. If I remember correctly it uses deoptimization to remove aging code which means recompilation. > > BTW, I found https://openjdk.org/jeps/8350338 "Cooperative JFR Sampling". > I see it has things we want in our implementation. Yes, it started moving. Thanks, Vladimir K > > Thanks, > Evgeny > > On 08/03/2025, 02:03, "Vladimir Kozlov" wrote: > > > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. > > > > > > > On 3/7/25 3:47 PM, Astigeevich, Evgeny wrote: >> Hi Vladimir, >> >> Thank you for the feedback.
>> >>> My concern is that it will complicate VM existing code for not >>> significant benefits in real production environment. > > > To clarify. > > > I don't like manual part of this - providing list of hot methods which > should be collocated. > > > I am fine to have special segment for special C2 compiled code. We will > have one for some AOT code in Leyden. > > > Move code in CodeCache to make it more dense is also fine. > > >> >> I think it won't complicate the existing code: >> - Adding a code heap is ~50 lines of code, mostly in CodeCache::initialize_heaps. >> - Relocating nmethods, according to PR[1], is ~300 lines of code. >> - A grouping thread is simple and isolated. It will go through Java threads checking their last frame(s) and recording seen nmethods. It should have less code than the sweeper which was ~700 lines. >> >> I think it's better to wait for PoC to see the complexity. >> >>> What improvements your experiments in real production runs shows? And >>> which JDK version you used for that? >> >> In production we are using internally 17 (static lists of methods) and 21 (dynamic lists of methods). >> Improvements are in a range of 5% - 15%. They depend on how big CPU load is: the more CPU load the bigger improvement. > > > Good. > > >> >>> As you know most of nmethod's metadata is moved from CodeCache. >>> ... >>> After that the code will be a lot more compact in CodeCache. Code sparsity >>> should be less issue then. >> >> Yes, removing non-code from nmethod will improve code density. This means in a code region we will have more code vs non-code. >> CPU instruction caches will like this. >> >> As I wrote in a comment to benchmark PR [2], Neoverse operates in code regions. For Neoverse it's more important to have as less code regions with active nmethods as possible. >> >> We are aware of cases when CodeCache usage is between 512M - 1G. The mentioned changes won't help in those cases. 
>> If I remember, no public benchmarks have demonstrated improvements from non-code moved away from nmethod. >> >> Since the removal of the Sweeper, GC is in charge of cleaning CodeCache. We've seen cases when GC was triggered often because of allocation pressure on CodeCache. >> For such cases, a recommended workaround is to increase the size of CodeCache from default 240M up to 512M. In such cases actively used nmethods will more likely be sparse. > > > Hmm, may be we should restore counters decay for this case to prevent > warm methods from compiling and polluting CodeCache and keep it small. > > >> >>> It would be nice if you redo your production experiments after that. >> >> Due to the complexity of customer's application we cannot run it on OpenJDKTip. It has thousand dependencies. We will need to move them on OpenJDKTip. >> I think it would be difficult to backport the mentioned changes to 21 > > > Understood. > > >> >>> I understand that we can still have sparsity due to "warm" nmethods and >>> C1 compiled code mixed with "hot" C2 nmethods. >> >> Customers having issues with big CodeCache on Graviton usually turn off tiered compilation to reduce far jumps/calls. BTW, this is another argument for identifying active nmethods and grouping them together: it should reduce/eliminate far jumps/calls. >> With small CodeCache mix of C1 and C2 nmethods is not an issue. >> >>> Can we simply use a separate CodeCache's segment for all >>> C2 "hot" (we can specify frequency flag to determine what "hot" means) >>> methods regardless when they are compiled. >> >> I did not get the idea. We already have the non-profiled segment where C2 code is put. Do you mean that at the compilation time some methods are put in the regular non-profile segment and some in the specific non-profile segment? > > > Yes, I meant separate segments for hot and warm methods, both are c2 > compiled code. 
> > > It would still mix all 3 cases you pointed because compilation policy > based mostly on what happened during startup. So it may be not good idea. > > >> What we've seen that methods profiles keep changing. >> There are the following cases: >> 1. C2 methods used most of the time: their profile can stay the same or can get hotter. >> 2. C2 methods used periodically: actively used, not used, actively used and so on >> 3. C2 methods used actively during some time and never used after >> >> Currently GC identifies cases #3 and some cases #2, aka cold code. The percentage of methods case #1 is ~10% - 20%. >> If we have 100M of C2 code, only 10M - 20M will be actively used. If we get unlucky, those 10M-20M could be spread across CodeCache and cause CPU stalls. >> How can we identify those 10%-20% of methods at compilation time? > > > I agree that it will be hard to determine that during compilation. > We need some statistic after we compiled to find such methods. > > > Sometime ago we had concept of Code Aging (removed after Sweeper was > removed): > https://urldefense.com/v3/__https://github.com/vnkozlov/jdk17u-dev/commit/54db2c2d612c573f91f69b7b387b43a8e1c9d563__;!!ACWV5N9M2RV99hQ!O67hmASdRidjl2V1_KDN8iqwvBiKycfefSp1XhUOPa_AWGAFwGDX_ojltPiZzV392Cn8T0t-le93_YbXQ4nkfN8$ > > > It added counter on nmethod entry to keep track if it is alive. We can > use something similar to track how frequently nmethod is used. > > > Erik Osterlund also had prototype in Leyden for call stack profiling by > VM itself to find most used hot methods during training run. > > > Thanks, > Vladimir. > > >> >> BTW, I think the separate hot code heap might simplify flushing cold code. Everything not in the hot code heap can automatically assumed cold. 
>> >> Thanks, >> Evgeny >> >> [1]: https://github.com/openjdk/jdk/pull/23573 >> [2]: https://github.com/openjdk/jdk/pull/23831#issuecomment-2705085399 >> >> On 06/03/2025, 22:41, "Vladimir Kozlov" wrote: >> >> >> Hi Evgeny, >> >> >> My concern is that it will complicate VM existing code for not >> significant benefits in real production environment. >> >> >> What improvements your experiments in real production runs shows? And >> which JDK version you used for that? >> >> >> As you know most of nmethod's metadata is moved from CodeCache. And >> Boris Ulasevich will move the final part (relocation info) soon. After >> that the code will be a lot more compact in CodeCache. Code sparsity >> should be less issue then. >> >> >> It would be nice if you redo your production experiments after that. >> >> >> I understand that we can still have sparsity due to "warm" nmethods and >> C1 compiled code mixed with "hot" C2 nmethods. I think compilation >> policy has heuristic to detect "warm" method (time intervals between >> invocations). Can we simply use a separate CodeCache's segment for all >> C2 "hot" (we can specify frequency flag to determine what "hot" means) >> methods regardless when they are compiled. Then you don't need to create >> list or do anything special for them. Most likely we will waste more >> space in CodeCache but it could be conditional under flag which you >> already proposed in separate segment RFE.
>> >> >> Thanks, >> Vladimir K >> >> >> On 3/5/25 10:41 AM, Astigeevich, Evgeny wrote: >>> Hi Vladimir, >>> >>> This is JDK-8326205: Implement grouping hot nmethods in CodeCache. >>> As I managed to synthesize a benchmark (https://github.com/openjdk/jdk/pull/23831) to demonstrate performance impact of sparse code, I'd like >>> to discuss a possible solution of the sparse code. >>> >>> High level, a solution is: >>> >>> * Detect hot code. >>> * Group hot code. >>> * Maintain grouped code. >>> >>> Downstream we tried two approaches: >>> >>> * *Static lists of methods (compile command):* Identify frequently >>> used (hot) methods using test runs and provide static method lists >>> to JVM in production. When JVM compiles a Java method and the method >>> is on the list, JVM puts the code into a designated code heap >>> (HotCodeHeap). >>> * *Dynamic lists of methods (compiler directives):* Profile an >>> application in production and dynamically relocate identified hot >>> methods to HotCodeHeap. Relocation was implemented with recompilation. >>> >>> The main advantage of static lists is zero profiling overhead in >>> production. We do all profiling and analysis in test runs. Its problems are: >>> >>> * *Training Run Accuracy*: We need training runs to have execution >>> paths closely mimicking production environments. Otherwise we put >>> wrong methods into HotCodeHeap. >>> * *Method List Maintenance:* We need to rerun training to regenerate >>> lists when application code changes. Training runs are expensive and >>> time-consuming. They require long runs to guarantee we see all major >>> execution paths.
Updating lists in production can be as complex as >>> application deployment >>> * *Method Placement Limitations:* Methods marked for HotCodeHeap are >>> permanently placed into HotCodeHeap. No mechanism to remove methods >>> that become less frequently used. >>> >>> We addressed these problems with dynamic lists of methods. We >>> implemented a Java agent that runs within the same JVM to dynamically >>> detect and manage hot Java methods without prior method identification. >>> The agent detects hot methods using JFR. The agent manages hot Java >>> methods in HotCodeHeap with compiler directives. A new compiler >>> directive marks methods with dynamic states ("hot" or "cold"). Methods >>> marked by the "hot" state are recompiled and placed in HotCodeHeap. >>> Methods marked by the "cold" state are eventually removed from HotCodeHeap. >>> >>> Problems of this approach are: >>> >>> * It requires specific, complex modifications to compiler directive >>> support: recompilation of Java methods affected by compiler >>> directives changes. This functionality is unique to Java agent >>> implementation and has limited potential for broader use. >>> * The agent cannot guarantee Java methods are moved to/removed from >>> the HotCodeHeap because updates of compiler directives can fail. >>> * The agent knows nothing about compiled code, e.g. whether it's C1 or >>> C2 compiled, code size, profile. This data can be useful for deciding >>> to move or not to move to HotCodeHeap. >>> * Recompilations, especially C2, are expensive. Having many of them >>> can cause performance issues. Also recompiled code might differ from >>> the code we have detected as "hot". >>> >>> Running these two approaches in production we learned: >>> >>> * We detect 95% of actively used methods within the first 30 minutes >>> of an application run. This is with JFR profiling configured: 90 >>> seconds session duration, sampling each 11 ms, 8 minutes between >>> profiling sessions.
We can find actively used methods faster if we >>> reduce a pause between profiling sessions and sampling period. >>> However it will increase the profiling overhead and affect >>> application performance. With the current configuration, the >>> profiling overhead is between 1% - 2%. >>> * A set of actively used methods gets into the steady state (no new >>> methods added to, no methods removed from) within the first 60 minutes. >>> * Static lists, when created from runs close to production, have 80% - >>> 90% methods always in use. This does not change over time. >>> * Predicting the size of HotCodeHeap is difficult, especially with >>> dynamic lists. >>> >>> We want to have grouping of hot method functionality as a part of Hotspot >>> JVM. We will group only C2 compiled methods. We can group JVMCI compiled >>> methods, e.g. Graal, if needed. We need profiling precise enough to >>> detect major Java methods. Low overhead is more important than precision. >>> >>> We think we can have a solution which does not require a lot of code: >>> >>> * Detect hot code: we can have an implementation based on the Sweeper: >>> https://github.com/openjdk/jdk17u/blob/master/src/hotspot/share/runtime/sweeper.hpp. We will use the handshakes mechanism, what the Sweeper >>> used, to detect nmethods on the top of thread stacks.
>>> * Group hot code: we have a draft PR https://github.com/openjdk/jdk/pull/23573. It implements relocation of nmethods within CodeCache. >>> * Maintain grouped code: we will add an additional code heap where hot >>> nmethods will be relocated to. >>> >>> What do you think about this approach? Are there other possible solutions? >>> >>> Thanks, >>> >>> Evgeny A. >>> >>> >>> Amazon Development Centre (London) Ltd. Registered in England and Wales >>> with registration number 04543232 with its registered office at 1 >>> Principal Place, Worship Street, London EC2A 2FA, United Kingdom. >>> >>> From wkemper at openjdk.org Tue Mar 11 19:41:04 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 19:41:04 GMT Subject: RFR: 8350905: Releasing a WeakHandle's referent may extend its lifetime In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 18:57:18 GMT, William Kemper wrote: > When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier, this may cause the referent to be enqueued and marked when it would be otherwise unreachable.
The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. We are proposing that native weak references are cleared with an additional `AS_NO_KEEPALIVE` decorator. This is similar to what was done for j.l.r.WeakReference in [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696). > > # Testing > > GHA, `hotspot_gc_shenandoah`. Additionally, for G1, ZGC, and Shenandoah we've run Extremem, Dacapo, SpecJVM2008, SpecJBB2015, Heapothesys and Diluvian. All executions completed without errors. Withdrawing this PR. We'll do this in the Shenandoah barrier. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23935#issuecomment-2715505017 From wkemper at openjdk.org Tue Mar 11 19:41:04 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 19:41:04 GMT Subject: Withdrawn: 8350905: Releasing a WeakHandle's referent may extend its lifetime In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 18:57:18 GMT, William Kemper wrote: > When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. We are proposing that native weak references are cleared with an additional `AS_NO_KEEPALIVE` decorator. This is similar to what was done for j.l.r.WeakReference in [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696). > > # Testing > > GHA, `hotspot_gc_shenandoah`. Additionally, for G1, ZGC, and Shenandoah we've run Extremem, Dacapo, SpecJVM2008, SpecJBB2015, Heapothesys and Diluvian. All executions completed without errors. 
This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23935 From dnsimon at openjdk.org Tue Mar 11 19:41:18 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 11 Mar 2025 19:41:18 GMT Subject: RFR: 8351700: Remove code conditional on BarrierSetNMethod being null Message-ID: All GCs started needing nmethod entry barriers as of loom so there's no longer any need to test for null nmethod entry barriers. ------------- Commit messages: - nmethod entry barriers are no longer optional Changes: https://git.openjdk.org/jdk/pull/23996/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23996&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351700 Stats: 171 lines in 27 files changed: 5 ins; 103 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/23996.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23996/head:pull/23996 PR: https://git.openjdk.org/jdk/pull/23996 From eosterlund at openjdk.org Tue Mar 11 19:41:18 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 11 Mar 2025 19:41:18 GMT Subject: RFR: 8351700: Remove code conditional on BarrierSetNMethod being null In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 19:29:05 GMT, Doug Simon wrote: > All GCs started needing nmethod entry barriers as of loom so there's no longer any need to test for null nmethod entry barriers. Nice! Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23996#pullrequestreview-2675894137 From wkemper at openjdk.org Tue Mar 11 19:42:02 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 19:42:02 GMT Subject: RFR: 8351656: Problemlist gc/TestAllocHumongousFragment#generational In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 13:22:27 GMT, Aleksey Shipilev wrote: > Causes noise in GHA testing, so we need to problemlist it.
> > Additional testing: > - [x] Checked the test is skipped locally Thank you. Expect to un-problem list once https://github.com/openjdk/jdk/pull/23997 has been vetted. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23982#pullrequestreview-2675896072 From ysr at openjdk.org Tue Mar 11 19:42:03 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 11 Mar 2025 19:42:03 GMT Subject: RFR: 8351656: Problemlist gc/TestAllocHumongousFragment#generational In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 13:22:27 GMT, Aleksey Shipilev wrote: > Causes noise in GHA testing, so we need to problemlist it. > > Additional testing: > - [x] Checked the test is skipped locally Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23982#pullrequestreview-2675901068 From shade at openjdk.org Tue Mar 11 19:42:03 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 11 Mar 2025 19:42:03 GMT Subject: Integrated: 8351656: Problemlist gc/TestAllocHumongousFragment#generational In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 13:22:27 GMT, Aleksey Shipilev wrote: > Causes noise in GHA testing, so we need to problemlist it. > > Additional testing: > - [x] Checked the test is skipped locally This pull request has now been integrated. 
Changeset: cef36931 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/cef369317570f95ac70aac6ceea88a0042ca2b45 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8351656: Problemlist gc/TestAllocHumongousFragment#generational Reviewed-by: xpeng, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/23982 From never at openjdk.org Tue Mar 11 19:53:00 2025 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 11 Mar 2025 19:53:00 GMT Subject: RFR: 8351700: Remove code conditional on BarrierSetNMethod being null In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 19:29:05 GMT, Doug Simon wrote: > All GCs started needing nmethod entry barriers as of loom so there's no longer any need to test for null nmethod entry barriers. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6549: > 6547: BarrierSetNMethod* bs_nm = BarrierSet::barrier_set()->barrier_set_nmethod(); > 6548: if (bs_nm != nullptr) { > 6549: StubRoutines::_method_entry_barrier = generate_method_entry_barrier(); Shouldn't you have kept this line? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23996#discussion_r1990025685 From dnsimon at openjdk.org Tue Mar 11 20:00:59 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 11 Mar 2025 20:00:59 GMT Subject: RFR: 8351700: Remove code conditional on BarrierSetNMethod being null [v2] In-Reply-To: References: Message-ID: > All GCs started needing nmethod entry barriers as of loom so there's no longer any need to test for null nmethod entry barriers. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: revived accidentally deleted code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23996/files - new: https://git.openjdk.org/jdk/pull/23996/files/b958ee43..b3d4721d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23996&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23996&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23996.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23996/head:pull/23996 PR: https://git.openjdk.org/jdk/pull/23996 From dnsimon at openjdk.org Tue Mar 11 20:01:00 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 11 Mar 2025 20:01:00 GMT Subject: RFR: 8351700: Remove code conditional on BarrierSetNMethod being null [v2] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 19:50:18 GMT, Tom Rodriguez wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> revived accidentally deleted code > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6549: > >> 6547: BarrierSetNMethod* bs_nm = BarrierSet::barrier_set()->barrier_set_nmethod(); >> 6548: if (bs_nm != nullptr) { >> 6549: StubRoutines::_method_entry_barrier = generate_method_entry_barrier(); > > Shouldn't you have kept this line? Absolutely! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23996#discussion_r1990039724 From never at openjdk.org Tue Mar 11 21:53:55 2025 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 11 Mar 2025 21:53:55 GMT Subject: RFR: 8351700: Remove code conditional on BarrierSetNMethod being null [v2] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 20:00:59 GMT, Doug Simon wrote: >> All GCs started needing nmethod entry barriers as of loom so there's no longer any need to test for null nmethod entry barriers. 
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > revived accidentally deleted code Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23996#pullrequestreview-2676195527 From duke at openjdk.org Tue Mar 11 23:30:27 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 11 Mar 2025 23:30:27 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache Message-ID: This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. ------------- Commit messages: - Exclude OSR - Only run RelocateNMethodMultiplePaths on debug builds - Updates after JDK-8343789 - Merge branch 'master' into JDK-8316694-Final - Run RelocateNMethod.java with all GCs - Run tests with all GCs - Add check for already in correct heap - Add nmethod::is_relocatable() - Clean up - Rename replaceNMethod to relocateNMethod - ... 
and 4 more: https://git.openjdk.org/jdk/compare/da2b4f07...7137a07f Changes: https://git.openjdk.org/jdk/pull/23573/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316694 Stats: 1033 lines in 23 files changed: 1010 ins; 2 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From eastigeevich at openjdk.org Tue Mar 11 23:30:28 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 11 Mar 2025 23:30:28 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 22:05:13 GMT, Chad Rakoczy wrote: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 93: > 91: void trampoline_stub_Relocation::pd_fix_owner_after_move() { > 92: NativeCall* call = nativeCall_at(owner()); > 93: // assert(call->raw_destination() == owner(), "destination should be empty"); We need to move this assert to `trampoline_stub_Relocation::fix_relocation_after_move`. `CodeBuffer::blob()` returns `nullptr` if it wraps `nmethod`. 
The modified assert will be: void trampoline_stub_Relocation::fix_relocation_after_move(const CodeBuffer* src, CodeBuffer* dest) { // Finalize owner destination only for nmethods if (dest->blob() != nullptr) return; // We either relocate a nmethod residing in CodeCache or just generated code from CodeBuffer assert(src->blob() != nullptr || nativeCall_at(owner())->raw_destination() == owner(), "destination should be empty"); pd_fix_owner_after_move(); } src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1382: > 1380: > 1381: // First instruction must be a nop as it may need to be patched after relocation > 1382: __ nop(); Could you please explain what problem it fixes? src/hotspot/share/code/nmethod.cpp line 1517: > 1515: _method->clear_entry_points(); > 1516: _method->set_code(mh, this); > 1517: } Why do we need this code? Why not do the same when we replace C1 nmethod with C2 nmethod? src/hotspot/share/code/nmethod.cpp line 1566: > 1564: nm->make_deoptimized(); > 1565: nm->flush_dependencies(); > 1566: nm->set_is_unlinked(); Why are you explicitly making these calls? src/hotspot/share/code/nmethod.hpp line 496: > 494: ); > 495: > 496: static nmethod* replace_nmethod(nmethod* nm, int comp_level_override=-1); I think we need a function with the name reflecting our purpose: // Relocate the nmethod to the code heap identified by code_blob_type. // Returns nullptr if the code heap does not have enough space, otherwise // the relocated nmethod. The original nmethod will be invalidated. // If nm is already in the needed code heap, it is not relocated and the function returns it.
static nmethod* relocate_to(nmethod* nm, CodeBlobType code_blob_type); test/hotspot/jtreg/compiler/whitebox/ReplaceNMethod.java line 68: > 66: NMethod origNmethod = NMethod.get(method, false); > 67: > 68: WHITE_BOX.replaceNMethod(method, false); I suggest introducing `relocateNMethodTo`: WHITE_BOX.relocateNMethodTo(method, BlobType.MethodNonProfiled); test/lib/jdk/test/whitebox/WhiteBox.java line 497: > 495: Objects.requireNonNull(method); > 496: replaceNMethod0(method, isOsr, -1); > 497: } We don't support relocation of OSR nmethods. test/lib/jdk/test/whitebox/WhiteBox.java line 499: > 497: } > 498: public native void replaceAllNMethods(); > 499: public native long getNumNMethods(); We don't need these methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1964403179 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1981967623 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1982131418 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1974379846 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1960711698 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1982262476 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1982175396 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1982174461 From duke at openjdk.org Tue Mar 11 23:30:28 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 11 Mar 2025 23:30:28 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 22:05:34 GMT, Evgeny Astigeevich wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186).
>> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 93: > >> 91: void trampoline_stub_Relocation::pd_fix_owner_after_move() { >> 92: NativeCall* call = nativeCall_at(owner()); >> 93: // assert(call->raw_destination() == owner(), "destination should be empty"); > > We need to move this assert to `trampoline_stub_Relocation::fix_relocation_after_move`. > `CodeBuffer::blob()` returns `nullptr` if it wraps `nmethod`. > The modified assert will be: > > void trampoline_stub_Relocation::fix_relocation_after_move(const CodeBuffer* src, CodeBuffer* dest) { > // Finalize owner destination only for nmethods > if (dest->blob() != nullptr) return; > // We either relocate a nmethod residing in CodeCache or just generated code from CodeBuffer > assert(src->blob() != nullptr || nativeCall_at(owner())->raw_destination() == owner(), "destination should be empty"); > pd_fix_owner_after_move(); > } Shouldn't the check be `src->blob() == nullptr` so the assert passes if relocating an `nmethod`? > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1382: > >> 1380: >> 1381: // First instruction must be a nop as it may need to be patched after relocation >> 1382: __ nop(); > > Could you please explain what problem it fixes? When an `nmethod` gets marked as not entrant the [first instruction is updated](https://github.com/openjdk/jdk/blob/11a37c829c12d064874416a7b242596cf23972e5/src/hotspot/cpu/aarch64/nativeInst_aarch64.cpp#L368) to verify that the code is no longer called.
The first instruction must be a jump or nop for this. > src/hotspot/share/code/nmethod.cpp line 1517: > >> 1515: _method->clear_entry_points(); >> 1516: _method->set_code(mh, this); >> 1517: } > > Why do we need this code? Why not do the same when we replace C1 nmethod with C2 nmethod? `set_code` has an [assert](https://github.com/openjdk/jdk/blob/11a37c829c12d064874416a7b242596cf23972e5/src/hotspot/share/oops/method.cpp#L1298) that the entry is not already set for continuation native intrinsics. It's normally not an issue because it's an intrinsic and never gets recompiled. It's probably better to remove or update that assert instead of calling this for every relocation. > src/hotspot/share/code/nmethod.cpp line 1566: > >> 1564: nm->make_deoptimized(); >> 1565: nm->flush_dependencies(); >> 1566: nm->set_is_unlinked(); > > Why are you explicitly making these calls? Those are actually not needed and will be removed in the next revision. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1970713869 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1982342344 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1982339643 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1974557724 From eastigeevich at openjdk.org Tue Mar 11 23:30:28 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 11 Mar 2025 23:30:28 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 00:12:49 GMT, Chad Rakoczy wrote: > Shouldn't the check be `src->blob() == nullptr` so the assert passes if relocating an `nmethod`? Yes, you are right. >> src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1382: >> >>> 1380: >>> 1381: // First instruction must be a nop as it may need to be patched after relocation >>> 1382: __ nop(); >> >> Could you please explain what problem it fixes?
> > When an `nmethod` gets marked as not entrant the [first instruction is updated](https://github.com/openjdk/jdk/blob/11a37c829c12d064874416a7b242596cf23972e5/src/hotspot/cpu/aarch64/nativeInst_aarch64.cpp#L368) to verify that the code is no longer called. The first instruction must be a jump or nop for this This means they are never made not entrant. I don't think we will relocate them. Let's have a function: `bool CodeCache::is_relocatable(nmethod*)`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1971314537 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1984158432 From cslucas at openjdk.org Tue Mar 11 23:30:29 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 11 Mar 2025 23:30:29 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 22:05:13 GMT, Chad Rakoczy wrote: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. test/hotspot/jtreg/compiler/whitebox/ReplaceNMethod.java line 65: > 63: compile(); > 64: > 65: checkCompiled(); Looks like the test will fail if the method is currently only queued for compilation, right? Being queued for compilation isn't an error in this situation AFAIU. 
test/hotspot/jtreg/compiler/whitebox/ReplaceNMethodVerifyNoRecomp.java line 123: > 121: > 122: // Get newly created nmethod > 123: NMethod origNmethod = NMethod.get(method, false); What do you think about adding a loop around the copy/replacement of the method? I think it would make the test more convincing and may not impact its execution time that much. test/hotspot/jtreg/compiler/whitebox/ReplaceNMethodVerifyNoRecomp.java line 147: > 145: // Call function multiple times to trigger compilation > 146: private static void callFunction() { > 147: for (int i = 0; i < CompilerWhiteBoxTest.THRESHOLD; i++) { NIT: I'd make the loop go up to `THRESHOLD+N` just to be cautious. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1955013437 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1955007955 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1955010779 From duke at openjdk.org Tue Mar 11 23:30:29 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 11 Mar 2025 23:30:29 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 18:18:24 GMT, Cesar Soares Lucas wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release.
> > test/hotspot/jtreg/compiler/whitebox/ReplaceNMethod.java line 65: > >> 63: compile(); >> 64: >> 65: checkCompiled(); > > Looks like the test will fail if the method is currently only queued for compilation, right? Being queued for compilation isn't an error in this situation AFAIU. The test is run with `-Xbatch` which disables background compilation so I think it should never get to the compilation check before the method is through the queue ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1955104064 From fyang at openjdk.org Wed Mar 12 00:32:53 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 12 Mar 2025 00:32:53 GMT Subject: RFR: 8351700: Remove code conditional on BarrierSetNMethod being null [v2] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 20:00:59 GMT, Doug Simon wrote: >> All GCs started needing nmethod entry barriers as of loom so there's no longer any need to test for null nmethod entry barriers. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > revived accidentally deleted code src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 9903: > 9901: generate_arraycopy_stubs(); > 9902: > 9903: BarrierSetNMethod* bs_nm = BarrierSet::barrier_set()->barrier_set_nmethod(); Drive-by comment: `bs_nm` seems not used any more. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23996#discussion_r1990347462 From fyang at openjdk.org Wed Mar 12 03:04:53 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 12 Mar 2025 03:04:53 GMT Subject: RFR: 8318220: RISC-V: C2 ReverseI In-Reply-To: References: Message-ID: <9uuSKNyy5qPoCOCTCipOlWpSyNIzJUeXUCyE_qa6o34=.14cd0166-7966-458c-a80e-9d0c058463dd@github.com> On Mon, 10 Mar 2025 14:26:33 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch to add ReverseI and ReverseIL intrinsic on riscv? > > Thanks! Thanks for finding this out. I was also trying to search this `brev8`. 
I think this will also make it possible to add related vector ones like "ReverseV" and "ReverseBytesV". src/hotspot/cpu/riscv/riscv_b.ad line 257: > 255: __ rev8($dst$$Register, $src$$Register); > 256: __ brev8($dst$$Register, $dst$$Register); > 257: __ srli($dst$$Register, $dst$$Register, 32); Shouldn't this be an arithmetic shift-right operation (`srai`)? Say, we should have a negative value after reversing value 1. Maybe the warmup iters for your newly-added test is too small to cover this? Test.java import java.lang.*; public class Test { public static void main(String[] args) { int a = 1; System.out.println("Number = " + a); // It returns the value obtained by reversing order of the bits System.out.println("By reversing we get = " + Integer.reverse(a)); } } $java Test Number = 1 By reversing we get = -2147483648 src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp line 192: > 190: if (is_set(RISCV_HWPROBE_KEY_IMA_EXT_0, RISCV_HWPROBE_EXT_ZBKB)) { > 191: VM_Version::ext_Zbkb.enable_feature(); > 192: } Are we auto-enabling an experimental feature? ------------- PR Review: https://git.openjdk.org/jdk/pull/23963#pullrequestreview-2676699842 PR Review Comment: https://git.openjdk.org/jdk/pull/23963#discussion_r1990447449 PR Review Comment: https://git.openjdk.org/jdk/pull/23963#discussion_r1990448266 From eirbjo at openjdk.org Wed Mar 12 05:51:58 2025 From: eirbjo at openjdk.org (Eirik Bjørsnøs) Date: Wed, 12 Mar 2025 05:51:58 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 09:49:44 GMT, Aleksey Shipilev wrote: >> This PR implements JEP 503: Remove the 32-bit x86 Port. >> >> The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. >> >> This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32.
There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. >> >> The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. >> >> Additional testing: >> - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) >> - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Drop commented out block from deprecations > - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone > - Generic 32-bit x86 configure error supercedes Windows 32-bit x86 > - 8345169: Implement JEP 503: Remove the 32-bit x86 Port > JEP 486 (Permanently Disable the Security Manager) updated the API and removed the ability to set a SecurityManager in a first big commit. [..] 
There were 150+ follow on issues Observation: These JEP 486 follow-on issues served as a nice way for non-experts to contribute with something useful and also to get acquainted with various parts of the OpenJDK code base. Most cleanups followed a predictable pattern, so the implementation work could be distributed also to people not intimately familiar with the particular area without too much risk. Not sure how well this conveys to JEP 503, but I imagine something similar should be possible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2716580939 From stuefe at openjdk.org Wed Mar 12 07:02:22 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 12 Mar 2025 07:02:22 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v2] In-Reply-To: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: > Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. > > JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). > Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. 
Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). > > The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . > > The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. > > Tests: > - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths > - SAP reports all tests green (they had reported errors with the previous version) > - Oracle Tests ongoing > - GHAs green Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: skip test if we have no COH archive ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23912/files - new: https://git.openjdk.org/jdk/pull/23912/files/69b2076e..78894849 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23912&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23912&range=00-01 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23912.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23912/head:pull/23912 PR: https://git.openjdk.org/jdk/pull/23912 From stuefe at openjdk.org Wed Mar 12 07:02:22 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 12 Mar 2025 07:02:22 GMT Subject: RFR: 8351040: [REDO] Protection zone for 
easier detection of accidental zero-nKlass use In-Reply-To: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: On Wed, 5 Mar 2025 06:34:14 GMT, Thomas Stuefe wrote: > Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. > > JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). > Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). > > The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . > > The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). 
Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint.
>
> Tests:
> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths
> - SAP reports all tests green (they had reported errors with the previous version)
> - Oracle Tests ongoing
> - GHAs green

Both SAP and Oracle report no errors. Oracle tests tripped over the missing COH archives in Oracle builds; I amended the test to skip in case COH archives are missing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23912#issuecomment-2716761388

From shade at openjdk.org Wed Mar 12 07:35:33 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 12 Mar 2025 07:35:33 GMT
Subject: RFR: 8351640: Print reason for making method not entrant [v2]
In-Reply-To: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com>
References: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com>
Message-ID: <370pnPWKXnqHXz9pVOoU9vFfqdH8zIIV2K7BpqWRcEI=.0c63f38f-84ab-49c3-a0da-1ad9f1b22fb1@github.com>

> A simple quality of life improvement. We are studying compiler dynamics in Leyden, and it would be convenient to know why the particular methods are marked as not entrant. We just need to pass the extra string argument to `nmethod::make_not_entrant` and print it out.
>
> Sample log excerpt for mainline:
>
> $ grep com.sun.tools.javac.util.IntHashTable::lookup print-compilation.log
>  987  780  3  com.sun.tools.javac.util.IntHashTable::lookup (100 bytes)
> 1019  877  4  com.sun.tools.javac.util.IntHashTable::lookup (100 bytes)
> 1024  780  3  com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used
> 4995  877  4  com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: uncommon trap
> 5287 3734  3  com.sun.tools.javac.util.IntHashTable::lookup (100 bytes)
> 6615 5472  4  com.sun.tools.javac.util.IntHashTable::lookup (100 bytes)
> 6626 3734  3  com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used
>
> You can now clearly see the method lifecycle. About 1 second into the app lifetime, the method was initially compiled at level 3. Shortly after, it got compiled at level 4, leaving the level 3 method unused. About 4 seconds later, the level 4 method encountered an uncommon trap, so we are back to level 3. After 1.3 seconds more, the final compilation at level 4 completed, and the second level 3 compilation was removed as unused.
>
> Additional testing:
> - [x] Linux x86_64 server fastdebug, `hotspot:tier1`
> - [x] Linux x86_64 server fastdebug, `all`

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - Add to LogCompilation as well - Merge branch 'master' into JDK-8351640-nmethod-not-entrant-reason - Use resource allocation for temp buffer - Base version ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23980/files - new: https://git.openjdk.org/jdk/pull/23980/files/b13a1080..38491fb2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23980&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23980&range=00-01 Stats: 38661 lines in 408 files changed: 18309 ins; 13442 del; 6910 mod Patch: https://git.openjdk.org/jdk/pull/23980.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23980/head:pull/23980 PR: https://git.openjdk.org/jdk/pull/23980 From shade at openjdk.org Wed Mar 12 07:35:33 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 07:35:33 GMT Subject: RFR: 8351640: Print reason for making method not entrant [v2] In-Reply-To: References: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com> Message-ID: On Tue, 11 Mar 2025 18:45:40 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Add to LogCompilation as well >> - Merge branch 'master' into JDK-8351640-nmethod-not-entrant-reason >> - Use resource allocation for temp buffer >> - Base version > > src/hotspot/share/code/nmethod.cpp line 1965: > >> 1963: if (LogCompilation) { >> 1964: if (xtty != nullptr) { >> 1965: ttyLocker ttyl; // keep the following output all in one block > > Please, include same info in `LogCompilation` log. Sure, added. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23980#discussion_r1990826189 From dnsimon at openjdk.org Wed Mar 12 09:16:44 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 12 Mar 2025 09:16:44 GMT Subject: RFR: 8351700: Remove code conditional on BarrierSetNMethod being null [v3] In-Reply-To: References: Message-ID: > All GCs started needing nmethod entry barriers as of loom so there's no longer any need to test for null nmethod entry barriers. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: removed unused code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23996/files - new: https://git.openjdk.org/jdk/pull/23996/files/b3d4721d..95da3c2f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23996&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23996&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23996.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23996/head:pull/23996 PR: https://git.openjdk.org/jdk/pull/23996 From ayang at openjdk.org Wed Mar 12 09:19:07 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 12 Mar 2025 09:19:07 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v10] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 15:55:00 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. 
Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > changed assert in align_up Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23711#pullrequestreview-2677663175 From shade at openjdk.org Wed Mar 12 09:46:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 09:46:01 GMT Subject: RFR: 8351700: Remove code conditional on BarrierSetNMethod being null [v3] In-Reply-To: References: Message-ID: <1stcVqx5LbF9cnNm4gb4YXqoHBbBBigH5fpYlBqRttI=.79261377-2b11-49eb-802d-b579fd23a9ff@github.com> On Wed, 12 Mar 2025 09:16:44 GMT, Doug Simon wrote: >> All GCs started needing nmethod entry barriers as of loom so there's no longer any need to test for null nmethod entry barriers. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > removed unused code Looks fine, thanks. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23996#pullrequestreview-2677747978 From duke at openjdk.org Wed Mar 12 10:15:55 2025 From: duke at openjdk.org (David Linus Briemann) Date: Wed, 12 Mar 2025 10:15:55 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 10:24:44 GMT, David Linus Briemann wrote: >> 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms > > David Linus Briemann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 15 additional commits since the last revision: > > - remove CountBytecodesTest from tier1 > - Merge branch 'master' into dlb/bytecode_counter_overflow > - remove auto included header > - fix x86 asm > - address review comment, add back comma to copyright header > - formatting > - remove bad header > - add missing comma to copyright header > - speed up runtime by running less bytecodes, add explanation > - add copyright header and @bug number > - ... and 5 more: https://git.openjdk.org/jdk/compare/f8da6256...31a52156 Thanks for the reviews! One test timed out ( [gc/TestAllocHumongousFragment#generational](https://github.com/dbriemann/jdk/actions/runs/13762261786/attempts/1#user-content-gc_testallochumongousfragment#generational) ) but it is unrelated to this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23766#issuecomment-2717349812 From duke at openjdk.org Wed Mar 12 10:15:56 2025 From: duke at openjdk.org (duke) Date: Wed, 12 Mar 2025 10:15:56 GMT Subject: RFR: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 10:24:44 GMT, David Linus Briemann wrote: >> 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms > > David Linus Briemann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - remove CountBytecodesTest from tier1 > - Merge branch 'master' into dlb/bytecode_counter_overflow > - remove auto included header > - fix x86 asm > - address review comment, add back comma to copyright header > - formatting > - remove bad header > - add missing comma to copyright header > - speed up runtime by running less bytecodes, add explanation > - add copyright header and @bug number > - ... 
and 5 more: https://git.openjdk.org/jdk/compare/f8da6256...31a52156 @dbriemann Your change (at version 31a52156cc085c2a7296acf516331781bc61cb5d) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23766#issuecomment-2717351794 From duke at openjdk.org Wed Mar 12 10:26:06 2025 From: duke at openjdk.org (David Linus Briemann) Date: Wed, 12 Mar 2025 10:26:06 GMT Subject: Integrated: 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 08:43:04 GMT, David Linus Briemann wrote: > 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms This pull request has now been integrated. Changeset: 4be502ea Author: David Linus Briemann Committer: Martin Doerr URL: https://git.openjdk.org/jdk/commit/4be502ea38b37d5fb532b64e5b82363805bfe657 Stats: 109 lines in 12 files changed: 87 ins; 0 del; 22 mod 8350642: Interpreter: Upgrade CountBytecodes to 64 bit on 64 bit platforms Reviewed-by: lmesnik, mdoerr, shade ------------- PR: https://git.openjdk.org/jdk/pull/23766 From mli at openjdk.org Wed Mar 12 10:29:58 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 12 Mar 2025 10:29:58 GMT Subject: RFR: 8318220: RISC-V: C2 ReverseI In-Reply-To: <9uuSKNyy5qPoCOCTCipOlWpSyNIzJUeXUCyE_qa6o34=.14cd0166-7966-458c-a80e-9d0c058463dd@github.com> References: <9uuSKNyy5qPoCOCTCipOlWpSyNIzJUeXUCyE_qa6o34=.14cd0166-7966-458c-a80e-9d0c058463dd@github.com> Message-ID: On Wed, 12 Mar 2025 03:02:04 GMT, Fei Yang wrote: > I was also trying to search this brev8. I think this will also make it possible to add related vector ones like "ReverseV" and "ReverseBytesV". Yes, a later PR will implement it. 

> src/hotspot/cpu/riscv/riscv_b.ad line 257:
>
>> 255: __ rev8($dst$$Register, $src$$Register);
>> 256: __ brev8($dst$$Register, $dst$$Register);
>> 257: __ srli($dst$$Register, $dst$$Register, 32);
>
> Shouldn't this be an arithmetic shift-right operation (`srai`)? For example, we should have a negative value after reversing int value 1. Maybe the warmup iterations for your newly-added test are too few to cover this?
>
> Test.java
> import java.lang.*;
>
> public class Test {
>
>     public static void main(String[] args) {
>         int a = 1;
>         System.out.println("Number = " + a);
>
>         // It returns the value obtained by reversing order of the bits
>         System.out.println("By reversing we get = " + Integer.reverse(a));
>     }
> }
>
> $java Test
> Number = 1
> By reversing we get = -2147483648

Interesting question. `srli` will also result in 0x80000000, which as an int is -2147483648. I think the question could be translated to something like, "do we need to sign-extend the result?" I think the answer should be YES, so you're right. I'm thinking about how to construct a test so that the current implementation fails. `long l = Integer.reverse(i)` could be a solution.

> src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp line 192:
>
>> 190: if (is_set(RISCV_HWPROBE_KEY_IMA_EXT_0, RISCV_HWPROBE_EXT_ZBKB)) {
>> 191:   VM_Version::ext_Zbkb.enable_feature();
>> 192: }
>
> Are we auto-enabling an experimental feature?

Thanks, I'll fix this.
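
To make the `srli` versus `srai` point concrete, here is a minimal Java sketch. It is only a model, not the actual C2 code path: it assumes that `rev8` followed by `brev8` amounts to a full 64-bit bit reversal of the sign-extended input, and stands in `Long.reverse` for that pair. The class and method names (`ReverseShiftDemo`, `withLogicalShift`, `withArithmeticShift`) are hypothetical and exist only for illustration.

```java
public class ReverseShiftDemo {
    // Assumption: models the 64-bit register after rev8 + brev8 as a full
    // 64-bit bit reversal of the sign-extended 32-bit input.
    static long reversedRegister(int value) {
        return Long.reverse((long) value);
    }

    // Logical right shift by 32 (what srli would do): upper 32 bits become zero.
    static long withLogicalShift(int value) {
        return reversedRegister(value) >>> 32;
    }

    // Arithmetic right shift by 32 (what srai would do): upper 32 bits copy the sign bit.
    static long withArithmeticShift(int value) {
        return reversedRegister(value) >> 32;
    }

    public static void main(String[] args) {
        int i = 1;
        // Widening the int result to long sign-extends it, as in `long l = Integer.reverse(i)`.
        long expected = Integer.reverse(i);
        System.out.println("expected   = " + expected);
        System.out.println("logical    = " + withLogicalShift(i));
        System.out.println("arithmetic = " + withArithmeticShift(i));
        // Truncating back to 32 bits hides the difference between the two shifts.
        System.out.println("as int     = " + (int) withLogicalShift(i));
    }
}
```

In this model the two shifts agree in the low 32 bits and differ only in the upper half of the 64-bit value, which is why a check that stores the result back into an `int` cannot tell them apart, while widening to `long` can.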

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23963#issuecomment-2717389625
PR Review Comment: https://git.openjdk.org/jdk/pull/23963#discussion_r1991156612
PR Review Comment: https://git.openjdk.org/jdk/pull/23963#discussion_r1991156510

From fyang at openjdk.org Wed Mar 12 10:43:04 2025
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 12 Mar 2025 10:43:04 GMT
Subject: RFR: 8318220: RISC-V: C2 ReverseI
In-Reply-To:
References: <9uuSKNyy5qPoCOCTCipOlWpSyNIzJUeXUCyE_qa6o34=.14cd0166-7966-458c-a80e-9d0c058463dd@github.com>
Message-ID: <-zqtxI6hMznbxUI1YkVCoa88fOy11zWecj5dcxU11Ds=.7efd9786-7f00-432b-b4e9-d6754ec98627@github.com>

On Wed, 12 Mar 2025 10:26:58 GMT, Hamlin Li wrote:

> Interesting question. `srli` will also result in 0x80000000, which as an int is -2147483648. I think the question could be translated to something like, "do we need to sign-extend the result?" I think the answer should be YES, so you're right.

Yes, that's exactly what I mean. We should have a 32-bit sign-extension here.

> I'm thinking about how to construct a test so that the current implementation fails. `long l = Integer.reverse(i)` could be a solution.

Yeah, maybe. It won't manifest if you simply store the 32-bit result of reversing back into the array as the current test does. The reloading of this signed 32-bit value would sign-extend, thus hiding the issue when checking the result.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23963#discussion_r1991177879

From tschatzl at openjdk.org Wed Mar 12 11:58:45 2025
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Wed, 12 Mar 2025 11:58:45 GMT
Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v17]
In-Reply-To:
References:
Message-ID:

> Hi all,
>
> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier.
>
> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight, but we would like to have this ready by JDK 25.
>
> ### Current situation
>
> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lags in throughput compared to Parallel/Serial GC due to its larger barrier.
>
> The main reason for the current barrier is how G1 implements concurrent refinement:
> * G1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
> * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads.
> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done;                 // null value check
> if (card(@x.a) == young_card) goto done;  // write to young gen check
> StoreLoad;                                // synchronize
> if (card(@x.a) == dirty_card) goto done;
>
> *card(@x.a) = dirty
>
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
>
> call runtime to move thread-local-dcq into dcqs
>
> done:
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
> > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * optimized RISCV gen_write_ref_array_post_barrier() implementation contributed by @RealFYang - * fix card table verification crashes: in the first refinement phase, when switching the global card tables, we need to re-check whether we are still in the same sweep epoch or not. It might have changed due to a GC interrupting acquiring the Heap_lock. Otherwise new threads will scribble on the refinement table. Cause are last-minute changes before making the PR ready to review. Testing: without the patch, occurs fairly frequently when continuously (1 in 20) starting refinement. Does not afterward. 
- * ayang review 3 * comments * minor refactorings - * iwalulya review * renaming * fix some includes, forward declaration - * fix whitespace * additional whitespace between log tags * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename - ayang review * renamings * refactorings - iwalulya review * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement * predicate for determining whether the refinement has been disabled * some other typos/comment improvements * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming - * ayang review - fix comment - * iwalulya review 2 * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState * some additional documentation - ... and 14 more: https://git.openjdk.org/jdk/compare/f77fa17b...aec95051 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/758fac01..aec95051 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=15-16 Stats: 78123 lines in 1539 files changed: 36243 ins; 29177 del; 12703 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From mli at openjdk.org Wed Mar 12 12:09:12 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 12 Mar 2025 12:09:12 GMT Subject: RFR: 8318220: RISC-V: C2 ReverseI [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch to add ReverseI and ReverseIL intrinsic on riscv? > > Thanks! 
Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: - refine tests - use srai instead of srli ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23963/files - new: https://git.openjdk.org/jdk/pull/23963/files/9bce9054..9b6aa066 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23963&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23963&range=00-01 Stats: 49 lines in 3 files changed: 35 ins; 1 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/23963.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23963/head:pull/23963 PR: https://git.openjdk.org/jdk/pull/23963 From shade at openjdk.org Wed Mar 12 12:21:38 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 12:21:38 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v4] In-Reply-To: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: > We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. > > This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). > > Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. 
We still only enable the statistics event by default to be on a safer side. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `jdk_jfr` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - peakCount it is - Merge branch 'master' into JDK-8351142-jfr-deflate-event - Touch up descriptions - Fix test in release builds - Merge branch 'master' into JDK-8351142-jfr-deflate-event - Merge branch 'master' into JDK-8351142-jfr-deflate-event - Test updates - Rework statistics event to be actually statistics - Filter JFR HiddenWait consistently - Event metadata touchups - ... and 2 more: https://git.openjdk.org/jdk/compare/1d147ccb...edd9beaf ------------- Changes: https://git.openjdk.org/jdk/pull/23900/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23900&range=03 Stats: 295 lines in 13 files changed: 284 ins; 6 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23900/head:pull/23900 PR: https://git.openjdk.org/jdk/pull/23900 From shade at openjdk.org Wed Mar 12 12:21:38 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 12:21:38 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v3] In-Reply-To: References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: <56elTKAL9cegOQoZ_DZTJT3obHmWrOZMJa8qm9Zflcg=.8272a3bf-69ac-44a7-b1f2-942dced10f21@github.com> On Mon, 10 Mar 2025 10:04:54 GMT, Aleksey Shipilev wrote: > Looking around other stats in the metadata.xml, maybe a better name for it is peakCount? Yeah, I think it should be `peakCount`. Changed in new version. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23900#issuecomment-2717696362 From dnsimon at openjdk.org Wed Mar 12 12:21:57 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 12 Mar 2025 12:21:57 GMT Subject: RFR: 8351700: Remove code conditional on BarrierSetNMethod being null [v3] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 09:16:44 GMT, Doug Simon wrote: >> All GCs started needing nmethod entry barriers as of loom so there's no longer any need to test for null nmethod entry barriers. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > removed unused code `gc/TestAllocHumongousFragment.java#generational` is failing on Windows: https://github.com/dougxc/jdk/actions/runs/13807682996/job/38625487569#step:9:630 I don't think it can be caused by this PR. Are you able to confirm that @shipilev ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23996#issuecomment-2717699848 From shade at openjdk.org Wed Mar 12 12:34:03 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 12:34:03 GMT Subject: RFR: 8351700: Remove code conditional on BarrierSetNMethod being null [v3] In-Reply-To: <1stcVqx5LbF9cnNm4gb4YXqoHBbBBigH5fpYlBqRttI=.79261377-2b11-49eb-802d-b579fd23a9ff@github.com> References: <1stcVqx5LbF9cnNm4gb4YXqoHBbBBigH5fpYlBqRttI=.79261377-2b11-49eb-802d-b579fd23a9ff@github.com> Message-ID: On Wed, 12 Mar 2025 09:43:21 GMT, Aleksey Shipilev wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> removed unused code > > Looks fine, thanks. > `gc/TestAllocHumongousFragment.java#generational` is failing on Windows: https://github.com/dougxc/jdk/actions/runs/13807682996/job/38625487569#step:9:630 I don't think it can be caused by this PR. Are you able to confirm that @shipilev ? It was problemlisted by #23982 yesterday. 
You can ignore it, or merge with recent master to get clean GHA runs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23996#issuecomment-2717727270 From dnsimon at openjdk.org Wed Mar 12 12:34:04 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 12 Mar 2025 12:34:04 GMT Subject: RFR: 8351700: Remove code conditional on BarrierSetNMethod being null [v3] In-Reply-To: References: Message-ID: <-urz_l6_Sa21e9SspzfanN4VGdOFZJxOv6E79Npfv5A=.baeb6814-351b-4711-b7fe-4d87e0700532@github.com> On Wed, 12 Mar 2025 09:16:44 GMT, Doug Simon wrote: >> All GCs started needing nmethod entry barriers as of loom so there's no longer any need to test for null nmethod entry barriers. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > removed unused code I'll ignore it. Thanks for pointing out the problem listing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23996#issuecomment-2717730379 From dnsimon at openjdk.org Wed Mar 12 12:34:05 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 12 Mar 2025 12:34:05 GMT Subject: Integrated: 8351700: Remove code conditional on BarrierSetNMethod being null In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 19:29:05 GMT, Doug Simon wrote: > All GCs started needing nmethod entry barriers as of loom so there's no longer any need to test for null nmethod entry barriers. This pull request has now been integrated. 
Changeset: 95b66d5a Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/95b66d5a43a77b257a097afe5df369f92769abd2 Stats: 171 lines in 27 files changed: 5 ins; 102 del; 64 mod 8351700: Remove code conditional on BarrierSetNMethod being null Reviewed-by: shade, eosterlund, never ------------- PR: https://git.openjdk.org/jdk/pull/23996 From mli at openjdk.org Wed Mar 12 13:19:44 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 12 Mar 2025 13:19:44 GMT Subject: RFR: 8318220: RISC-V: C2 ReverseI [v3] In-Reply-To: References: Message-ID: <6lGIjtZULnE1TEzlsLaNNO9CKe1PRbgPO2KAvUbvimU=.7e813b51-5a8f-4c84-af8f-ee9342844491@github.com> > Hi, > Can you help to review this patch to add ReverseI and ReverseIL intrinsic on riscv? > > Thanks! Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: not enable Zbkb automatically ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23963/files - new: https://git.openjdk.org/jdk/pull/23963/files/9b6aa066..1d8770b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23963&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23963&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23963.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23963/head:pull/23963 PR: https://git.openjdk.org/jdk/pull/23963 From ayang at openjdk.org Wed Mar 12 13:34:04 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 12 Mar 2025 13:34:04 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v17] In-Reply-To: References: Message-ID: <0w7seS1tIFhUxnmStxQySISWVfpBBsRmUtx7EoTy9a4=.509a3d5e-56d0-4fd8-8896-51835b14302b@github.com> On Wed, 12 Mar 2025 11:58:45 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. 
>>
>> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight, but we would like to have this ready by JDK 25.
>>
>> ### Current situation
>>
>> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lags in throughput compared to Parallel/Serial GC due to its larger barrier.
>>
>> The main reason for the current barrier is how G1 implements concurrent refinement:
>> * G1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
>> * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads.
>> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>>
>> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>>
>> // Filtering
>> if (region(@x.a) == region(y)) goto done; // same region check
>> if (y == null) goto done;                 // null value check
>> if (card(@x.a) == young_card) goto done;  // write to young gen check
>> StoreLoad;                                // synchronize
>> if (card(@x.a) == dirty_card) goto done;
>>
>> *card(@x.a) = dirty
>>
>> // Card tracking
>> enqueue(card-address(@x.a)) into thread-local-dcq;
>> if (thread-local-dcq is not full) goto done;
>>
>> call runtime to move thread-local-dcq into dcqs
>>
>> done:
>>
>> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>>
>> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: > > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - * optimized RISCV gen_write_ref_array_post_barrier() implementation contributed by @RealFYang > - * fix card table verification crashes: in the first refinement phase, when switching the global card tables, we need to re-check whether we are still in the same sweep epoch or not. It might have changed due to a GC interrupting acquiring the Heap_lock. Otherwise new threads will scribble on the refinement table. > Cause are last-minute changes before making the PR ready to review. > > Testing: without the patch, occurs fairly frequently when continuously > (1 in 20) starting refinement. Does not afterward. 
>  - * ayang review 3
>      * comments
>      * minor refactorings
>  - * iwalulya review
>      * renaming
>      * fix some includes, forward declaration
>  - * fix whitespace
>    * additional whitespace between log tags
>    * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename
>  - ayang review
>      * renamings
>      * refactorings
>  - iwalulya review
>      * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement
>      * predicate for determining whether the refinement has been disabled
>      * some other typos/comment improvements
>      * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming
>  - * ayang review - fix comment
>  - * iwalulya review 2
>      * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState
>      * some additional documentation
>  - ... and 14 more: https://git.openjdk.org/jdk/compare/53a66058...aec95051

src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 217:

> 215:
> 216:   {
> 217:     SuspendibleThreadSetLeaver sts_leave;

Can you add some comment on why leaving the set is required? It's not obvious to me why. I'd expect handshake to work out of the box...

src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 263:

> 261:
> 262:   SuspendibleThreadSetLeaver sts_leave;
> 263:   VMThread::execute(&op);

Can you elaborate what synchronization this VM op is trying to achieve?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1991489399
PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1991382024

From ayang at openjdk.org  Wed Mar 12 13:33:59 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Wed, 12 Mar 2025 13:33:59 GMT
Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v14]
In-Reply-To: <5w6qUwzDQadxseocRl6rRF0AllyeukWTpYl2XjAfiTE=.fb62a50e-e308-4d08-8057-67e70e13ccbb@github.com>
References: <5w6qUwzDQadxseocRl6rRF0AllyeukWTpYl2XjAfiTE=.fb62a50e-e308-4d08-8057-67e70e13ccbb@github.com>
Message-ID:

On Fri, 7 Mar 2025 13:14:02 GMT, Albert Mingkun Yang wrote:

>> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   * iwalulya review
>>       * renaming
>>       * fix some includes, forward declaration
>
> src/hotspot/share/gc/g1/g1CardTable.hpp line 76:
>
>> 74:     g1_card_already_scanned = 0x1,
>> 75:     g1_to_cset_card = 0x2,
>> 76:     g1_from_remset_card = 0x4
>
> Could you outline the motivation for this more precise info? Is it for optimization or essentially for correctness?

OK, it's for better performance, not correctness. How much is the improvement? As I understand it, this more precise info is largely independent of the new barrier logic. I wonder if it makes sense to extract this out to its own ticket to better assess its impact on perf and impl complexity.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1991375754

From duke at openjdk.org  Wed Mar 12 13:42:33 2025
From: duke at openjdk.org (Ferenc Rakoczi)
Date: Wed, 12 Mar 2025 13:42:33 GMT
Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v6]
In-Reply-To:
References:
Message-ID:

> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Added validity test for the intrinsics. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23860/files - new: https://git.openjdk.org/jdk/pull/23860/files/64135f29..f65ef7c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=04-05 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23860/head:pull/23860 PR: https://git.openjdk.org/jdk/pull/23860 From egahlin at openjdk.org Wed Mar 12 13:45:00 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 12 Mar 2025 13:45:00 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v4] In-Reply-To: References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: On Wed, 12 Mar 2025 12:21:38 GMT, Aleksey Shipilev wrote: >> We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. >> >> This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). >> >> Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. 
We still only enable the statistics event by default to be on a safer side. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - peakCount it is > - Merge branch 'master' into JDK-8351142-jfr-deflate-event > - Touch up descriptions > - Fix test in release builds > - Merge branch 'master' into JDK-8351142-jfr-deflate-event > - Merge branch 'master' into JDK-8351142-jfr-deflate-event > - Test updates > - Rework statistics event to be actually statistics > - Filter JFR HiddenWait consistently > - Event metadata touchups > - ... and 2 more: https://git.openjdk.org/jdk/compare/1d147ccb...edd9beaf Since this event is only emitted once per chunk, it might be necessary to have a peak value to avoid sampling bias, but I think we should only add such metrics where there is a strong justification to do so, and where a calculated value would have failed to solve the underlying problem. I don't want to end up in a situation where we add peak, average, minimum, etc. for every event value. It adds noise and may confuse users when there are two maximum values in the GUI, one during the recording and one from when the JVM started. I agree, "peakCount" is a better name than "maxCount". ------------- PR Comment: https://git.openjdk.org/jdk/pull/23900#issuecomment-2717928609 From egahlin at openjdk.org Wed Mar 12 13:49:55 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 12 Mar 2025 13:49:55 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v4] In-Reply-To: References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: On Wed, 12 Mar 2025 12:21:38 GMT, Aleksey Shipilev wrote: >> We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. 
We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. >> >> This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). >> >> Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. We still only enable the statistics event by default to be on a safer side. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - peakCount it is > - Merge branch 'master' into JDK-8351142-jfr-deflate-event > - Touch up descriptions > - Fix test in release builds > - Merge branch 'master' into JDK-8351142-jfr-deflate-event > - Merge branch 'master' into JDK-8351142-jfr-deflate-event > - Test updates > - Rework statistics event to be actually statistics > - Filter JFR HiddenWait consistently > - Event metadata touchups > - ... and 2 more: https://git.openjdk.org/jdk/compare/1d147ccb...edd9beaf Marked as reviewed by egahlin (Reviewer). 

-------------

PR Review: https://git.openjdk.org/jdk/pull/23900#pullrequestreview-2678549616

From duke at openjdk.org  Wed Mar 12 13:51:58 2025
From: duke at openjdk.org (Ferenc Rakoczi)
Date: Wed, 12 Mar 2025 13:51:58 GMT
Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v5]
In-Reply-To:
References: <3bphXKLpIpxAZP-FEOeob6AaHbv0BAoEceJka64vMW8=.3e4f74e0-9479-4926-b365-b08d8d702692@github.com>
Message-ID:

On Mon, 10 Mar 2025 03:00:09 GMT, Leonid Mesnik wrote:

> There are no new tests in the PR. How has the fix been tested by OpenJDK tests?

I have just added one.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23860#issuecomment-2717950685

From duke at openjdk.org  Wed Mar 12 13:52:02 2025
From: duke at openjdk.org (Ferenc Rakoczi)
Date: Wed, 12 Mar 2025 13:52:02 GMT
Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 6 Mar 2025 14:30:35 GMT, Jatin Bhateja wrote:

>> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Added alignment to loop entries.
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 2:
>
>> 1: /*
>> 2:  * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved.
>
> Please update copyright year

Thanks, fixed.

> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 96:
>
>> 94:       StubRoutines::_dilithiumMontMulByConstant = generate_dilithiumMontMulByConstant_avx512();
>> 95:       StubRoutines::_dilithiumDecomposePoly = generate_dilithiumDecomposePoly_avx512();
>> 96:   }
>
> Indentation fix needed

Thanks, fixed.

> src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 362:
>
>> 360:   const Register roundsLeft = r11;
>> 361:
>> 362:   __ align(OptoLoopAlignment);
>
> Redundant alignment before label should be before its bind

Thanks, fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1991546308
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1991546488
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1991546606

From duke at openjdk.org  Wed Mar 12 13:52:06 2025
From: duke at openjdk.org (Ferenc Rakoczi)
Date: Wed, 12 Mar 2025 13:52:06 GMT
Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v3]
In-Reply-To:
References: <0E2AqFpNPjDjP6jqCXn8toePBcW2SIHw1kFXlZX4W_U=.8d692bfa-0598-4969-b480-4a285366e0bb@github.com>
Message-ID: <74tlAsyoYwN-fvtFyxp3xJYo76U68oF0ES4UVy7S_iY=.01f96647-395e-49bb-9e5a-f047b63460e0@github.com>

On Thu, 6 Mar 2025 09:32:19 GMT, Jatin Bhateja wrote:

>> I think the easiest is to put a for (int i = 0; i < 1000; i++) loop around the switch statement in the run() method of the ML_DSA_Test class (test/jdk/sun/security/provider/acvp/ML_DSA_Test.java). (This is because the intrinsics kick in after a few thousand calls of the method.)
>
> Hi @ferakocz, Yes, we should modify the test or lower the compilation threshold with -Xbatch -XX:TieredCompileThreshold=0.1.
>
> Alternatively, since the test has a dependency on the Automatic Cryptographic Validation Test server, I have created a simplified test which covers all the security levels.
>
> Kindly include [test/hotspot/jtreg/compiler/intrinsics/signature/TestModuleLatticeDSA.java](https://github.com/ferakocz/jdk/pull/1)

I have added a new command to the test in test/jdk/sun/security/provider/acvp/Launcher.java. The line with the -Xcomp will invoke the intrinsics on the first call, so they will be tested.
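
The warm-up idea discussed above (call the method often enough for the compiled or intrinsic version to take over, and check that results do not change) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the test from the PR: `WarmupSketch`, `mix`, and the iteration count are hypothetical, and `mix` is a placeholder computation unrelated to ML-DSA. Whether and when the JIT actually compiles the method depends on VM flags and thresholds; with `-Xcomp` the first call is expected to run compiled code already.

```java
// Hypothetical sketch: names and numbers are illustrative and unrelated to the ML-DSA code.
class WarmupSketch {
    // Stand-in for a method that the JIT may compile (or replace with an
    // intrinsic) once it has become hot.
    static int mix(int a, int b) {
        long product = (long) a * b;
        return (int) (product ^ (product >>> 32));
    }

    // Calls mix() repeatedly with fixed inputs and compares every result with
    // the one from the very first call. A mismatch would mean that a later
    // (possibly compiled) execution disagrees with the first one.
    static boolean consistentAcrossWarmup(int iterations) {
        int expected = mix(123_456, 654_321);
        for (int i = 0; i < iterations; i++) {
            if (mix(123_456, 654_321) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("consistent: " + consistentAcrossWarmup(20_000));
    }
}
```

Running the same class once with default flags and once with `-Xcomp` covers both the "after warm-up" and the "compiled on first call" paths with one source file.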
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1991546056 From tschatzl at openjdk.org Wed Mar 12 14:00:15 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Mar 2025 14:00:15 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v17] In-Reply-To: <0w7seS1tIFhUxnmStxQySISWVfpBBsRmUtx7EoTy9a4=.509a3d5e-56d0-4fd8-8896-51835b14302b@github.com> References: <0w7seS1tIFhUxnmStxQySISWVfpBBsRmUtx7EoTy9a4=.509a3d5e-56d0-4fd8-8896-51835b14302b@github.com> Message-ID: On Wed, 12 Mar 2025 12:23:50 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: >> >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - * optimized RISCV gen_write_ref_array_post_barrier() implementation contributed by @RealFYang >> - * fix card table verification crashes: in the first refinement phase, when switching the global card tables, we need to re-check whether we are still in the same sweep epoch or not. It might have changed due to a GC interrupting acquiring the Heap_lock. Otherwise new threads will scribble on the refinement table. >> Cause are last-minute changes before making the PR ready to review. >> >> Testing: without the patch, occurs fairly frequently when continuously >> (1 in 20) starting refinement. Does not afterward. 
>> - * ayang review 3 >> * comments >> * minor refactorings >> - * iwalulya review >> * renaming >> * fix some includes, forward declaration >> - * fix whitespace >> * additional whitespace between log tags >> * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename >> - ayang review >> * renamings >> * refactorings >> - iwalulya review >> * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement >> * predicate for determining whether the refinement has been disabled >> * some other typos/comment improvements >> * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming >> - * ayang review - fix comment >> - * iwalulya review 2 >> * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState >> * some additional documentation >> - ... and 14 more: https://git.openjdk.org/jdk/compare/5727f166...aec95051 > > src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 263: > >> 261: >> 262: SuspendibleThreadSetLeaver sts_leave; >> 263: VMThread::execute(&op); > > Can you elaborate what synchronization this VM op is trying to achieve? Memory visibility for refinement threads for the references written to the heap. Without them, they may not have received the most recent values. This is the same as the `StoreLoad` barriers synchronization between mutator and refinement threads imo. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1991561707 From shade at openjdk.org Wed Mar 12 14:12:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 14:12:22 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v5] In-Reply-To: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: > We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. 
We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. > > This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). > > Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. We still only enable the statistics event by default to be on a safer side. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `jdk_jfr` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Drop peak count completely ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23900/files - new: https://git.openjdk.org/jdk/pull/23900/files/edd9beaf..0caf4b3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23900&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23900&range=03-04 Stats: 9 lines in 3 files changed: 0 ins; 9 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23900/head:pull/23900 PR: https://git.openjdk.org/jdk/pull/23900 From shade at openjdk.org Wed Mar 12 14:12:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 14:12:22 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v4] In-Reply-To: References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: 
<_53gYMVMFDTSHp-OFeK6ZQqxWli_wmDhWyh6cF1_DW8=.fd85e438-aa2d-4490-9c13-ab1531a11612@github.com>

On Wed, 12 Mar 2025 13:42:01 GMT, Erik Gahlin wrote:

> I think we should only add such metrics where there is a strong justification to do so, and where a calculated value would have failed to solve the underlying problem

Right, makes sense. Let's recap.

The underlying reason for providing peak statistics is to track the population of object monitors without computing it from individual inflate/deflate events. Since it is periodic, it runs into sampling bias. The sampling bias works not only for peaks, but also for dips, so I would guess a fuller solution would be indeed to add `min/max` counters, _if_ we wanted to avoid the bias.

But, I think this goes too far. We want to replace one of the `ObjectMonitor` counters that counts the instantaneous monitor population. It does not really report peak/max. I can see that OM code might want to stop tracking `max`, but JFR event would force its hand, if we start reporting it.

So, thinking that _adding_ a new field into JFR event is easier than yanking the unnecessary/bad one, I think we should be conservative and just report the instantaneous monitor population. If we find it is insufficient, then we can talk about extending the event. Sounds good?

Dropped `peakCount` from this PR in new version. Take a look again, please?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23900#issuecomment-2718008992

From jbechberger at openjdk.org  Wed Mar 12 15:18:39 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Wed, 12 Mar 2025 15:18:39 GMT
Subject: RFR: 8342818: Implement CPU Time Profiling for JFR [v39]
In-Reply-To:
References:
Message-ID:

> This is the code for the [JEP draft: CPU Time based profiling for JFR].
>
> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests).

Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision:

  Don't record exiting threads

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20752/files
  - new: https://git.openjdk.org/jdk/pull/20752/files/53a2560d..18ec3811

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=38
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=37-38

  Stats: 7 lines in 2 files changed: 5 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20752.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20752/head:pull/20752

PR: https://git.openjdk.org/jdk/pull/20752

From lmesnik at openjdk.org  Wed Mar 12 15:37:12 2025
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Wed, 12 Mar 2025 15:37:12 GMT
Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 12 Mar 2025 13:42:33 GMT, Ferenc Rakoczi wrote:

>> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision:
>
>   Added validity test for the intrinsics.

test/jdk/sun/security/provider/acvp/Launcher.java line 43:

> 41:  * @modules java.base/sun.security.provider
> 42:  * @run main Launcher
> 43:  * @run main/othervm -Xcomp Launcher

Thank you for adding this case. Please add it as a separate testcase:

    /*
     * @test
     * @summary Test verifies intrinsic implementation.
     * @library /test/lib
     * @modules java.base/sun.security.provider
     * @run main/othervm -Xcomp Launcher
     */

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1991769739

From egahlin at openjdk.org  Wed Mar 12 15:51:00 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Wed, 12 Mar 2025 15:51:00 GMT
Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v5]
In-Reply-To:
References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com>
Message-ID:

On Wed, 12 Mar 2025 14:12:22 GMT, Aleksey Shipilev wrote:

>> We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well.
>>
>> This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829).
>>
>> Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. We still only enable the statistics event by default to be on a safer side.
>>
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `jdk_jfr`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Drop peak count completely

Marked as reviewed by egahlin (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/23900#pullrequestreview-2679013734 From shade at openjdk.org Wed Mar 12 16:14:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 16:14:53 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v5] In-Reply-To: References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: On Wed, 12 Mar 2025 14:12:22 GMT, Aleksey Shipilev wrote: >> We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. >> >> This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). >> >> Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. We still only enable the statistics event by default to be on a safer side. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Drop peak count completely Thank you for reviews, appreciated! I'll integrate shortly. 

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23900#issuecomment-2718408008

From rehn at openjdk.org  Wed Mar 12 16:28:03 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 12 Mar 2025 16:28:03 GMT
Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v6]
In-Reply-To: <_YhLkn3fphUOBhl2tyMmEEnk282U5nzSJZeez0sKtXc=.5c0fdac0-7c49-4567-8861-6b5b03de226f@github.com>
References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com>
 <_YhLkn3fphUOBhl2tyMmEEnk282U5nzSJZeez0sKtXc=.5c0fdac0-7c49-4567-8861-6b5b03de226f@github.com>
Message-ID: <4_grDj7eRwyoXZgzylbREAKsetpQ27JGwZ-luO9jmi8=.c7eec439-5277-4f52-961d-9f17f2b9496d@github.com>

On Tue, 11 Mar 2025 12:43:30 GMT, Hamlin Li wrote:

>> Hi,
>> Can you help to review this patch?
>> It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv.
>>
>> ## Performance
>>
>> data
>>
>> Benchmark | (vectorDim) | Mode | Cnt | Score - master | Error | Score - patch | Error | Units | Improvement (master/patch)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1
>> Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011
>> Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001
>> Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001
>> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909
>> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91
>> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129
>> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 91565.81...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   refine

Seems alright, thanks!

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2172:

> 2170:
> 2171:   switch (ft) {
> 2172:     case FLOAT_TYPE::half_precision:

I wouldn't switch three times on **ft**.
Maybe just switch once, and have all in that switch, e.g.:

    case FLOAT_TYPE::half_precision:
      fclass_h(t0, src1);
      fclass_h(t1, src2);

      orr(t0, t0, t1);
      andi(t0, t0, FClassBits::nan); // if src1 or src2 is quiet or signaling NaN then return NaN
      fadd_h(dst, src1, src2);
      beqz(t0, Compare);
      j(Done);

      bind(Compare);
      if (is_min) {
        fmin_h(dst, src1, src2);
      } else {
        fmax_h(dst, src1, src2);
      }
      break;

No?

-------------

Marked as reviewed by rehn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23844#pullrequestreview-2679127115 PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1991861543 From iklam at openjdk.org Wed Mar 12 17:19:14 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 12 Mar 2025 17:19:14 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v2] In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: <4lV6ssHh71lUtGEHsxkKoPDz7GrcZrmUvKXGsfjzbE4=.e3072dfd-f189-4d55-9c7d-18991b4744d6@github.com> On Wed, 12 Mar 2025 07:02:22 GMT, Thomas Stuefe wrote: >> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. >> >> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). >> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). >> >> The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . >> >> The AIX error, in comparison, is easy. 
On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. >> >> Tests: >> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths >> - SAP reports all tests green (they had reported errors with the previous version) >> - Oracle Tests ongoing >> - GHAs green > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > skip test if we have no COH archive src/hotspot/share/cds/metaspaceShared.cpp line 1431: > 1429: #ifdef _LP64 > 1430: if (Metaspace::using_class_space()) { > 1431: assert(prot_zone_size > 0 && This code assumes that `prot_zone_size > 0`, but we have other code that checks `if (prot_zone_size > 0)`. Should the "if" be changed to asserts? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23912#discussion_r1991953808 From vlivanov at openjdk.org Wed Mar 12 17:25:06 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 12 Mar 2025 17:25:06 GMT Subject: RFR: 8351640: Print reason for making method not entrant [v2] In-Reply-To: <370pnPWKXnqHXz9pVOoU9vFfqdH8zIIV2K7BpqWRcEI=.0c63f38f-84ab-49c3-a0da-1ad9f1b22fb1@github.com> References: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com> <370pnPWKXnqHXz9pVOoU9vFfqdH8zIIV2K7BpqWRcEI=.0c63f38f-84ab-49c3-a0da-1ad9f1b22fb1@github.com> Message-ID: On Wed, 12 Mar 2025 07:35:33 GMT, Aleksey Shipilev wrote: >> A simple quality of life improvement. 
We are studying compiler dynamics in Leyden, and it would be convenient to know why the particular methods are marked as not entrant. We just need to pass the extra string argument to `nmethod::make_not_entrant` and print it out. >> >> Sample log excerpt for mainline: >> >> >> $ grep com.sun.tools.javac.util.IntHashTable::lookup print-compilation.log >> 987 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) >> 1019 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) >> 1024 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used >> 4995 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: uncommon trap >> 5287 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) >> 6615 5472 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) >> 6626 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used >> >> >> You can now clearly see the method lifecycle. 1 second in app lifetime, the method was initially compiled at level 3. Shortly after, it got compiled at level 4, turning level 3 method unused. 4 seconds later, level 4 method encountered uncommon trap, so we are back to level 3. After 1.3 seconds more, the final compilation at level 4 completed, and second level 3 compilation was removed as unused. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `hotspot:tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Add to LogCompilation as well > - Merge branch 'master' into JDK-8351640-nmethod-not-entrant-reason > - Use resource allocation for temp buffer > - Base version Looks good. Do you mind incorporating log compilation tool support? 
[1]
diff --git a/src/utils/LogCompilation/src/main/java/com/sun/hotspot/tools/compiler/LogParser.java b/src/utils/LogCompilation/src/main/java/com/sun/hotspot/tools/compiler/LogParser.java
index e1e305abe10..61cbc054200 100644
--- a/src/utils/LogCompilation/src/main/java/com/sun/hotspot/tools/compiler/LogParser.java
+++ b/src/utils/LogCompilation/src/main/java/com/sun/hotspot/tools/compiler/LogParser.java
@@ -1099,6 +1099,10 @@ public void startElement(String uri, String localName, String qname, Attributes
             e.setCompileKind(compileKind);
             String level = atts.getValue("level");
             e.setLevel(level);
+            String reason = atts.getValue("reason");
+            if (reason != null) {
+                e.setReason(reason);
+            }
             events.add(e);
         } else if (qname.equals("uncommon_trap")) {
             String id = atts.getValue("compile_id");
diff --git a/src/utils/LogCompilation/src/main/java/com/sun/hotspot/tools/compiler/MakeNotEntrantEvent.java b/src/utils/LogCompilation/src/main/java/com/sun/hotspot/tools/compiler/MakeNotEntrantEvent.java
index b4015537c74..d230f1b4336 100644
--- a/src/utils/LogCompilation/src/main/java/com/sun/hotspot/tools/compiler/MakeNotEntrantEvent.java
+++ b/src/utils/LogCompilation/src/main/java/com/sun/hotspot/tools/compiler/MakeNotEntrantEvent.java
@@ -47,6 +47,11 @@ class MakeNotEntrantEvent extends BasicLogEvent {
      */
     private String level;

+    /**
+     * The reason of invalidation.
+     */
+    private String reason;
+
     /**
      * The compile kind.
      */
@@ -64,10 +69,14 @@ public NMethod getNMethod() {

     public void print(PrintStream stream, boolean printID) {
         if (isZombie()) {
-            stream.printf("%s make_zombie\n", getId());
+            stream.printf("%s make_zombie", getId());
         } else {
-            stream.printf("%s make_not_entrant\n", getId());
+            stream.printf("%s make_not_entrant", getId());
+        }
+        if (getReason() != null) {
+            stream.printf(": %s", getReason());
         }
+        stream.println();
     }

     public boolean isZombie() {
@@ -88,7 +97,21 @@ public void setLevel(String level) {
         this.level = level;
     }

-    /**
+    /**
+     * @return the reason
+     */
+    public String getReason() {
+        return reason;
+    }
+
+    /**
+     * @param reason the reason to set
+     */
+    public void setReason(String reason) {
+        this.reason = reason;
+    }
+
+    /**
      * @return the compileKind
      */
     public String getCompileKind() {
------------- PR Review: https://git.openjdk.org/jdk/pull/23980#pullrequestreview-2679301582 From mli at openjdk.org Wed Mar 12 17:29:40 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 12 Mar 2025 17:29:40 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v7] In-Reply-To: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: > Hi, > Can you help to review this patch? > It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv.
>
> ## Performance
>
> data
>
> Benchmark | (vectorDim) | Mode | Cnt | Score - master | Error | Score - patch | Error | Units | Improvement (master/patch)
> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1
> Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011
> Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001
> Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001
> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909
> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91
> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129
> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 91565.818 | 250.761 | 5191.12 | 64.598 | ns/op | 17.639
> Fl...
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: refine switch ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23844/files - new: https://git.openjdk.org/jdk/pull/23844/files/e63061d2..3957bbca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23844&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23844&range=05-06 Stats: 75 lines in 1 file changed: 28 ins; 34 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/23844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23844/head:pull/23844 PR: https://git.openjdk.org/jdk/pull/23844 From mli at openjdk.org Wed Mar 12 17:29:40 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 12 Mar 2025 17:29:40 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v6] In-Reply-To: <4_grDj7eRwyoXZgzylbREAKsetpQ27JGwZ-luO9jmi8=.c7eec439-5277-4f52-961d-9f17f2b9496d@github.com> References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> <_YhLkn3fphUOBhl2tyMmEEnk282U5nzSJZeez0sKtXc=.5c0fdac0-7c49-4567-8861-6b5b03de226f@github.com> <4_grDj7eRwyoXZgzylbREAKsetpQ27JGwZ-luO9jmi8=.c7eec439-5277-4f52-961d-9f17f2b9496d@github.com> Message-ID: <9Asxh0g8jAcYYlt-ioTfTcamn5gGSTVumYEoFpoyLLk=.abe6a31f-87f5-49e6-99c8-8e004276d32c@github.com> On Wed, 12 Mar 2025 16:22:57 GMT, Robbin Ehn wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> refine > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2172: > >> 2170: >> 2171: switch (ft) { >> 2172: case FLOAT_TYPE::half_precision: > > I wouldn't switch three times on **ft**.
>
> Maybe just switch once, and have all in that switch, e.g.:
>
> case FLOAT_TYPE::half_precision:
>   fclass_h(t0, src1);
>   fclass_h(t1, src2);
>   orr(t0, t0, t1);
>   andi(t0, t0, FClassBits::nan); // if src1 or src2 is quiet or signaling NaN then return NaN
>   fadd_h(dst, src1, src2);
>   beqz(t0, Compare);
>   j(Done);
>   bind(Compare);
>   if (is_min) {
>     fmin_h(dst, src1, src2);
>   } else {
>     fmax_h(dst, src1, src2);
>   }
>   break;
>
> No ?
Good suggestion, seems it's better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1991969494 From shade at openjdk.org Wed Mar 12 17:39:35 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 17:39:35 GMT Subject: RFR: 8351640: Print reason for making method not entrant [v3] In-Reply-To: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com> References: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com> Message-ID: > A simple quality of life improvement. We are studying compiler dynamics in Leyden, and it would be convenient to know why the particular methods are marked as not entrant. We just need to pass the extra string argument to `nmethod::make_not_entrant` and print it out.
> > Sample log excerpt for mainline: > > > $ grep com.sun.tools.javac.util.IntHashTable::lookup print-compilation.log > 987 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 1019 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 1024 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used > 4995 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: uncommon trap > 5287 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 6615 5472 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 6626 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used > > > You can now clearly see the method lifecycle. 1 second in app lifetime, the method was initially compiled at level 3. Shortly after, it got compiled at level 4, turning level 3 method unused. 4 seconds later, level 4 method encountered uncommon trap, so we are back to level 3. After 1.3 seconds more, the final compilation at level 4 completed, and second level 3 compilation was removed as unused. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot:tier1` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Add LogCompilation support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23980/files - new: https://git.openjdk.org/jdk/pull/23980/files/38491fb2..5da9766d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23980&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23980&range=01-02 Stats: 30 lines in 2 files changed: 27 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23980.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23980/head:pull/23980 PR: https://git.openjdk.org/jdk/pull/23980 From shade at openjdk.org Wed Mar 12 17:39:35 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 17:39:35 GMT Subject: RFR: 8351640: Print reason for making method not entrant [v2] In-Reply-To: References: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com> <370pnPWKXnqHXz9pVOoU9vFfqdH8zIIV2K7BpqWRcEI=.0c63f38f-84ab-49c3-a0da-1ad9f1b22fb1@github.com> Message-ID: On Wed, 12 Mar 2025 17:22:06 GMT, Vladimir Ivanov wrote: > Do you mind incorporating log compilation tool support? [1] I don't mind, added. 
Looks like this still works: $ cd src/tools/LogCompilation $ make ------------- PR Comment: https://git.openjdk.org/jdk/pull/23980#issuecomment-2718629919 From tschatzl at openjdk.org Wed Mar 12 17:44:01 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Mar 2025 17:44:01 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v17] In-Reply-To: <0w7seS1tIFhUxnmStxQySISWVfpBBsRmUtx7EoTy9a4=.509a3d5e-56d0-4fd8-8896-51835b14302b@github.com> References: <0w7seS1tIFhUxnmStxQySISWVfpBBsRmUtx7EoTy9a4=.509a3d5e-56d0-4fd8-8896-51835b14302b@github.com> Message-ID: On Wed, 12 Mar 2025 13:20:25 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: >> >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - * optimized RISCV gen_write_ref_array_post_barrier() implementation contributed by @RealFYang >> - * fix card table verification crashes: in the first refinement phase, when switching the global card tables, we need to re-check whether we are still in the same sweep epoch or not. It might have changed due to a GC interrupting acquiring the Heap_lock. Otherwise new threads will scribble on the refinement table. >> Cause are last-minute changes before making the PR ready to review. >> >> Testing: without the patch, occurs fairly frequently when continuously >> (1 in 20) starting refinement. Does not afterward. 
>> - * ayang review 3 >> * comments >> * minor refactorings >> - * iwalulya review >> * renaming >> * fix some includes, forward declaration >> - * fix whitespace >> * additional whitespace between log tags >> * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename >> - ayang review >> * renamings >> * refactorings >> - iwalulya review >> * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement >> * predicate for determining whether the refinement has been disabled >> * some other typos/comment improvements >> * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming >> - * ayang review - fix comment >> - * iwalulya review 2 >> * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState >> * some additional documentation >> - ... and 14 more: https://git.openjdk.org/jdk/compare/0c7b5abb...aec95051 > > src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 217: > >> 215: >> 216: { >> 217: SuspendibleThreadSetLeaver sts_leave; > > Can you add some comment on why leaving the set is required? It's not obvious to me why. I'd expect handshake to work out of the box... It isn't apparently. Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1991999476 From tschatzl at openjdk.org Wed Mar 12 17:59:51 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Mar 2025 17:59:51 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v18] In-Reply-To: References: Message-ID: <3KOwgdzYn_vXQVWisVUEY-0i1gtZEfZhcD1-id3epYE=.17aa84bc-a7ec-4dda-b596-7a1016d710fc@github.com> > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. 
> > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. 
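[Editorial sketch, not part of the original message.] The filtering and card-tracking steps in the pseudo code above can be modelled in a few lines of C++. This is only a toy model under stated assumptions: `ToyHeap`, the 512-byte card size, the 1 MiB region size, the card values and the queue capacity are all invented for illustration and are not G1's actual data structures. The second function shows the kind of unconditional card mark that the description compares against for Parallel and Serial GC.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// All constants below are assumptions made for this toy model.
constexpr int kCardShift = 9;      // 512-byte cards (assumed)
constexpr int kRegionShift = 20;   // 1 MiB regions (assumed)
constexpr uint8_t kClean = 0;
constexpr uint8_t kDirty = 1;
constexpr uint8_t kYoung = 2;

// Hypothetical stand-in for the heap's card table plus one thread-local queue.
struct ToyHeap {
  std::vector<uint8_t> cards;
  std::vector<size_t> local_queue;  // stands in for the thread-local dcq
  size_t queue_capacity;
  size_t cards_handed_to_dcqs;      // counts cards "moved into the dcqs"

  ToyHeap(size_t heap_bytes, size_t capacity)
      : cards(heap_bytes >> kCardShift, kClean),
        queue_capacity(capacity),
        cards_handed_to_dcqs(0) {}

  // Filtered barrier for the store "*field_addr = value", following the
  // order of checks in the pseudo code: region, null, young, fence, dirty.
  void post_write_barrier_filtered(uintptr_t field_addr, uintptr_t value) {
    const size_t index = field_addr >> kCardShift;
    assert(index < cards.size());
    if ((field_addr >> kRegionShift) == (value >> kRegionShift)) return;
    if (value == 0) return;
    uint8_t& card = cards[index];
    if (card == kYoung) return;
    std::atomic_thread_fence(std::memory_order_seq_cst);  // the StoreLoad
    if (card == kDirty) return;
    card = kDirty;
    // Card tracking: enqueue, and hand the queue over once it is full.
    local_queue.push_back(index);
    if (local_queue.size() < queue_capacity) return;
    cards_handed_to_dcqs += local_queue.size();
    local_queue.clear();
  }

  // Unconditional card mark: no filters, no queue, no fence.
  void post_write_barrier_simple(uintptr_t field_addr) {
    const size_t index = field_addr >> kCardShift;
    assert(index < cards.size());
    cards[index] = kDirty;
  }
};
```

Even in this toy form the first function has five early exits, a fence and a slow path, while the second is a shift and a store, which is the size difference the description attributes to the two barrier styles.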
> > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * ayang review * remove unnecessary STSleaver * some more documentation around to_collection_card card color ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/aec95051..3766b76c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=16-17 Stats: 18 lines in 2 files changed: 5 ins; 4 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From vlivanov at openjdk.org Wed Mar 12 18:17:03 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 12 Mar 2025 18:17:03 GMT Subject: RFR: 8351640: Print reason for making method not entrant [v3] In-Reply-To: References: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com> Message-ID: On Wed, 12 Mar 2025 17:39:35 GMT, Aleksey Shipilev wrote: >> A simple quality of life improvement. We are studying compiler dynamics in Leyden, and it would be convenient to know why the particular methods are marked as not entrant. We just need to pass the extra string argument to `nmethod::make_not_entrant` and print it out. 
>> >> Sample log excerpt for mainline: >> >> >> $ grep com.sun.tools.javac.util.IntHashTable::lookup print-compilation.log >> 987 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) >> 1019 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) >> 1024 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used >> 4995 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: uncommon trap >> 5287 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) >> 6615 5472 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) >> 6626 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used >> >> >> You can now clearly see the method lifecycle. 1 second in app lifetime, the method was initially compiled at level 3. Shortly after, it got compiled at level 4, turning level 3 method unused. 4 seconds later, level 4 method encountered uncommon trap, so we are back to level 3. After 1.3 seconds more, the final compilation at level 4 completed, and second level 3 compilation was removed as unused. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `hotspot:tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Add LogCompilation support Thanks. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23980#pullrequestreview-2679447944 From shade at openjdk.org Wed Mar 12 18:17:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 18:17:04 GMT Subject: RFR: 8351640: Print reason for making method not entrant [v3] In-Reply-To: References: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com> Message-ID: On Wed, 12 Mar 2025 17:39:35 GMT, Aleksey Shipilev wrote: >> A simple quality of life improvement. 
We are studying compiler dynamics in Leyden, and it would be convenient to know why the particular methods are marked as not entrant. We just need to pass the extra string argument to `nmethod::make_not_entrant` and print it out. >> >> Sample log excerpt for mainline: >> >> >> $ grep com.sun.tools.javac.util.IntHashTable::lookup print-compilation.log >> 987 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) >> 1019 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) >> 1024 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used >> 4995 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: uncommon trap >> 5287 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) >> 6615 5472 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) >> 6626 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used >> >> >> You can now clearly see the method lifecycle. 1 second in app lifetime, the method was initially compiled at level 3. Shortly after, it got compiled at level 4, turning level 3 method unused. 4 seconds later, level 4 method encountered uncommon trap, so we are back to level 3. After 1.3 seconds more, the final compilation at level 4 completed, and second level 3 compilation was removed as unused. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `hotspot:tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Add LogCompilation support Thanks! I'll integrate once GHA clears. 
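[Editorial sketch, not part of the original message.] The change discussed in this thread boils down to threading a reason string through the call that invalidates a compiled method and appending it to the log line. A minimal C++ illustration follows, assuming a made-up `ToyNMethod` type and a simplified line format; neither is HotSpot's real `nmethod` class nor its exact `PrintCompilation` output.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a compiled method; all fields are hypothetical.
struct ToyNMethod {
  int compile_id;
  int comp_level;
  std::string method_name;
  bool entrant;

  // Marks the method not entrant and returns the line a caller could log.
  // A null or empty reason leaves the ": reason" suffix off.
  std::string make_not_entrant(const char* reason) {
    assert(compile_id > 0);
    entrant = false;
    std::string line = std::to_string(compile_id) + " " +
                       std::to_string(comp_level) + " " +
                       method_name + " made not entrant";
    if (reason != nullptr && reason[0] != '\0') {
      line += ": ";
      line += reason;
    }
    return line;
  }
};
```

For a `ToyNMethod` with id 780 and level 3, calling `make_not_entrant("not used")` yields a line ending in `made not entrant: not used`, the same shape as the sample log excerpt quoted above.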
------------- PR Comment: https://git.openjdk.org/jdk/pull/23980#issuecomment-2718719599 From duke at openjdk.org Wed Mar 12 19:19:08 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 12 Mar 2025 19:19:08 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v7] In-Reply-To: References: Message-ID: > By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Made the intrinsics test separate from the pure java test. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23860/files - new: https://git.openjdk.org/jdk/pull/23860/files/f65ef7c4..aa2fdf2d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=05-06 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23860/head:pull/23860 PR: https://git.openjdk.org/jdk/pull/23860 From shade at openjdk.org Wed Mar 12 19:28:03 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 19:28:03 GMT Subject: Integrated: 8351142: Add JFR monitor deflation and statistics events In-Reply-To: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: <9WCqIeT-1fqKoscdY4Fz7zkg4R3laMEbysHKuP2_bbk=.78b618af-32d4-4ce5-bb58-da36250e4b06@github.com> On Tue, 4 Mar 2025 14:47:09 GMT, Aleksey Shipilev wrote: > We already have JFR JavaMonitorInflate event, which tells when the monitor is inflated. We are missing JavaMonitorDeflate event, which would tell us when the monitor is deflated. 
This makes it hard to see the monitor lifecycle, and/or estimate the population of currently inflated monitors. I believe we should add JavaMonitorDeflate event. It would also be useful to have the statistics for the number of currently used/deflating monitors. Deflation event alone would require post-processing to investigate this, so it would be good to have the statistics event as well. > > This would also replace two of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). > > Monitor deflation is done asynchronously in `MonitorDeflationThread`, so the additional overhead of recording the deflation events would likely be performance neutral. We still only enable the statistics event by default to be on a safer side. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `jdk_jfr` This pull request has now been integrated. Changeset: 895f64a1 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/895f64a18d7c752332ef9255c0b118bf25bdbb90 Stats: 286 lines in 13 files changed: 275 ins; 6 del; 5 mod 8351142: Add JFR monitor deflation and statistics events Reviewed-by: egahlin, dholmes, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/23900 From shade at openjdk.org Wed Mar 12 19:47:58 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 19:47:58 GMT Subject: Integrated: 8351640: Print reason for making method not entrant In-Reply-To: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com> References: <_XHdskC5Q0n4cwspFV97uiUyS2HsWDSZAK-YkGGCUIA=.8e86e6e2-9c46-440f-aca5-efbc54475f29@github.com> Message-ID: On Tue, 11 Mar 2025 11:36:59 GMT, Aleksey Shipilev wrote: > A simple quality of life improvement. We are studying compiler dynamics in Leyden, and it would be convenient to know why the particular methods are marked as not entrant. We just need to pass the extra string argument to `nmethod::make_not_entrant` and print it out. 
> > Sample log excerpt for mainline: > > > $ grep com.sun.tools.javac.util.IntHashTable::lookup print-compilation.log > 987 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 1019 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 1024 780 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used > 4995 877 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: uncommon trap > 5287 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 6615 5472 4 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) > 6626 3734 3 com.sun.tools.javac.util.IntHashTable::lookup (100 bytes) made not entrant: not used > > > You can now clearly see the method lifecycle. 1 second in app lifetime, the method was initially compiled at level 3. Shortly after, it got compiled at level 4, turning level 3 method unused. 4 seconds later, level 4 method encountered uncommon trap, so we are back to level 3. After 1.3 seconds more, the final compilation at level 4 completed, and second level 3 compilation was removed as unused. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot:tier1` > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. 
Changeset: 930455b5 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/930455b59608b547017c9649efeb6bd381340c34 Stats: 68 lines in 16 files changed: 35 ins; 0 del; 33 mod 8351640: Print reason for making method not entrant Co-authored-by: Vladimir Ivanov Reviewed-by: vlivanov, kvn ------------- PR: https://git.openjdk.org/jdk/pull/23980 From dlong at openjdk.org Wed Mar 12 19:49:55 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 12 Mar 2025 19:49:55 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v10] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 15:55:00 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > changed assert in align_up Marked as reviewed by dlong (Reviewer). 
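[Editorial sketch, not part of the original message.] The overflow concern with `align_up` can be shown with plain unsigned integers. The helper below is hypothetical (`checked_align_up` is not the name or signature used in `align.hpp`), and it works on `uint64_t`, where overflow wraps around instead of being undefined behavior; it only illustrates the "aligned result >= original" invariant and an explicit failure path in the spirit of `align_up_or_null`.

```cpp
#include <cassert>
#include <cstdint>

// True if x is a nonzero power of two.
constexpr bool is_power_of_2(uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Hypothetical helper: rounds value up to a multiple of alignment, which
// must be a power of two. Returns false and leaves *result untouched when
// the rounded value would not fit in 64 bits.
inline bool checked_align_up(uint64_t value, uint64_t alignment, uint64_t* result) {
  assert(is_power_of_2(alignment));
  const uint64_t mask = alignment - 1;
  if (value > UINT64_MAX - mask) {
    return false;  // value + mask would wrap around
  }
  const uint64_t aligned = (value + mask) & ~mask;
  assert(aligned >= value);  // the invariant most callers rely on
  *result = aligned;
  return true;
}
```

A caller that cannot tolerate failure could assert on the return value, while code that already had an explicit overflow check could branch on it.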
------------- PR Review: https://git.openjdk.org/jdk/pull/23711#pullrequestreview-2679699216 From eastig at amazon.co.uk Wed Mar 12 21:45:49 2025 From: eastig at amazon.co.uk (Astigeevich, Evgeny) Date: Wed, 12 Mar 2025 21:45:49 +0000 Subject: [External] : RFD: Grouping hot code in CodeCache In-Reply-To: <894e661e-6532-463f-b96c-3f17c853bb89@oracle.com> References: <1B0C3138-761B-4DB0-8A98-977C6FC40178@amazon.co.uk> <2623f909-bb91-4450-bc05-d9181ba3abcb@oracle.com> <681195aa-50a9-45ad-abe5-6e6e2d164b01@oracle.com> <921F549A-921F-4F3D-A481-43D0F2F25183@amazon.co.uk> <894e661e-6532-463f-b96c-3f17c853bb89@oracle.com> Message-ID: Hi Vladimir, > But do you still want to continue work on next RFE and linked sub-RFEs?: > https://bugs.openjdk.org/browse/JDK-8326205 > Please, clarify which ones you want to upstream? I have updated JDK-8326205. It will be the main RFE to cover upstream works: detection of nmethods, a code heap for them, and maintenance of the code heap. The linked RFEs are those needed for JDK-8326205. They have PRs published. JDK-8316694 "Implement relocation of nmethod within CodeCache" can be useful for other cases, e.g. we can keep C1 nmethods for big C2 nmethods to switch back to them when the C2 methods get deoptimized. We will not need to go through reinterpretation and C1 compilation again. Thanks, Evgeny On 11/03/2025, 19:23, "Vladimir Kozlov" wrote: CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. On 3/10/25 3:55 PM, Astigeevich, Evgeny wrote: > Hi Vladimir, > >> I don't like manual part of this - providing list of hot methods which >> should be collocated. > > It looks like I was not clear in my first email and miscommunication happened. > I am sorry. I provided it to share what we tried and what lessons we learned, especially how it is complicated. > We have no intent to upstream list-based solutions. Okay, NP.
But do you still want to continue work on next RFE and linked sub-RFEs?: https://bugs.openjdk.org/browse/JDK-8326205 Please, clarify which ones you want to upstream? > >> Sometime ago we had concept of Code > > Thank for sharing. If I remember correctly it uses deoptimization to remove aging code which means recompilation. > > BTW, I found https://openjdk.org/jeps/8350338 "Cooperative JFR Sampling". > I see it has things we want in our implementation. Yes, it started moving. Thanks, Vladimir K > > Thanks, > Evgeny > > On 08/03/2025, 02:03, "Vladimir Kozlov" wrote: > > On 3/7/25 3:47 PM, Astigeevich, Evgeny wrote: >> Hi Vladimir, >> >> Thank you for the feedback. >> >>> My concern is that it will complicate VM existing code for not >>> significant benefits in real production environment. > > > To clarify. > > > I don't like manual part of this - providing list of hot methods which > should be collocated. > > > I am fine to have special segment for special C2 compiled code. We will > have one for some AOT code in Leyden. > > > Move code in CodeCache to make it more dense is also fine. > > >> >> I think it won't complicate the existing code: >> - Adding a code heap is ~50 lines of code, mostly in CodeCache::initialize_heaps. >> - Relocating nmethods, according to PR[1], is ~300 lines of code. >> - A grouping thread is simple and isolated. It will go through Java threads checking their last frame(s) and recording seen nmethods. It should have less code than the sweeper which was ~700 lines. >> >> I think it's better to wait for PoC to see the complexity. >> >>> What improvements your experiments in real production runs shows? And >>> which JDK version you used for that? >> >> In production we are using internally 17 (static lists of methods) and 21 (dynamic lists of methods).
>> Improvements are in a range of 5% - 15%. They depend on how big CPU load is: the more CPU load the bigger improvement. > > > Good. > > >> >>> As you know most of nmethod's metadata is moved from CodeCache. >>> ... >>> After that the code will be a lot more compact in CodeCache. Code sparsity >>> should be less issue then. >> >> Yes, removing non-code from nmethod will improve code density. This means in a code region we will have more code vs non-code. >> CPU instruction caches will like this. >> >> As I wrote in a comment to benchmark PR [2], Neoverse operates in code regions. For Neoverse it's more important to have as less code regions with active nmethods as possible. >> >> We are aware of cases when CodeCache usage is between 512M - 1G. The mentioned changes won't help in those cases. >> If I remember, no public benchmarks have demonstrated improvements from non-code moved away from nmethod. >> >> Since the removal of the Sweeper, GC is in charge of cleaning CodeCache. We've seen cases when GC was triggered often because of allocation pressure on CodeCache. >> For such cases, a recommended workaround is to increase the size of CodeCache from default 240M up to 512M. In such cases actively used nmethods will more likely be sparse. > > > Hmm, may be we should restore counters decay for this case to prevent > warm methods from compiling and polluting CodeCache and keep it small. > > >> >>> It would be nice if you redo your production experiments after that. >> >> Due to the complexity of customer's application we cannot run it on OpenJDKTip. It has thousand dependencies. We will need to move them on OpenJDKTip. >> I think it would be difficult to backport the mentioned changes to 21 > > > Understood. > > >> >>> I understand that we can still have sparsity due to "warm" nmethods and >>> C1 compiled code mixed with "hot" C2 nmethods. >> >> Customers having issues with big CodeCache on Graviton usually turn off tiered compilation to reduce far jumps/calls. 
BTW, this is another argument for identifying active nmethods and grouping them together: it should reduce/eliminate far jumps/calls. >> With small CodeCache mix of C1 and C2 nmethods is not an issue. >> >>> Can we simply use a separate CodeCache's segment for all >>> C2 "hot" (we can specify frequency flag to determine what "hot" means) >>> methods regardless when they are compiled. >> >> I did not get the idea. We already have the non-profiled segment where C2 code is put. Do you mean that at the compilation time some methods are put in the regular non-profile segment and some in the specific non-profile segment? > > > Yes, I meant separate segments for hot and warm methods, both are c2 > compiled code. > > > It would still mix all 3 cases you pointed because compilation policy > based mostly on what happened during startup. So it may be not good idea. > > >> What we've seen that methods profiles keep changing. >> There are the following cases: >> 1. C2 methods used most of the time: their profile can stay the same or can get hotter. >> 2. C2 methods used periodically: actively used, not used, actively used and so on >> 3. C2 methods used actively during some time and never used after >> >> Currently GC identifies cases #3 and some cases #2, aka cold code. The percentage of methods case #1 is ~10% - 20%. >> If we have 100M of C2 code, only 10M - 20M will be actively used. If we get unlucky, those 10M-20M could be spread across CodeCache and cause CPU stalls. >> How can we identify those 10%-20% of methods at compilation time? > > > I agree that it will be hard to determine that during compilation. > We need some statistic after we compiled to find such methods. 
> > > Sometime ago we had concept of Code Aging (removed after Sweeper was > removed): > https://github.com/vnkozlov/jdk17u-dev/commit/54db2c2d612c573f91f69b7b387b43a8e1c9d563 > > > It added counter on nmethod entry to keep track if it is alive. We can > use something similar to track how frequently nmethod is used. > > > Erik Osterlund also had prototype in Leyden for call stack profiling by > VM itself to find most used hot methods during training run. > > > Thanks, > Vladimir. > > >> >> BTW, I think the separate hot code heap might simplify flushing cold code. Everything not in the hot code heap can automatically be assumed cold. >> >> Thanks, >> Evgeny >> >> [1]: https://github.com/openjdk/jdk/pull/23573 >> [2]: https://github.com/openjdk/jdk/pull/23831#issuecomment-2705085399 >> >> On 06/03/2025, 22:41, "Vladimir Kozlov" wrote: >> >> >> Hi Evgeny, >> >> >> My concern is that it will complicate VM existing code for not >> significant benefits in real production environment. >> >> >> What improvements your experiments in real production runs shows? And >> which JDK version you used for that? >> >> >> As you know most of nmethod's metadata is moved from CodeCache. And >> Boris Ulasevich will move the final part (relocation info) soon. After >> that the code will be a lot more compact in CodeCache. Code sparsity >> should be less issue then. 
>> >> >> It would be nice if you redo your production experiments after that. >> >> >> I understand that we can still have sparsity due to "warm" nmethods and >> C1 compiled code mixed with "hot" C2 nmethods. I think compilation >> policy has heuristic to detect "warm" method (time intervals between >> invocations). Can we simply use a separate CodeCache's segment for all >> C2 "hot" (we can specify frequency flag to determine what "hot" means) >> methods regardless when they are compiled. Then you don't need to create >> list or do anything special for them. Most likely we will waste more >> space in CodeCache but it could be conditional under flag which you >> already proposed in separate segment RFE. >> >> >> Thanks, >> Vladimir K >> >> >> On 3/5/25 10:41 AM, Astigeevich, Evgeny wrote: >>> Hi Vladimir, >>> >>> This is JDK-8326205: Implement grouping hot nmethods in CodeCache. >>> As I managed to synthesize a benchmark (https://github.com/openjdk/jdk/pull/23831) to demonstrate performance impact of sparse code, I'd like >>> to discuss a possible solution of the sparse code. >>> >>> High level, a solution is: >>> >>> * Detect hot code. >>> * Group hot code. >>> * Maintain grouped code. >>> >>> Downstream we tried two approaches: >>> >>> * *Static lists of methods (compile command):* Identify frequently >>> used (hot) methods using test runs and provide static method lists >>> to JVM in production. When JVM compiles a Java method and the method >>> is on the list, JVM puts the code into a designated code heap >>> (HotCodeHeap). 
>>> * *Dynamic lists of methods (compiler directives):* Profile an >>> application in production and dynamically relocate identified hot >>> methods to HotCodeHeap. Relocation was implemented with recompilation. >>> >>> The main advantage of static lists is zero profiling overhead in >>> production. We do all profiling and analysis in test runs. Its problems are: >>> >>> * *Training Run Accuracy*: We need training runs to have execution >>> paths closely mimicking production environments. Otherwise we put >>> wrong methods into HotCodeHeap. >>> * *Method List Maintenance:* We need to rerun training to regenerate >>> lists when application code changes. Training runs are expensive and >>> time-consuming. They require long runs to guarantee we see all major >>> execution paths. Updating lists in production can be as complex as >>> application deployment. >>> * *Method Placement Limitations:* Methods marked for HotCodeHeap are >>> permanently placed into HotCodeHeap. No mechanism to remove methods >>> that become less frequently used. >>> >>> We addressed these problems with dynamic lists of methods. We >>> implemented a Java agent that runs within the same JVM to dynamically >>> detect and manage hot Java methods without prior method identification. >>> The agent detects hot methods using JFR. The agent manages hot Java >>> methods in HotCodeHeap with compiler directives. A new compiler >>> directive marks methods with dynamic states ("hot" or "cold"). Methods >>> marked by the "hot" state are recompiled and placed in HotCodeHeap. >>> Methods marked by the "cold" state are eventually removed from HotCodeHeap. >>> >>> Problems of this approach are: >>> >>> * It requires specific, complex modifications to compiler directive >>> support: recompilation of Java methods affected by compiler >>> directives changes. This functionality is unique to Java agent >>> implementation and has limited potential for broader use. 
>>> * The agent cannot guarantee Java methods are moved to/removed from >>> the HotCodeHeap because updates of compiler directives can fail. >>> * The agent knows nothing about compiled code, e.g. whether it's C1 or >>> C2 compiled, code size, profile. This data can be useful for deciding >>> to move or not to move to HotCodeHeap. >>> * Recompilations, especially C2, are expensive. Having many of them >>> can cause performance issues. Also recompiled code might differ from >>> the code we have detected as "hot". >>> >>> Running these two approaches in production we learned: >>> >>> * We detect 95% of actively used methods within the first 30 minutes >>> of an application run. This is with JFR profiling configured: 90 >>> seconds session duration, sampling every 11 ms, 8 minutes between >>> profiling sessions. We can find actively used methods faster if we >>> reduce the pause between profiling sessions and the sampling period. >>> However it will increase the profiling overhead and affect >>> application performance. With the current configuration, the >>> profiling overhead is between 1% - 2%. >>> * A set of actively used methods gets into the steady state (no new >>> methods added to, no methods removed from) within the first 60 minutes. >>> * Static lists, when created from runs close to production, have 80% - >>> 90% methods always in use. This does not change over time. >>> * Predicting the size of HotCodeHeap is difficult, especially with >>> dynamic lists. >>> >>> We want to have grouping of hot method functionality as a part of the HotSpot >>> JVM. We will group only C2 compiled methods. We can group JVMCI compiled >>> methods, e.g. Graal, if needed. We need profiling precise enough to >>> detect major Java methods. Low overhead is more important than precision. 
>>> >>> We think we can have a solution which does not require a lot of code: >>> >>> * Detect hot code: we can base an implementation on the Sweeper: >>> https://github.com/openjdk/jdk17u/blob/master/src/hotspot/share/runtime/sweeper.hpp. We will use the handshake mechanism, which the Sweeper >>> used, to detect nmethods on the top of thread stacks. >>> * Group hot code: we have a draft PR https://github.com/openjdk/jdk/pull/23573. It implements relocation of nmethods within CodeCache. >>> * Maintain grouped code: we will add an additional code heap where hot >>> nmethods will be relocated to. >>> >>> What do you think about this approach? Are there other possible solutions? >>> >>> Thanks, >>> >>> Evgeny A. >>> >>> Amazon Development Centre (London) Ltd. Registered in England and Wales >>> with registration number 04543232 with its registered office at 1 >>> Principal Place, Worship Street, London EC2A 2FA, United Kingdom.
From fyang at openjdk.org Thu Mar 13 01:44:58 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Mar 2025 01:44:58 GMT Subject: RFR: 8318220: RISC-V: C2 ReverseI [v3] In-Reply-To: <6lGIjtZULnE1TEzlsLaNNO9CKe1PRbgPO2KAvUbvimU=.7e813b51-5a8f-4c84-af8f-ee9342844491@github.com> References: <6lGIjtZULnE1TEzlsLaNNO9CKe1PRbgPO2KAvUbvimU=.7e813b51-5a8f-4c84-af8f-ee9342844491@github.com> Message-ID: On Wed, 12 Mar 2025 13:19:44 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch to add ReverseI and ReverseIL intrinsic on riscv? >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > not enable Zbkb automatically Looks fine to me. Thanks for the update. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23963#pullrequestreview-2680233193 From rehn at openjdk.org Thu Mar 13 07:55:55 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 13 Mar 2025 07:55:55 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar [v7] In-Reply-To: References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Wed, 12 Mar 2025 17:29:40 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv. 
>>
>> ## Performance
>>
>> data
>>
>> Benchmark | (vectorDim) | Mode | Cnt | Score - master | Error | Score - patch | Error | Units | Improvement (master/patch)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1
>> Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011
>> Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001
>> Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001
>> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909
>> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91
>> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129
>> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 91565.81...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> refine switch

Thanks! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23844#pullrequestreview-2680853803
From mli at openjdk.org Thu Mar 13 08:20:05 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 13 Mar 2025 08:20:05 GMT Subject: Integrated: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar In-Reply-To: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> References: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Message-ID: On Fri, 28 Feb 2025 14:34:47 GMT, Hamlin Li wrote:
> Hi,
> Can you help to review this patch?
> It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv.
>
> ## Performance
>
> data
>
> Benchmark | (vectorDim) | Mode | Cnt | Score - master | Error | Score - patch | Error | Units | Improvement (master/patch)
> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1
> Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011
> Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001
> Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001
> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909
> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91
> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129
> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 91565.818 | 250.761 | 5191.12 | 64.598 | ns/op | 17.639
> Fl...

This pull request has now been integrated. Changeset: a33b1f7f Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/a33b1f7f640e0a9e76d2a686734e472a87d809bf Stats: 447 lines in 13 files changed: 396 ins; 2 del; 49 mod 8345298: RISC-V: Add riscv backend for Float16 operations - scalar Reviewed-by: rehn, fyang ------------- PR: https://git.openjdk.org/jdk/pull/23844
From mli at openjdk.org Thu Mar 13 08:31:30 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 13 Mar 2025 08:31:30 GMT Subject: RFR: 8318220: RISC-V: C2 ReverseI [v4] In-Reply-To: References: Message-ID: <8GCHwRel0a2PCahmHnwpQCAYiLiBj6dseJgyyUSn1jY=.bf93b8d1-fbe9-409b-b2e8-47a07bf9d672@github.com> > Hi, > Can you help to review this patch to add ReverseI and ReverseIL intrinsic on riscv? > > Thanks! Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 15 commits: - Merge branch 'master' into reverse-I-L - not enable Zbkb automatically - refine tests - use srai instead of srli - clean test - clean test - clean test - clean - add tests - clean - ... and 5 more: https://git.openjdk.org/jdk/compare/a33b1f7f...61bef248 ------------- Changes: https://git.openjdk.org/jdk/pull/23963/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23963&range=03 Stats: 323 lines in 9 files changed: 323 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23963.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23963/head:pull/23963 PR: https://git.openjdk.org/jdk/pull/23963 From fyang at openjdk.org Thu Mar 13 08:31:30 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Mar 2025 08:31:30 GMT Subject: RFR: 8318220: RISC-V: C2 ReverseI [v4] In-Reply-To: <8GCHwRel0a2PCahmHnwpQCAYiLiBj6dseJgyyUSn1jY=.bf93b8d1-fbe9-409b-b2e8-47a07bf9d672@github.com> References: <8GCHwRel0a2PCahmHnwpQCAYiLiBj6dseJgyyUSn1jY=.bf93b8d1-fbe9-409b-b2e8-47a07bf9d672@github.com> Message-ID: On Thu, 13 Mar 2025 08:28:29 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch to add ReverseI and ReverseIL intrinsic on riscv? >> >> Thanks! > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into reverse-I-L > - not enable Zbkb automatically > - refine tests > - use srai instead of srli > - clean test > - clean test > - clean test > - clean > - add tests > - clean > - ... and 5 more: https://git.openjdk.org/jdk/compare/a33b1f7f...61bef248 Still good to me. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23963#pullrequestreview-2680942262 From mli at openjdk.org Thu Mar 13 08:31:30 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 13 Mar 2025 08:31:30 GMT Subject: RFR: 8318220: RISC-V: C2 ReverseI [v3] In-Reply-To: References: <6lGIjtZULnE1TEzlsLaNNO9CKe1PRbgPO2KAvUbvimU=.7e813b51-5a8f-4c84-af8f-ee9342844491@github.com> Message-ID: On Thu, 13 Mar 2025 01:41:57 GMT, Fei Yang wrote: > Looks fine to me. Thanks for the update. Thank you! @RealFYang Just merged some conflict, can you have another look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23963#issuecomment-2720346628 PR Comment: https://git.openjdk.org/jdk/pull/23963#issuecomment-2720355606 From cnorrbin at openjdk.org Thu Mar 13 08:58:53 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Thu, 13 Mar 2025 08:58:53 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v10] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 15:55:00 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > changed assert in align_up Thank you everyone for the reviews! 
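For readers following the `align_up` discussion, the overflow concern can be sketched as below. This is a minimal illustration only, assuming power-of-two alignments and `size_t` arguments; the names `is_pow2`, `align_up_overflows`, `checked_align_up`, and `try_align_up` are hypothetical and are not the functions in HotSpot's `align.hpp`. It shows the invariant mentioned in the PR (aligned result >= original) as an assert, plus a non-asserting variant in the spirit of `align_up_or_null`.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Illustrative names only; these are NOT the functions in HotSpot's align.hpp.

// True when `alignment` is a non-zero power of two.
constexpr bool is_pow2(std::size_t alignment) {
  return alignment != 0 && (alignment & (alignment - 1)) == 0;
}

// True when rounding `value` up to `alignment` does not fit in size_t.
constexpr bool align_up_overflows(std::size_t value, std::size_t alignment) {
  return value > std::numeric_limits<std::size_t>::max() - (alignment - 1);
}

// Rounds `value` up to a multiple of `alignment` and asserts result >= value.
inline std::size_t checked_align_up(std::size_t value, std::size_t alignment) {
  assert(is_pow2(alignment));
  const std::size_t mask = alignment - 1;
  const std::size_t result = (value + mask) & ~mask;  // wraps if value + mask overflows
  assert(result >= value);                            // fails in debug builds on wrap-around
  return result;
}

// Non-asserting variant: returns false instead of producing a wrapped result.
inline bool try_align_up(std::size_t value, std::size_t alignment, std::size_t* out) {
  if (!is_pow2(alignment) || align_up_overflows(value, alignment)) {
    return false;
  }
  *out = checked_align_up(value, alignment);
  return true;
}
```

A caller that cannot rule out values near the top of the address range would use `try_align_up` and handle the `false` case explicitly, while callers that rely on the invariant keep the asserting form.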
------------- PR Comment: https://git.openjdk.org/jdk/pull/23711#issuecomment-2720472805 From duke at openjdk.org Thu Mar 13 08:58:54 2025 From: duke at openjdk.org (duke) Date: Thu, 13 Mar 2025 08:58:54 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v10] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 15:55:00 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > changed assert in align_up @caspernorrbin Your change (at version 5dc102eab58692dd9d03b2d122be54235cd57d74) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23711#issuecomment-2720475358 From stuefe at openjdk.org Thu Mar 13 09:11:06 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 13 Mar 2025 09:11:06 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v2] In-Reply-To: <4lV6ssHh71lUtGEHsxkKoPDz7GrcZrmUvKXGsfjzbE4=.e3072dfd-f189-4d55-9c7d-18991b4744d6@github.com> References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> <4lV6ssHh71lUtGEHsxkKoPDz7GrcZrmUvKXGsfjzbE4=.e3072dfd-f189-4d55-9c7d-18991b4744d6@github.com> Message-ID: On Wed, 12 Mar 2025 17:16:09 GMT, Ioi Lam wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> skip test if we have no COH archive > > src/hotspot/share/cds/metaspaceShared.cpp line 1431: > >> 1429: #ifdef _LP64 >> 1430: if (Metaspace::using_class_space()) { >> 1431: assert(prot_zone_size > 0 && > > This code assumes that `prot_zone_size > 0`, but we have other code that checks `if (prot_zone_size > 0)`. Should the "if" be changed to asserts? (prot_zone_size > 0) holds true if we are using class space. The other occurrences are in paths that are also hit for non-class space. But here, we know we are using class space, since we are in a `if (Metaspace::using_class_space())` condition. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23912#discussion_r1993075045 From cnorrbin at openjdk.org Thu Mar 13 09:38:59 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Thu, 13 Mar 2025 09:38:59 GMT Subject: Integrated: 8346916: [REDO] align_up has potential overflow In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 10:48:26 GMT, Casper Norrbin wrote: > Hi everyone, > > The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. 
To address this, I've added an assertion to verify this condition. > > The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. This pull request has now been integrated. Changeset: 86860cac Author: Casper Norrbin Committer: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/86860cac044e6f464732753670b14a80c1fef438 Stats: 132 lines in 6 files changed: 95 ins; 29 del; 8 mod 8346916: [REDO] align_up has potential overflow Reviewed-by: ayang, kbarrett, dlong ------------- PR: https://git.openjdk.org/jdk/pull/23711 From shade at openjdk.org Thu Mar 13 09:44:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Mar 2025 09:44:04 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v5] In-Reply-To: References: Message-ID: <6HSZ5L4LjfEDdiQeKftFHlEMv3iRX68H2ZouhnQyV2c=.40e3a8f1-acbf-443f-a73f-06a55b0a8929@github.com> > We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up. > > Providing this event would also replace one of of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). > > This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `jdk_jfr` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: - Merge branch 'master' into JDK-8351187-jfr-monitor-notify - Only emit event when notification happened - Merge branch 'master' into JDK-8351187-jfr-monitor-notify - Rewrite test to RecordingStream - Drop threshold to 0ms - Merge branch 'master' into JDK-8351187-jfr-monitor-notify - Disable by default - Fix ------------- Changes: https://git.openjdk.org/jdk/pull/23901/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23901&range=04 Stats: 168 lines in 7 files changed: 162 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23901.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23901/head:pull/23901 PR: https://git.openjdk.org/jdk/pull/23901 From liach at openjdk.org Thu Mar 13 11:20:10 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 13 Mar 2025 11:20:10 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: Message-ID: <5Em3jYkKNERI9WrgRe9zfbEVLK8Gca46V4nL98N2Xmw=.54f09707-856e-4b26-9a03-0ebd17a40b6b@github.com> On Mon, 10 Mar 2025 18:11:23 GMT, Per Minborg wrote: > Implement JEP 502. > > The PR passes tier1-tier3 tests. FYI we don't usually drop the benchmark scores in the PR description; we usually leave them in comments to indicate which revision the bench results apply to. src/hotspot/share/ci/ciField.cpp line 255: > 253: static bool trust_final_non_static_fields_of_type(Symbol* signature) { > 254: return signature == vmSymbols::java_lang_StableValue_signature() || > 255: signature == vmSymbols::java_lang_StableValue_array_signature(); This is dubious - a user can declare a `final StableValue[] array;` and modify the array elements, which is totally compliant to the language and the VM rules. Don't know what this serves. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2711648215 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1987920134 From pminborg at openjdk.org Thu Mar 13 11:20:10 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Mar 2025 11:20:10 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) Message-ID: Implement JEP 502. The PR passes tier1-tier3 tests. ------------- Commit messages: - Use acquire semantics for reading rather than volatile semantics - Add missing null check - Simplify handling of sentinel, wrap, and unwrap - Fix JavaDoc issues - Fix members in StableEnumFunction - Address some comments in the PR - Merge branch 'master' into implement-jep502 - Revert change - Fix copyright issues - Update JEP number - ... and 231 more: https://git.openjdk.org/jdk/compare/4cf63160...09ca44e6 Changes: https://git.openjdk.org/jdk/pull/23972/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351565 Stats: 3980 lines in 30 files changed: 3949 ins; 18 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From pminborg at openjdk.org Thu Mar 13 11:20:11 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Mar 2025 11:20:11 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 18:11:23 GMT, Per Minborg wrote: > Implement JEP 502. > > The PR passes tier1-tier3 tests. Here are the current benchmarks for Mac M1:

Benchmark                                     Mode  Cnt  Score   Error  Units
StableFunctionBenchmark.function              avgt   10  4.071 ± 0.252  ns/op
StableFunctionBenchmark.stable                avgt   10  4.107 ± 0.065  ns/op
StableFunctionBenchmark.staticIntFunction     avgt   10  2.688 ± 1.647  ns/op
StableFunctionBenchmark.staticStable          avgt   10  1.708 ± 0.278  ns/op
StableIntFunctionBenchmark.intFunction        avgt   10  1.528 ± 0.040  ns/op
StableIntFunctionBenchmark.stable             avgt   10  1.515 ± 0.019  ns/op
StableIntFunctionBenchmark.staticIntFunction  avgt   10  1.047 ± 0.023  ns/op
StableIntFunctionBenchmark.staticStable       avgt   10  1.056 ± 0.045  ns/op
StableSupplierBenchmark.stable                avgt   10  1.411 ± 0.127  ns/op
StableSupplierBenchmark.supplier              avgt   10  1.676 ± 0.055  ns/op
StableValueBenchmark.atomic                   avgt   10  1.404 ± 0.061  ns/op
StableValueBenchmark.dcl                      avgt   10  1.398 ± 0.037  ns/op
StableValueBenchmark.refSupplier              avgt   10  0.498 ± 0.077  ns/op
StableValueBenchmark.stable                   avgt   10  1.406 ± 0.053  ns/op
StableValueBenchmark.stableNull               avgt   10  1.279 ± 0.062  ns/op
StableValueBenchmark.staticAtomic             avgt   10  1.228 ± 0.060  ns/op
StableValueBenchmark.staticDcl                avgt   10  0.342 ± 0.005  ns/op
StableValueBenchmark.staticHolder             avgt   10  0.342 ± 0.006  ns/op
StableValueBenchmark.staticStable             avgt   10  0.348 ± 0.015  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2713007259 From pminborg at openjdk.org Thu Mar 13 11:20:11 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Mar 2025 11:20:11 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: <5Em3jYkKNERI9WrgRe9zfbEVLK8Gca46V4nL98N2Xmw=.54f09707-856e-4b26-9a03-0ebd17a40b6b@github.com> References: <5Em3jYkKNERI9WrgRe9zfbEVLK8Gca46V4nL98N2Xmw=.54f09707-856e-4b26-9a03-0ebd17a40b6b@github.com> Message-ID: On Mon, 10 Mar 2025 19:45:53 GMT, Chen Liang wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. 
> > src/hotspot/share/ci/ciField.cpp line 255: > >> 253: static bool trust_final_non_static_fields_of_type(Symbol* signature) { >> 254: return signature == vmSymbols::java_lang_StableValue_signature() || >> 255: signature == vmSymbols::java_lang_StableValue_array_signature(); > > This is dubious - a user can declare a `final StableValue[] array;` and modify the array elements, which is totally compliant to the language and the VM rules. Don't know what this serves. Fair comment. We should at least remove the array signature. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988609781 From duke at openjdk.org Thu Mar 13 11:20:13 2025 From: duke at openjdk.org (Luca Kellermann) Date: Thu, 13 Mar 2025 11:20:13 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 18:11:23 GMT, Per Minborg wrote: > Implement JEP 502. > > The PR passes tier1-tier3 tests. src/java.base/share/classes/java/lang/StableValue.java line 79: > 77: * logger.trySet(Logger.create(Component.class)); > 78: * } > 79: * return logger.orThrow(); Suggestion: * return logger.orElseThrow(); src/java.base/share/classes/java/lang/StableValue.java line 127: > 125: * evaluated only once, even when {@code logger.orElseSet()} is invoked concurrently. > 126: * This property is crucial as evaluation of the supplier may have side effects, > 127: * e.g., the call above to {@code Logger.getLogger()} may result in storage resources Suggestion: * e.g., the call above to {@code Logger.create()} may result in storage resources src/java.base/share/classes/java/lang/StableValue.java line 344: > 342: * {@linkplain java.lang.ref##reachability reachable} stable values will hold their set > 343: * content perpetually. > 344: *

Should the original functions / mappers (for stable functions and collections) also stay reachable? Kotlin's [`Lazy`](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin/-lazy/) [nulls out](https://github.com/JetBrains/kotlin/blob/c6f337283d59fcede75954eebaa589ad1b479aea/libraries/stdlib/jvm/src/kotlin/util/LazyJVM.kt#L70-L89) the initializer function when it's no longer needed. src/java.base/share/classes/java/lang/StableValue.java line 423: > 421: * {@snippet lang=java: > 422: * if (stable.isSet()) { > 423: * return stable.get(); Suggestion: * return stable.orElseThrow(); src/java.base/share/classes/java/lang/StableValue.java line 547: > 545: IntFunction original) { > 546: if (size < 0) { > 547: throw new IllegalArgumentException(); This exception isn't documented; the same goes for `StableValue.list()`. src/java.base/share/classes/jdk/internal/lang/stable/StableEnumFunction.java line 112: > 110: final Class enumType = (Class)inputs.iterator().next().getClass(); > 111: return (Function) new StableEnumFunction(enumType, min, StableValueFactories.array(size), (Function) original); > 112: } If `inputs` contains the enumeration constants with ordinals 0 and 2, wouldn't this code wrongly cause the enumeration constant with ordinal 1 to be an allowed input? src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 141: > 139: ? "(this StableValue)" > 140: : "StableValue" + renderWrapped(t); > 141: } Are deeper cycles of concern? 
I was thinking of this: var a = StableValue.of(); var b = StableValue.of(); a.trySet(b); b.trySet(a); System.out.println(a); This would solve deeper cycles for `StableValueImpl`: @Override public String toString() { final StringBuilder sb = new StringBuilder("StableValue"); int depth = 0; Object t = value; while (t instanceof StableValueImpl s) { if (s == this) { t = "(this StableValue)"; break; } sb.append("[StableValue"); depth++; t = s.value; } sb.append(renderWrapped(t)); while (depth-- > 0) sb.append(']'); return sb.toString(); } This might also apply to stable functions and collections, I haven't thought it through for them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1989143787 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1989165612 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1989265489 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1989377859 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1989504117 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988064795 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988092230 From liach at openjdk.org Thu Mar 13 11:20:13 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 13 Mar 2025 11:20:13 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: Message-ID: <4B5XTEOv_30znhklVd9ymC0F4bmdn1bpYR-fYwtpvtY=.f0a2f7d9-0345-4893-a3f1-8559496b3c56@github.com> On Tue, 11 Mar 2025 13:22:20 GMT, Luca Kellermann wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > src/java.base/share/classes/java/lang/StableValue.java line 344: > >> 342: * {@linkplain java.lang.ref##reachability reachable} stable values will hold their set >> 343: * content perpetually. >> 344: *

> > Should the original functions / mappers (for stable functions and collections) also stay reachable? Kotlin's [`Lazy`](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin/-lazy/) [nulls out](https://github.com/JetBrains/kotlin/blob/c6f337283d59fcede75954eebaa589ad1b479aea/libraries/stdlib/jvm/src/kotlin/util/LazyJVM.kt#L70-L89) the initializer function when it's no longer needed. The nulling out is only safe if the write to the value is visible when a nulled-out function is visible. I think SV can ensure this, but an implementation can easily go wrong trying to do this. (Also `orElseSet` does not NPE if the incoming supplier is null but the value is bound) > src/java.base/share/classes/jdk/internal/lang/stable/StableEnumFunction.java line 112: > >> 110: final Class enumType = (Class)inputs.iterator().next().getClass(); >> 111: return (Function) new StableEnumFunction(enumType, min, StableValueFactories.array(size), (Function) original); >> 112: } > > If `inputs` contains the enumeration constants with ordinals 0 and 2, wouldn't this code wrongly cause the enumeration constant with ordinal 1 to be an allowed input? Indeed, a bit set predicate can be used to check input validity if it is necessary - I think for enums, using a `StableFunction.ofEnum` dedicated API might be better just because `StableValue` can access `Class.getEnumConstantsShared` easily. > src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 141: > >> 139: ? "(this StableValue)" >> 140: : "StableValue" + renderWrapped(t); >> 141: } > > Are deeper cycles of concern? 
I was thinking of this: > > var a = StableValue.of(); > var b = StableValue.of(); > a.trySet(b); > b.trySet(a); > System.out.println(a); > > > This would solve deeper cycles for `StableValueImpl`: > > @Override > public String toString() { > final StringBuilder sb = new StringBuilder("StableValue"); > int depth = 0; > Object t = value; > while (t instanceof StableValueImpl s) { > if (s == this) { > t = "(this StableValue)"; > break; > } > sb.append("[StableValue"); > depth++; > t = s.value; > } > sb.append(renderWrapped(t)); > while (depth-- > 0) sb.append(']'); > return sb.toString(); > } > > This might also apply to stable functions and collections, I haven't thought it through for them. I think the default Object.toString impl is better here - the type `StableValue` shouldn't really be exposed in a user API endpoint and is just a utility for the users. No need to bikeshed on this mostly useless functionality. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1989965200 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988159371 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988154881 From pminborg at openjdk.org Thu Mar 13 11:20:13 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Mar 2025 11:20:13 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: <4B5XTEOv_30znhklVd9ymC0F4bmdn1bpYR-fYwtpvtY=.f0a2f7d9-0345-4893-a3f1-8559496b3c56@github.com> References: <4B5XTEOv_30znhklVd9ymC0F4bmdn1bpYR-fYwtpvtY=.f0a2f7d9-0345-4893-a3f1-8559496b3c56@github.com> Message-ID: On Tue, 11 Mar 2025 19:04:39 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/StableValue.java line 344: >> >>> 342: * {@linkplain java.lang.ref##reachability reachable} stable values will hold their set >>> 343: * content perpetually. >>> 344: *

>> >> Should the original functions / mappers (for stable functions and collections) also stay reachable? Kotlin's [`Lazy`](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin/-lazy/) [nulls out](https://github.com/JetBrains/kotlin/blob/c6f337283d59fcede75954eebaa589ad1b479aea/libraries/stdlib/jvm/src/kotlin/util/LazyJVM.kt#L70-L89) the initializer function when it's no longer needed. > > The nulling out is only safe if the write to the value is visible when a nulled-out function is visible. I think SV can ensure this, but an implementation can easily go wrong trying to do this. (Also `orElseSet` does not NPE if the incoming supplier is null but the value is bound) This is something we experimented with a bit in the past. It isn't easy to do in the general case. There are pros (the function and its resources that can be collected) and cons (e.g., mutability, visibility, complexity, etc.) with this. >> src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 141: >> >>> 139: ? "(this StableValue)" >>> 140: : "StableValue" + renderWrapped(t); >>> 141: } >> >> Are deeper cycles of concern? I was thinking of this: >> >> var a = StableValue.of(); >> var b = StableValue.of(); >> a.trySet(b); >> b.trySet(a); >> System.out.println(a); >> >> >> This would solve deeper cycles for `StableValueImpl`: >> >> @Override >> public String toString() { >> final StringBuilder sb = new StringBuilder("StableValue"); >> int depth = 0; >> Object t = value; >> while (t instanceof StableValueImpl s) { >> if (s == this) { >> t = "(this StableValue)"; >> break; >> } >> sb.append("[StableValue"); >> depth++; >> t = s.value; >> } >> sb.append(renderWrapped(t)); >> while (depth-- > 0) sb.append(']'); >> return sb.toString(); >> } >> >> This might also apply to stable functions and collections, I haven't thought it through for them. 
> > I think the default Object.toString impl is better here - the type `StableValue` shouldn't really be exposed in a user API endpoint and is just a utility for the users. No need to bikeshed on this mostly useless functionality. The `toString()` function for stable value is inspired by `Optional` and some of the collections. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1990919074 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988637468 From duke at openjdk.org Thu Mar 13 11:20:13 2025 From: duke at openjdk.org (Luca Kellermann) Date: Thu, 13 Mar 2025 11:20:13 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: <4B5XTEOv_30znhklVd9ymC0F4bmdn1bpYR-fYwtpvtY=.f0a2f7d9-0345-4893-a3f1-8559496b3c56@github.com> Message-ID: On Wed, 12 Mar 2025 08:16:50 GMT, Per Minborg wrote: >> The nulling out is only safe if the write to the value is visible when a nulled-out function is visible. I think SV can ensure this, but an implementation can easily go wrong trying to do this. (Also `orElseSet` does not NPE if the incoming supplier is null but the value is bound) > > This is something we experimented with a bit in the past. It isn't easy to do in the general case. There are pros (the function and its resources that can be collected) and cons (e.g., mutability, visibility, complexity, etc.) with this. > (Also `orElseSet` does not NPE if the incoming supplier is null but the value is bound) You mean [this](https://github.com/openjdk/jdk/blob/a05717d8da8f804003cfb8d6d25b8a5b6cecd473/src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java#L117-L120) is missing `Objects.requireNonNull(supplier);`, right? >> I think the default Object.toString impl is better here - the type `StableValue` shouldn't really be exposed in a user API endpoint and is just a utility for the users. No need to bikeshed on this mostly useless functionality. 
> > The `toString()` function for stable value is inspired by `Optional` and some of the collections. `Optional` doesn't have the issue of containing itself. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1991649915 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1989329168 From pminborg at openjdk.org Thu Mar 13 11:20:13 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Mar 2025 11:20:13 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: <4B5XTEOv_30znhklVd9ymC0F4bmdn1bpYR-fYwtpvtY=.f0a2f7d9-0345-4893-a3f1-8559496b3c56@github.com> Message-ID: <9eaXze3gUOvuO_dJhn31evFsyUO6zw9z9Dneexn8WKw=.7bd9d254-d8ea-44a5-a57d-a98173ba536f@github.com> On Wed, 12 Mar 2025 14:38:28 GMT, Luca Kellermann wrote: >> This is something we experimented with a bit in the past. It isn't easy to do in the general case. There are pros (the function and its resources that can be collected) and cons (e.g., mutability, visibility, complexity, etc.) with this. > >> (Also `orElseSet` does not NPE if the incoming supplier is null but the value is bound) > > You mean [this](https://github.com/openjdk/jdk/blob/a05717d8da8f804003cfb8d6d25b8a5b6cecd473/src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java#L117-L120) is missing `Objects.requireNonNull(supplier);`, right? Fixed `Objects.requireNonNull()` now. 
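The argument check discussed in this exchange can be illustrated with a minimal sketch. `LazyBox` is a hypothetical class invented for this example and is not the `StableValueImpl` source; it also uses `null` to mean "unset", which the real implementation avoids by wrapping nulls in a sentinel. The only point shown is that `Objects.requireNonNull(supplier)` runs before the fast path, so a null supplier fails even when the value is already bound.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical illustration only; this is not the StableValueImpl source.
final class LazyBox<T> {
    private volatile T value; // null means "unset" in this simplified sketch

    T orElseSet(Supplier<? extends T> supplier) {
        // Validate the argument first so the outcome does not depend on
        // whether the value happens to be set already.
        Objects.requireNonNull(supplier);
        T v = value;
        if (v != null) {
            return v; // fast path: value already bound
        }
        synchronized (this) {
            v = value;
            if (v == null) {
                // This sketch rejects null results instead of wrapping them.
                v = Objects.requireNonNull(supplier.get(), "null values are not supported here");
                value = v;
            }
            return v;
        }
    }
}
```

With this ordering, `box.orElseSet(null)` throws `NullPointerException` whether or not `box` already holds a value.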
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993279982 From pminborg at openjdk.org Thu Mar 13 11:20:13 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Mar 2025 11:20:13 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: <3WF5febSMDQinBZGiaVydeRNCJn-AzZ3oxR1UcLCuyc=.0a921dfe-b807-491d-9b50-008004b30b72@github.com> References: <3WF5febSMDQinBZGiaVydeRNCJn-AzZ3oxR1UcLCuyc=.0a921dfe-b807-491d-9b50-008004b30b72@github.com> Message-ID: On Tue, 11 Mar 2025 01:20:16 GMT, Johannes Graham wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > src/java.base/share/classes/java/util/ImmutableCollections.java line 772: > >> 770: >> 771: @jdk.internal.ValueBased >> 772: static final class StableList extends AbstractImmutableList { > > Is there significant reuse gained by putting StableList in ImmutableCollection? The back-and-forth between here and SV through SharedSecrets is a little awkward. This allows reuse of `AbstractImmutableList` with list iterators, sub lists and more. > src/java.base/share/classes/java/util/ImmutableCollections.java line 1462: > >> 1460: >> 1461: static final class StableMap >> 1462: extends AbstractImmutableMap { > > Same question about whether StableMap needs to go here. Though there's more stuff going on for maps than lists here. Same argument as for `StableList`. This allows reuse of several classes including `AbstractImmutableMap`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988605709 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988607732 From duke at openjdk.org Thu Mar 13 11:20:13 2025 From: duke at openjdk.org (Johannes Graham) Date: Thu, 13 Mar 2025 11:20:13 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: Message-ID: <3WF5febSMDQinBZGiaVydeRNCJn-AzZ3oxR1UcLCuyc=.0a921dfe-b807-491d-9b50-008004b30b72@github.com> On Mon, 10 Mar 2025 18:11:23 GMT, Per Minborg wrote: > Implement JEP 502. > > The PR passes tier1-tier3 tests. src/java.base/share/classes/java/util/ImmutableCollections.java line 772: > 770: > 771: @jdk.internal.ValueBased > 772: static final class StableList extends AbstractImmutableList { Is there significant reuse gained by putting StableList in ImmutableCollection? The back-and-forth between here and SV through SharedSecrets is a little awkward. src/java.base/share/classes/java/util/ImmutableCollections.java line 1462: > 1460: > 1461: static final class StableMap > 1462: extends AbstractImmutableMap { Same question about whether StableMap needs to go here. Though there's more stuff going on for maps than lists here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988220230 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988222063 From duke at openjdk.org Thu Mar 13 11:20:13 2025 From: duke at openjdk.org (Johannes Graham) Date: Thu, 13 Mar 2025 11:20:13 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: <3WF5febSMDQinBZGiaVydeRNCJn-AzZ3oxR1UcLCuyc=.0a921dfe-b807-491d-9b50-008004b30b72@github.com> Message-ID: On Tue, 11 Mar 2025 07:48:40 GMT, Per Minborg wrote: >> src/java.base/share/classes/java/util/ImmutableCollections.java line 772: >> >>> 770: >>> 771: @jdk.internal.ValueBased >>> 772: static final class StableList extends AbstractImmutableList { >> >> Is there significant reuse gained by putting StableList in ImmutableCollection? The back-and-forth between here and SV through SharedSecrets is a little awkward. > > This allows reuse of `AbstractImmutableList` with list iterators, sub lists and more. Using the regular AbstractList as a base would also get you implementations of those. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1989875430 From qamai at openjdk.org Thu Mar 13 11:20:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 13 Mar 2025 11:20:14 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 18:11:23 GMT, Per Minborg wrote: > Implement JEP 502. > > The PR passes tier1-tier3 tests. src/java.base/share/classes/java/util/ImmutableCollections.java line 777: > 775: private final IntFunction mapper; > 776: @Stable > 777: private final StableValueImpl[] backing; You can use a backing `@Stable Object[]` instead. It will reduce indirection when accessing this list. 
src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 65: > 63: // > 64: @Stable > 65: private volatile Object value; Can we use `acquire`/`release` semantics instead of `volatile`? src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 128: > 126: final T newValue = supplier.get(); > 127: // The mutex is reentrant so we need to check if the value was actually set. > 128: return wrapAndCas(newValue) ? newValue : orElseThrow(); Reentrancy into here seems really buggy, I would endorse disallowing it instead. In that case, a `ReentrantLock` seems better than the native monitor as we can cheaply check `lock.isHeldByCurrentThread()` src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 159: > 157: private boolean wrapAndCas(Object value) { > 158: // This upholds the invariant, a `@Stable` field is written to at most once > 159: return UNSAFE.compareAndSetReference(this, UNDERLYING_DATA_OFFSET, null, wrap(value)); There is no need for a cas here as all setters have to hold the lock. We should have a dedicated private `set` that asserts `Thread.holdsLock(this)`. 
src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 168: > 166: // Wraps `null` values into a sentinel value > 167: @ForceInline > 168: private static T wrap(T t) { Suggestion: private static Object wrap(T t) { src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 181: > 179: @SuppressWarnings("unchecked") > 180: @ForceInline > 181: private static T nullSentinel() { Suggestion: private static Object nullSentinel() { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988608920 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988612784 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993081551 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988616943 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993110162 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993111723 From liach at openjdk.org Thu Mar 13 11:20:13 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 13 Mar 2025 11:20:13 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: <3WF5febSMDQinBZGiaVydeRNCJn-AzZ3oxR1UcLCuyc=.0a921dfe-b807-491d-9b50-008004b30b72@github.com> Message-ID: On Tue, 11 Mar 2025 18:08:47 GMT, Johannes Graham wrote: >> This allows reuse of `AbstractImmutableList` with list iterators, sub lists and more. > > Using the regular AbstractList as a base would also get you implementations of those. `AbstractList` has non-final fields, which makes it not suitable for `@ValueBased`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1989947965 From pminborg at openjdk.org Thu Mar 13 11:20:14 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Mar 2025 11:20:14 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 07:50:38 GMT, Quan Anh Mai wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > src/java.base/share/classes/java/util/ImmutableCollections.java line 777: > >> 775: private final IntFunction mapper; >> 776: @Stable >> 777: private final StableValueImpl[] backing; > > You can use a backing `@Stable Object[]` instead. It will reduce indirection when accessing this list. Can you please elaborate a bit more on your proposal @merykitty? > src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 65: > >> 63: // >> 64: @Stable >> 65: private volatile Object value; > > Can we use `acquire`/`release` semantics instead of `volatile`? Yes we can. However, I am uncertain if the added complexity can motivate any performance benefits. Perhaps on ARM? I can do a benchmark on it. > src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 128: > >> 126: final T newValue = supplier.get(); >> 127: // The mutex is reentrant so we need to check if the value was actually set. >> 128: return wrapAndCas(newValue) ? newValue : orElseThrow(); > > Reentrancy into here seems really buggy, I would endorse disallowing it instead. In that case, a `ReentrantLock` seems better than the native monitor as we can cheaply check `lock.isHeldByCurrentThread()` StableValueImpl was carefully designed to minimize memory footprint. Adding a lock would inflate memory usage substantially. 
> src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 159: > >> 157: private boolean wrapAndCas(Object value) { >> 158: // This upholds the invariant, a `@Stable` field is written to at most once >> 159: return UNSAFE.compareAndSetReference(this, UNDERLYING_DATA_OFFSET, null, wrap(value)); > > There is no need for a cas here as all setters have to hold the lock. We should have a dedicated private `set` that asserts `Thread.holdsLock(this)`. This is more of a belt and suspenders solution. It is true that it is redundant. A set volatile would suffice here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1989016337 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988630199 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993142117 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988634451 From qamai at openjdk.org Thu Mar 13 11:20:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 13 Mar 2025 11:20:14 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 11:19:13 GMT, Per Minborg wrote: >> src/java.base/share/classes/java/util/ImmutableCollections.java line 777: >> >>> 775: private final IntFunction mapper; >>> 776: @Stable >>> 777: private final StableValueImpl[] backing; >> >> You can use a backing `@Stable Object[]` instead. It will reduce indirection when accessing this list. > > Can you please elaborate a bit more on your proposal @merykitty? If you have an `@Stable Object[]`, then the elements are also considered `@Stable`. 
Then you can do something like: ReentrantLock[] locks; T get(int idx) { Object x = backing[idx]; if (x == null) { return compute(idx); } return unwrap(x); } T compute(int idx) { ReentrantLock lock = locks[idx]; lock.lock(); try { Object x = backing[idx]; if (x != null) { return unwrap(x); } T obj = ...; backing[idx] = wrap(obj); return obj; } finally { lock.unlock(); } } >> src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 65: >> >>> 63: // >>> 64: @Stable >>> 65: private volatile Object value; >> >> Can we use `acquire`/`release` semantics instead of `volatile`? > > Yes we can. However, I am uncertain if the added complexity can motivate any performance benefits. Perhaps on ARM? I can do a benchmark on it. You can probably use `acquire` only for the first `get` as it is in the fast path. For other I guess `volatile` is fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1989664004 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1990989622 From pminborg at openjdk.org Thu Mar 13 11:20:14 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Mar 2025 11:20:14 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 16:15:10 GMT, Quan Anh Mai wrote: >> Can you please elaborate a bit more on your proposal @merykitty? > > If you have an `@Stable Object[]`, then the elements are also considered `@Stable`. 
Then you can do something like: > > ReentrantLock[] locks; > > T get(int idx) { > Object x = backing[idx]; > if (x == null) { > return compute(idx); > } > return unwrap(x); > } > > T compute(int idx) { > ReentrantLock lock = locks[idx]; > lock.lock(); > try { > Object x = backing[idx]; > if (x != null) { > return unwrap(x); > } > T obj = ...; > backing[idx] = wrap(obj); > return obj; > } finally { > lock.unlock(); > } > } What would be the difference between `@Stable StableValueImpl[] backing` and `@Stable Object[] backing`? >> Yes we can. However, I am uncertain if the added complexity can motivate any performance benefits. Perhaps on ARM? I can do a benchmark on it. > > You can probably use `acquire` only for the first `get` as it is in the fast path. For other I guess `volatile` is fine. Yeah. Maybe that could strike a balance. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1990908235 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1991599680 From qamai at openjdk.org Thu Mar 13 11:20:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 13 Mar 2025 11:20:14 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: Message-ID: <7hEWpE3GTKhCa15FRQBf6_tzHUsr9mbzv6zh1HeZlXY=.1800d27a-33d9-41de-b8bd-31fd82a3661b@github.com> On Wed, 12 Mar 2025 08:09:04 GMT, Per Minborg wrote: >> If you have an `@Stable Object[]`, then the elements are also considered `@Stable`. 
Then you can do something like: >> >> ReentrantLock[] locks; >> >> T get(int idx) { >> Object x = backing[idx]; >> if (x == null) { >> return compute(idx); >> } >> return unwrap(x); >> } >> >> T compute(int idx) { >> ReentrantLock lock = locks[idx]; >> lock.lock(); >> try { >> Object x = backing[idx]; >> if (x != null) { >> return unwrap(x); >> } >> T obj = ...; >> backing[idx] = wrap(obj); >> return obj; >> } finally { >> lock.unlock(); >> } >> } > > What would be the difference between `@Stable StableValueImpl[] backing` and `@Stable Object[] backing`? For an `Object[]`, you only need to load the object from the array and it is probably what you need. For a `StableValueImpl[]`, you need to load the `StableValueImpl` from the array, and load the value from that `StableValueImpl`, which is 2 levels of indirections. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1990993459 From mcimadamore at openjdk.org Thu Mar 13 11:20:14 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 13 Mar 2025 11:20:14 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 08:09:04 GMT, Per Minborg wrote: >> If you have an `@Stable Object[]`, then the elements are also considered `@Stable`. Then you can do something like: >> >> ReentrantLock[] locks; >> >> T get(int idx) { >> Object x = backing[idx]; >> if (x == null) { >> return compute(idx); >> } >> return unwrap(x); >> } >> >> T compute(int idx) { >> ReentrantLock lock = locks[idx]; >> lock.lock(); >> try { >> Object x = backing[idx]; >> if (x != null) { >> return unwrap(x); >> } >> T obj = ...; >> backing[idx] = wrap(obj); >> return obj; >> } finally { >> lock.unlock(); >> } >> } > > What would be the difference between `@Stable StableValueImpl[] backing` and `@Stable Object[] backing`? It's true that the storage can be flatter here -- that said, this can also be done as a later refactoring. 
One advantage of doing things the way @minborg did it here, is that it's fairly easy to prove that the code below is correct -- which makes the initial review easier. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1991505367 From pminborg at openjdk.org Thu Mar 13 11:20:14 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Mar 2025 11:20:14 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: Message-ID: <8eLNHYCb3NOjDPJhbTd2sT_MxAhlPjI9fC4mr0sSWOU=.51260d3f-6585-4fbb-a367-43f26bc225fb@github.com> On Wed, 12 Mar 2025 13:28:12 GMT, Maurizio Cimadamore wrote: >> What would be the difference between `@Stable StableValueImpl[] backing` and `@Stable Object[] backing`? > > It's true that the storage can be flatter here -- that said, this can also be done as a later refactoring. One advantage of doing things the way @minborg did it here, is that it's fairly easy to prove that the code below is correct -- which makes the initial review easier. Ahh. Now I see what you mean. This is something we did in a handful of prototypes we explored. While it is true that there will be one indirection less, the complexity of the code is going to grow. Also, if the element is constant folded, it does not matter. 
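For readers following the indirection argument, here is a rough, self-contained sketch of the "flat" layout suggested earlier in this thread. It is an illustration under stated assumptions, not code from the PR: `FlatLazyList` is a hypothetical name, the JDK-internal `@Stable` annotation is unavailable to ordinary code so `AtomicReferenceArray` stands in for the stable array, and plain per-index monitor objects replace the `ReentrantLock[]` from the suggestion.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.IntFunction;

// Hypothetical sketch of the flat layout: results live directly in the array,
// so a read is one array load instead of array load + field load.
final class FlatLazyList<T> {
    private static final Object NULL_SENTINEL = new Object(); // marks a computed null
    private final AtomicReferenceArray<Object> backing;
    private final Object[] locks; // one monitor per index (assumption of this sketch)
    private final IntFunction<? extends T> mapper;

    FlatLazyList(int size, IntFunction<? extends T> mapper) {
        this.backing = new AtomicReferenceArray<>(size);
        this.locks = new Object[size];
        for (int i = 0; i < size; i++) {
            locks[i] = new Object();
        }
        this.mapper = mapper;
    }

    T get(int idx) {
        Object x = backing.getAcquire(idx); // fast path: single acquire load
        if (x != null) {
            return unwrap(x);
        }
        return compute(idx);
    }

    private T compute(int idx) {
        synchronized (locks[idx]) {
            Object x = backing.getAcquire(idx);
            if (x != null) {
                return unwrap(x); // another thread computed it first
            }
            T obj = mapper.apply(idx);
            // Each slot is written at most once, under its lock.
            backing.setRelease(idx, obj == null ? NULL_SENTINEL : obj);
            return obj;
        }
    }

    @SuppressWarnings("unchecked")
    private static <V> V unwrap(Object x) {
        return x == NULL_SENTINEL ? null : (V) x;
    }
}
```

The extra moving parts (sentinel handling, a lock array, and the wrap/unwrap pair living in the collection itself) are the kind of added complexity weighed against the saved indirection in the replies above.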
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1991597981

From pminborg at openjdk.org Thu Mar 13 11:20:14 2025
From: pminborg at openjdk.org (Per Minborg)
Date: Thu, 13 Mar 2025 11:20:14 GMT
Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview)
In-Reply-To:
References: <4B5XTEOv_30znhklVd9ymC0F4bmdn1bpYR-fYwtpvtY=.f0a2f7d9-0345-4893-a3f1-8559496b3c56@github.com>
Message-ID:

On Tue, 11 Mar 2025 00:40:31 GMT, Johannes Graham wrote:

>> Indeed, a bit set predicate can be used to check input validity if it is necessary - I think for enums, using a `StableFunction.ofEnum` dedicated API might be better just because `StableValue` can access `Class.getEnumConstantsShared` easily.
>
> What if instead you had a `@Stable` array of Object of the appropriate size, and populated each cell with a StableValue if the corresponding index was in the set, otherwise used a sentinel value. Then on the lookup, if it was the sentinel you throw, else you use the SV.
>
> Also there is an awful lot of similarity between the enum function and the int function. Could one possibly be implemented using the other?

Thanks for spotting this glitch. I have fixed the issue and added a test for member sets with "holes".

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988707099

From duke at openjdk.org Thu Mar 13 11:20:14 2025
From: duke at openjdk.org (Johannes Graham)
Date: Thu, 13 Mar 2025 11:20:14 GMT
Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview)
In-Reply-To: <4B5XTEOv_30znhklVd9ymC0F4bmdn1bpYR-fYwtpvtY=.f0a2f7d9-0345-4893-a3f1-8559496b3c56@github.com>
References: <4B5XTEOv_30znhklVd9ymC0F4bmdn1bpYR-fYwtpvtY=.f0a2f7d9-0345-4893-a3f1-8559496b3c56@github.com>
Message-ID:

On Mon, 10 Mar 2025 23:42:06 GMT, Chen Liang wrote:

>> src/java.base/share/classes/jdk/internal/lang/stable/StableEnumFunction.java line 112:
>>
>>> 110: final Class enumType = (Class)inputs.iterator().next().getClass();
>>> 111: return (Function) new StableEnumFunction(enumType, min, StableValueFactories.array(size), (Function) original);
>>> 112: }
>>
>> If `inputs` contains the enumeration constants with ordinals 0 and 2, wouldn't this code wrongly cause the enumeration constant with ordinal 1 to be an allowed input?
>
> Indeed, a bit set predicate can be used to check input validity if it is necessary - I think for enums, using a `StableFunction.ofEnum` dedicated API might be better just because `StableValue` can access `Class.getEnumConstantsShared` easily.

What if instead you had a `@Stable` array of Object of the appropriate size, and populated each cell with a StableValue if the corresponding index was in the set, otherwise used a sentinel value. Then on the lookup, if it was the sentinel you throw, else you use the SV.

Also there is an awful lot of similarity between the enum function and the int function. Could one possibly be implemented using the other?
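The sentinel idea suggested above can be sketched roughly as follows. This is a hedged illustration, not the fix that went into the PR: `SentinelEnumFunction`, `NOT_AN_INPUT` and the small `Holder` class are hypothetical stand-ins (the real code would use the preview `StableValue` and a `@Stable` array, neither of which is assumed here). The array is sized for every constant of the enum, and cells whose ordinal is not in the input set hold the sentinel, so "holes" such as ordinal 1 in the set {0, 2} are rejected.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration only; not the JDK's StableEnumFunction.
final class SentinelEnumFunction<E extends Enum<E>, R> implements Function<E, R> {
    // Marks ordinals that were not part of the allowed input set.
    private static final Object NOT_AN_INPUT = new Object();

    // Each cell is either NOT_AN_INPUT or a Holder<R>; written only in the constructor.
    private final Object[] cells;
    private final Function<? super E, ? extends R> original;

    SentinelEnumFunction(Class<E> enumType, Set<E> inputs,
                         Function<? super E, ? extends R> original) {
        this.original = original;
        this.cells = new Object[enumType.getEnumConstants().length];
        Arrays.fill(cells, NOT_AN_INPUT);
        for (E e : inputs) {
            cells[e.ordinal()] = new Holder<R>();
        }
    }

    @Override
    public R apply(E value) {
        Object cell = cells[value.ordinal()];
        if (cell == NOT_AN_INPUT) {
            throw new IllegalArgumentException("Input not allowed: " + value);
        }
        @SuppressWarnings("unchecked")
        Holder<R> holder = (Holder<R>) cell;
        return holder.computeIfUnset(() -> original.apply(value));
    }

    // Minimal at-most-once holder standing in for a stable value.
    private static final class Holder<R> {
        private boolean set;
        private R value;

        synchronized R computeIfUnset(Supplier<? extends R> supplier) {
            if (!set) {
                value = supplier.get();
                set = true;
            }
            return value;
        }
    }
}
```

With inputs `NANOSECONDS` and `MILLISECONDS` of `java.util.concurrent.TimeUnit`, applying the function to `MICROSECONDS` (the ordinal in between) throws instead of silently being accepted, which is the glitch described in the quoted review comment.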
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1988195668

From pminborg at openjdk.org Thu Mar 13 11:20:14 2025
From: pminborg at openjdk.org (Per Minborg)
Date: Thu, 13 Mar 2025 11:20:14 GMT
Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview)
In-Reply-To:
References: <4B5XTEOv_30znhklVd9ymC0F4bmdn1bpYR-fYwtpvtY=.f0a2f7d9-0345-4893-a3f1-8559496b3c56@github.com>
Message-ID:

On Tue, 11 Mar 2025 08:44:51 GMT, Per Minborg wrote:

>> What if instead you had a `@Stable` array of Object of the appropriate size, and populated each cell with a StableValue if the corresponding index was in the set, otherwise used a sentinel value. Then on the lookup, if it was the sentinel you throw, else you use the SV.
>>
>> Also there is an awful lot of similarity between the enum function and the int function. Could one possibly be implemented using the other?
>
> Thanks for spotting this glitch. I have fixed the issue and added a test for member sets with "holes".

It might be worth exploring using a stable int function.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1991590911

From alanb at openjdk.org Thu Mar 13 11:20:15 2025
From: alanb at openjdk.org (Alan Bateman)
Date: Thu, 13 Mar 2025 11:20:15 GMT
Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview)
In-Reply-To:
References:
Message-ID:

On Mon, 10 Mar 2025 18:11:23 GMT, Per Minborg wrote:

> Implement JEP 502.
>
> The PR passes tier1-tier3 tests.

src/jdk.unsupported/share/classes/sun/misc/Unsafe.java line 983:

> 981:
> 982: @ForceInline
> 983: private static void assertNotTrusted(Field f) {

I don't think this can be named assertXXX; it needs to be something like ensureNotTrusted.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1987836045

From pminborg at openjdk.org Thu Mar 13 11:20:15 2025
From: pminborg at openjdk.org (Per Minborg)
Date: Thu, 13 Mar 2025 11:20:15 GMT
Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview)
In-Reply-To:
References:
Message-ID: <1O-QKtK9LBw8YzDDGzJxtBTwYBD3jnoKfXFJYiY1sp4=.dea93da7-bfcd-42ca-98c8-f8e0705bac07@github.com>

On Wed, 12 Mar 2025 14:14:59 GMT, Per Minborg wrote:

>> You can probably use `acquire` only for the first `get` as it is in the fast path. For other I guess `volatile` is fine.
>
> Yeah. Maybe that could strike a balance.

On an M1 Mac:

Volatile:

    StableValueBenchmark.stable      avgt   10  1.373 ± 0.057  ns/op
    StableValueBenchmark.stableNull  avgt   10  1.245 ± 0.074  ns/op

Acquire:

    StableValueBenchmark.stable      avgt   10  1.339 ± 0.044  ns/op
    StableValueBenchmark.stableNull  avgt   10  1.241 ± 0.090  ns/op

We would have to examine the difference on other platforms as well.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993284760

From duke at openjdk.org Thu Mar 13 12:03:11 2025
From: duke at openjdk.org (Luca Kellermann)
Date: Thu, 13 Mar 2025 12:03:11 GMT
Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview)
In-Reply-To:
References:
Message-ID:

On Thu, 13 Mar 2025 09:45:22 GMT, Per Minborg wrote:

>> src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 128:
>>
>>> 126: final T newValue = supplier.get();
>>> 127: // The mutex is reentrant so we need to check if the value was actually set.
>>> 128: return wrapAndCas(newValue) ? newValue : orElseThrow();
>>
>> Reentrancy into here seems really buggy, I would endorse disallowing it instead. In that case, a `ReentrantLock` seems better than the native monitor as we can cheaply check `lock.isHeldByCurrentThread()`
>
> StableValueImpl was carefully designed to minimize memory footprint. Adding a lock would inflate memory usage substantially.
There is also `Thread.holdsLock()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993371561 From jbechberger at openjdk.org Thu Mar 13 12:30:44 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 13 Mar 2025 12:30:44 GMT Subject: RFR: 8342818: Implement CPU Time Profiling for JFR [v40] In-Reply-To: References: Message-ID: > This is the code for the [JEP draft: CPU Time based profiling for JFR]. > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Tiny fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20752/files - new: https://git.openjdk.org/jdk/pull/20752/files/18ec3811..39df1939 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=39 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=38-39 Stats: 7 lines in 1 file changed: 1 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20752.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20752/head:pull/20752 PR: https://git.openjdk.org/jdk/pull/20752 From pminborg at openjdk.org Thu Mar 13 12:45:50 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Mar 2025 12:45:50 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v2] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. 
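As background for the reentrancy discussion in this thread, the following is a minimal sketch of how `Thread.holdsLock(Object)` can detect a recursive initialization attempt when the holder's own monitor is used as the mutex, so no separate lock object is needed. It assumes a hypothetical `OnceHolder` class with an `orElseSet` method and a `NULL_SENTINEL`; it is not the `StableValueImpl` code and does not claim to match the reworked logic in the commit that follows.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; not the JDK's StableValueImpl.
final class OnceHolder<T> {
    // Stands in for a computed null, so "unset" and "set to null" differ.
    private static final Object NULL_SENTINEL = new Object();

    private volatile Object value;   // null means "not yet set"

    T orElseSet(Supplier<? extends T> supplier) {
        Object v = value;
        if (v != null) {
            return unwrap(v);        // fast path, no locking
        }
        // If this thread already owns the monitor, the supplier has called
        // back into the same holder: refuse instead of re-entering.
        if (Thread.holdsLock(this)) {
            throw new IllegalStateException("Recursive initialization is not supported");
        }
        synchronized (this) {
            v = value;               // re-check under the monitor
            if (v != null) {
                return unwrap(v);
            }
            T newValue = supplier.get();
            value = (newValue == null) ? NULL_SENTINEL : newValue;
            return newValue;
        }
    }

    @SuppressWarnings("unchecked")
    private T unwrap(Object v) {
        return (v == NULL_SENTINEL) ? null : (T) v;
    }
}
```

Because the check uses the object's built-in monitor, the holder carries only the one `value` field, which is in line with the memory-footprint concern quoted above; a failed recursive attempt leaves the holder unset, so a later non-recursive call can still set it.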
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Rework reenterant logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/09ca44e6..1cd1cdb2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=00-01 Stats: 67 lines in 4 files changed: 56 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From pminborg at openjdk.org Thu Mar 13 12:52:32 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Mar 2025 12:52:32 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v3] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Rename method and fix comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/1cd1cdb2..8f6d6bc0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=01-02 Stats: 6 lines in 1 file changed: 1 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From pminborg at openjdk.org Thu Mar 13 13:01:58 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Mar 2025 13:01:58 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v4] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. 
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Rename field ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/8f6d6bc0..c648ea2b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=02-03 Stats: 23 lines in 6 files changed: 1 ins; 0 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From tschatzl at openjdk.org Thu Mar 13 13:07:29 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 13 Mar 2025 13:07:29 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v19] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. 
> * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads.
> * Finally, there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done;                 // null value check
> if (card(@x.a) == young_card) goto done;  // write to young gen check
> StoreLoad;                                // synchronize
> if (card(@x.a) == dirty_card) goto done;
>
> *card(@x.a) = dirty
>
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
>
> call runtime to move thread-local-dcq into dcqs
>
> done:
>
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>
> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
>
> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse-grained synchronization based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se...

Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:

* when aborting refinement during full collection, the global card table and the per-thread card table might not be in sync. Roll forward during abort of the refinement in these situations.
* additional verification * added some missing ResourceMarks in asserts * added variant of ArrayJuggle2 that crashes fairly quickly without these changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/3766b76c..78611173 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=17-18 Stats: 111 lines in 11 files changed: 82 ins; 13 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From galder at openjdk.org Thu Mar 13 13:50:14 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Thu, 13 Mar 2025 13:50:14 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <63F-0aHgMthexL0b2DFmkW8_QrJeo8OOlCaIyZApfpY=.4744070d-9d56-4031-8684-be14cf66d1e5@github.com> Message-ID: On Fri, 7 Mar 2025 13:17:29 GMT, Emanuel Peter wrote: >>> As for possible solutions. In all Regression 1-3 cases, it seems the issue is scalar cmove. So actually in all cases a possible solution is using branching code (i.e. `cmp+mov`). So to me, these are the follow-up RFE's: >>> >>> * Detect "extreme" probability scalar cmove, and replace them with branching code. This should take care of all regressions here. This one has high priority, as it fixes the regression caused by this patch here. But it would also help to improve performance for the `Integer.min/max` cases, which have the same issue. >> >> I've created [JDK-8351409](https://bugs.openjdk.org/browse/JDK-8351409) to address this. > > @galderz Excellent. Testing looks all good on our side. Yes I think what you saw was unrelated. 
> @rwestrel Could you give this a last quick scan and then I think you can integrate :)

Thanks @eme64 @rwestrel @chhagedorn for your patience with this!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2721319344

From duke at openjdk.org Thu Mar 13 13:50:22 2025
From: duke at openjdk.org (duke)
Date: Thu, 13 Mar 2025 13:50:22 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v14]
In-Reply-To: <9c34YjVjK0BMclNqFWMSitBV2YTcu_jmgWVitjRgvF0=.0f225af6-5888-4160-9a54-09baa696da1c@github.com>
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <9c34YjVjK0BMclNqFWMSitBV2YTcu_jmgWVitjRgvF0=.0f225af6-5888-4160-9a54-09baa696da1c@github.com>
Message-ID:

On Fri, 7 Mar 2025 06:19:03 GMT, Galder Zamarreño wrote:

>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>>
>> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>>
>>
>> VLoop::check_preconditions: failed: control flow in loop not allowed
>>
>>
>> The control flow is due to the Java implementation of these methods, e.g.
>>
>>
>> public static long max(long a, long b) {
>>     return (a >= b) ? a : b;
>> }
>>
>>
>> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively.
>> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
>> E.g.
>>
>>
>> SuperWord::transform_loop:
>> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
>> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>>
>>
>> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>                                                          1     1     0     0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1155
>> long max 1173
>>
>>
>> After the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>                                                          1     1     0     0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1042
>> long max 1042
>>
>>
>> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
>> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>>
>> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PA...
>
> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 47 additional commits since the last revision:
>
> - Merge branch 'master' into topic.intrinsify-max-min-long
> - Add assertion comments
> - Add simple reduction benchmarks on top of multiply ones
> - Merge branch 'master' into topic.intrinsify-max-min-long
> - Fix typo
> - Renaming methods and variables and add docu on algorithms
> - Fix copyright years
> - Make sure it runs with cpus with either avx512 or asimd
> - Test can only run with 256 bit registers or bigger
>
>   * Remove platform dependant check
>   and use platform independent configuration instead.
> - Fix license header
> - ...
and 37 more: https://git.openjdk.org/jdk/compare/c836c5b7...1aa690d3 @galderz Your change (at version 1aa690d391ef3536d422ba93c33d0fc273a911c6) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2721323015 From pminborg at openjdk.org Thu Mar 13 13:52:24 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Mar 2025 13:52:24 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v5] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Clean up exception messages and fix comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/c648ea2b..2fe5b0f8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=03-04 Stats: 10 lines in 2 files changed: 3 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From galder at openjdk.org Thu Mar 13 13:57:23 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Thu, 13 Mar 2025 13:57:23 GMT Subject: Integrated: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Tue, 9 Jul 2024 12:07:37 GMT, Galder Zamarre?o wrote: > This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. 
>
> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>
>
> VLoop::check_preconditions: failed: control flow in loop not allowed
>
>
> The control flow is due to the Java implementation of these methods, e.g.
>
>
> public static long max(long a, long b) {
>     return (a >= b) ? a : b;
> }
>
>
> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively.
> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
> E.g.
>
>
> SuperWord::transform_loop:
> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>
>
> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>
>
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
>
> long min 1155
> long max 1173
>
>
> After the patch, on darwin/aarch64 (M1):
>
>
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
>
> long min 1042
> long max 1042
>
>
> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
> Therefore, it still relies on the macro expansion to transform those into CMoveL.
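For orientation, the kind of loop the description above is about can be sketched as follows. This is only an illustrative reduction over a `long[]`; the class and method names are hypothetical and are not taken from `ReductionPerf`. Whether C2 actually vectorizes such a loop depends on the JDK build, the platform, and the available vector instructions, so no performance claim is attached to it.

```java
// Hypothetical illustration only; not the ReductionPerf test from the JDK.
final class LongMinMaxReduction {
    // Reduction over Math.max(long, long): the call sits in the loop body,
    // which is where the branchy Java implementation used to introduce control flow.
    static long max(long[] values) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = Math.max(result, values[i]);
        }
        return result;
    }

    // The same shape for Math.min(long, long).
    static long min(long[] values) {
        long result = Long.MAX_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = Math.min(result, values[i]);
        }
        return result;
    }
}
```

The results are the same with or without the intrinsic; the change described in the message only affects how the compiler represents the call internally.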
>
> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>
>
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg:tier1                     2500  2500     0     0
>>> jtreg:test/jdk:tier1 ...

This pull request has now been integrated.

Changeset: 4e51a8c9
Author: Galder Zamarreño
URL: https://git.openjdk.org/jdk/commit/4e51a8c9ad4e5345d05cf32ce1e82b7158f80e93
Stats: 844 lines in 9 files changed: 725 ins; 107 del; 12 mod

8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)

Reviewed-by: roland, epeter, chagedorn, darcy

-------------

PR: https://git.openjdk.org/jdk/pull/20098

From tschatzl at openjdk.org Thu Mar 13 14:16:07 2025
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Thu, 13 Mar 2025 14:16:07 GMT
Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v19]
In-Reply-To:
References:
Message-ID: <-ys7CbBNU4hCmEgYQyZpmBQ_rso4i2_KoFHLPNv73sI=.bd715b1d-b9fd-48b7-bb06-d6673ab2dbfc@github.com>

On Thu, 13 Mar 2025 13:07:29 GMT, Thomas Schatzl wrote:

>> Hi all,
>>
>> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier.
>>
>> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25.
>>
>> ### Current situation
>>
>> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier.
>>
>> The main reason for the current barrier is how g1 implements concurrent refinement:
>> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
>> * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads.
>> * Finally, there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>>
>> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>>
>>
>> // Filtering
>> if (region(@x.a) == region(y)) goto done; // same region check
>> if (y == null) goto done;                 // null value check
>> if (card(@x.a) == young_card) goto done;  // write to young gen check
>> StoreLoad;                                // synchronize
>> if (card(@x.a) == dirty_card) goto done;
>>
>> *card(@x.a) = dirty
>>
>> // Card tracking
>> enqueue(card-address(@x.a)) into thread-local-dcq;
>> if (thread-local-dcq is not full) goto done;
>>
>> call runtime to move thread-local-dcq into dcqs
>>
>> done:
>>
>>
>> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>>
>> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>>
>> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
>>
>> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c...
>
> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:
>
> * when aborting refinement during full collection, the global card table and the per-thread card table might not be in sync. Roll forward during abort of the refinement in these situations.
> * additional verification > * added some missing ResourceMarks in asserts > * added variant of ArrayJuggle2 that crashes fairly quickly without these changes Commit https://github.com/openjdk/jdk/pull/23739/commits/786111735c306583af5bc75f7653f0da67d52adb fixes an issue with full gc interrupting refinement while the global card table and the JavaThread's card table changes. Testing: tier1-7 with changes, tier1-5 with changes stressing refinement similar to the ones added to the new test. The new variant of `ArrayJuggle2` fails >50% of all times in our CI without the patch (verified 700 or so executions of that not failing with patch). ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2721413659 From pminborg at openjdk.org Thu Mar 13 15:22:43 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Mar 2025 15:22:43 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: - Merge branch 'master' into implement-jep502 - Clean up exception messages and fix comments - Rename field - Rename method and fix comment - Rework reenterant logic - Use acquire semantics for reading rather than volatile semantics - Add missing null check - Simplify handling of sentinel, wrap, and unwrap - Fix JavaDoc issues - Fix members in StableEnumFunction - ... 
and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f ------------- Changes: https://git.openjdk.org/jdk/pull/23972/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=05 Stats: 4040 lines in 30 files changed: 4009 ins; 18 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From mcimadamore at openjdk.org Thu Mar 13 15:39:15 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 13 Mar 2025 15:39:15 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: > > - Merge branch 'master' into implement-jep502 > - Clean up exception messages and fix comments > - Rename field > - Rename method and fix comment > - Rework reenterant logic > - Use acquire semantics for reading rather than volatile semantics > - Add missing null check > - Simplify handling of sentinel, wrap, and unwrap > - Fix JavaDoc issues > - Fix members in StableEnumFunction > - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f src/hotspot/share/ci/ciField.cpp line 254: > 252: > 253: static bool trust_final_non_static_fields_of_type(Symbol* signature) { > 254: return signature == vmSymbols::java_lang_StableValue_signature(); Just a note that we will need to decide whether to keep this or not... 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993795795 From mcimadamore at openjdk.org Thu Mar 13 15:52:16 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 13 Mar 2025 15:52:16 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: > > - Merge branch 'master' into implement-jep502 > - Clean up exception messages and fix comments > - Rename field > - Rename method and fix comment > - Rework reenterant logic > - Use acquire semantics for reading rather than volatile semantics > - Add missing null check > - Simplify handling of sentinel, wrap, and unwrap > - Fix JavaDoc issues > - Fix members in StableEnumFunction > - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f src/java.base/share/classes/java/lang/StableValue.java line 45: > 43: > 44: /** > 45: * A stable value is a shallowly immutable holder of deferred content. Is this terminology a leftover from previous JEP iterations? The JEP now says: > stable values, which are objects that hold immutable data. src/java.base/share/classes/java/lang/StableValue.java line 283: > 281: * the {@code Foo} does not already exist. > 282: *

> 283: * Here is another example where a more complex dependency graph is created in which I wonder if just leaving the fibonacci example would be enough here -- as that has a nice dependency graph src/java.base/share/classes/java/lang/StableValue.java line 330: > 328: * thread safe and guarantee at-most-once-per-input invocation. > 329: * > 330: *

Miscellaneous

I'm dubious about a section called "misc" :-) src/java.base/share/classes/java/lang/StableValue.java line 331: > 329: * > 330: *

Miscellaneous

> 331: * Except for a StableValue's content itself, an {@linkplain #orElse(Object) orElse(other)} missing `{@code}` src/java.base/share/classes/java/lang/StableValue.java line 335: > 333: * parameters must be non-null or a {@link NullPointerException} will be thrown. > 334: *

> 335: * Stable functions and collections are not {@link Serializable} as this would require Not sure this belongs here. Perhaps the comment on these functions not being serializable should be on their factories. And the point on security vulnerability seems specific and vague at the same time -- better remove it. src/java.base/share/classes/java/lang/StableValue.java line 339: > 337: * which would introduce security vulnerabilities. > 338: *

> 339: * As objects can be set via stable values but never removed, this can be a source It feels like this could probably be expanded upon -- also covering stable functions (and morphed into a new section) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993819611 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993803509 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993810862 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993805209 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993808888 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993812118 From mcimadamore at openjdk.org Thu Mar 13 15:52:16 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 13 Mar 2025 15:52:16 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:48:25 GMT, Maurizio Cimadamore wrote: >> Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: >> >> - Merge branch 'master' into implement-jep502 >> - Clean up exception messages and fix comments >> - Rename field >> - Rename method and fix comment >> - Rework reenterant logic >> - Use acquire semantics for reading rather than volatile semantics >> - Add missing null check >> - Simplify handling of sentinel, wrap, and unwrap >> - Fix JavaDoc issues >> - Fix members in StableEnumFunction >> - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f > > src/java.base/share/classes/java/lang/StableValue.java line 45: > >> 43: >> 44: /** >> 45: * A stable value is a shallowly immutable holder of deferred content. > > Is this terminology a leftover from previous JEP iterations? The JEP now says: >> stable values, which are objects that hold immutable data. 
Maybe: `A stable value is a holder for shallowly immutable content`. > src/java.base/share/classes/java/lang/StableValue.java line 330: > >> 328: * thread safe and guarantee at-most-once-per-input invocation. >> 329: * >> 330: *

Miscellaneous

> > I'm dubious about a section called "misc" :-)

We can probably move out some of the contents (I left some suggestions) - then move the remaining into api notes?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993822111
PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993815323

From rehn at openjdk.org  Thu Mar 13 15:56:16 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 13 Mar 2025 15:56:16 GMT
Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars
Message-ID: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com>

Hi, please consider.

| RVWMO | Patched |
| ---------- | ---------- |
| fence iorw,iorw | fence iorw,ow |
| sw t4,120(t2) | sw t4,120(t2) |
| fence ow,ir | unnecessary_membar_volatile_rvwmo |
| sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile |
| fence iorw,ow | fence iorw,ow |
| sw t5,124(t2) | sw t5,124(t2) |

| TSO | Patched |
| ---------- | ---------- |
| lw a4,120(t2) | lw a6,120(t2) |
| sw a0,124(t2) | sw t6,124(t2) |
| fence iorw,iorw | unnecessary_membar_volatile_tso |
| sw t4,120(t2) | sw t4,120(t2) |
| fence ow,ir | unnecessary_membar_volatile_tso |
| sw t6,128(t2) | sw t5,128(t2) |
| sw t5,124(t2) // Non-volatile | sw a1,124(t2) // Non-volatile |
| fence iorw,iorw | unnecessary_membar_volatile_tso |
| ... | ... |
| sw a3,120(t2) | sw a0,120(t2) |
| fence ow,ir | fence ow,ir |
| lw a7,124(t2) | lw a5,124(t2) |

For RVWMO specifically, the sequence volatile store + store + volatile store is around 30% faster on VF2.

The patch does the following:
- Separate ztso and rvwmo in the ad file by using the UseZtso predicate.
- Match everything that requires the same membar.
- Make fence/fencei protected, as they shouldn't be used directly.
- Increase the cost of membars to VOLATILE_REF_COST.
- Add a real_empty pipe.
- Change to pipe_slow on TSO (as on x86).

Note that C2-rv64 is now superior to gcc/clang regarding fencing: https://godbolt.org/z/6E3YTP15j

Testing: jcstress, tier1, and manually reading the generated assembly. Additional testing is ongoing, but I am sending the RFR now as it may need some consideration.

/Robbin

-------------

Commit messages:
 - Fixed ws
 - Revert NC
 - Fixed comment
 - UseNewCode

Changes: https://git.openjdk.org/jdk/pull/24035/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8351949
  Stats: 148 lines in 4 files changed: 72 ins; 27 del; 49 mod
  Patch: https://git.openjdk.org/jdk/pull/24035.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24035/head:pull/24035

PR: https://git.openjdk.org/jdk/pull/24035

From mcimadamore at openjdk.org  Thu Mar 13 15:57:10 2025
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Thu, 13 Mar 2025 15:57:10 GMT
Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6]
In-Reply-To:
References:
Message-ID:

On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote:

>> Implement JEP 502.
>>
>> The PR passes tier1-tier3 tests.
>
> Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits:
>
>  - Merge branch 'master' into implement-jep502
>  - Clean up exception messages and fix comments
>  - Rename field
>  - Rename method and fix comment
>  - Rework reenterant logic
>  - Use acquire semantics for reading rather than volatile semantics
>  - Add missing null check
>  - Simplify handling of sentinel, wrap, and unwrap
>  - Fix JavaDoc issues
>  - Fix members in StableEnumFunction
>  - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f

src/java.base/share/classes/java/lang/StableValue.java line 497:

> 495:
> 496: /**
> 497: * {@return a new unset stable supplier}

Should we say "unset" here?

src/java.base/share/classes/java/lang/StableValue.java line 526: > 524: > 525: /** > 526: * {@return a new unset stable int function} Should we say "unset" here? src/java.base/share/classes/java/lang/StableValue.java line 564: > 562: > 563: /** > 564: * {@return a new unset stable function} Should we say "unset" here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993828637 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993829241 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993831680 From mcimadamore at openjdk.org Thu Mar 13 15:57:10 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 13 Mar 2025 15:57:10 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:52:37 GMT, Maurizio Cimadamore wrote: >> Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: >> >> - Merge branch 'master' into implement-jep502 >> - Clean up exception messages and fix comments >> - Rename field >> - Rename method and fix comment >> - Rework reenterant logic >> - Use acquire semantics for reading rather than volatile semantics >> - Add missing null check >> - Simplify handling of sentinel, wrap, and unwrap >> - Fix JavaDoc issues >> - Fix members in StableEnumFunction >> - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f > > src/java.base/share/classes/java/lang/StableValue.java line 497: > >> 495: >> 496: /** >> 497: * {@return a new unset stable supplier} > > Should we say "unset" here? E.g. we do not define the term "unset supplier" anywhere -- we just define what a stable supplier is -- IMHO that's enough. Also... whether unset or set, that's not really visible by the user? 
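
The point that the set/unset state is not visible to callers can be illustrated with a minimal sketch. It assumes a hand-written memoizing supplier, not the JDK's stable supplier: `MemoizedSupplierSketch` and `Memoized` are hypothetical names, and treating `null` as "not computed yet" is a simplification made only for this example.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class MemoizedSupplierSketch {

    /** Hypothetical stand-in for a stable supplier; NOT the JDK implementation. */
    static final class Memoized<T> implements Supplier<T> {
        private final Supplier<? extends T> original;
        private volatile T value; // null means "not computed yet" (simplification)

        Memoized(Supplier<? extends T> original) {
            this.original = Objects.requireNonNull(original);
        }

        @Override
        public T get() {
            T v = value;                 // first, lock-free read
            if (v == null) {
                synchronized (this) {
                    v = value;           // re-read under the lock
                    if (v == null) {
                        v = Objects.requireNonNull(original.get());
                        value = v;       // publish via the volatile write
                    }
                }
            }
            return v;
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Supplier<String> s = new Memoized<>(() -> "v" + calls.incrementAndGet());
        // The caller only ever sees get(); nothing exposes whether the value
        // had been computed before the call.
        System.out.println(s.get());
        System.out.println(s.get());
        System.out.println("underlying calls: " + calls.get());
    }
}
```

The only observable behavior is the result of `get()`, so describing the freshly created object as "unset" refers to a state the caller has no way to query in this sketch.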

> src/java.base/share/classes/java/lang/StableValue.java line 564:
>
>> 562:
>> 563: /**
>> 564: * {@return a new unset stable function}
>
> Should we say "unset" here?

Same with all the other lazy XYZ factories

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993830847
PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993832264

From mcimadamore at openjdk.org  Thu Mar 13 16:07:14 2025
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Thu, 13 Mar 2025 16:07:14 GMT
Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6]
In-Reply-To:
References:
Message-ID:

On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote:

>> Implement JEP 502.
>>
>> The PR passes tier1-tier3 tests.
>
> Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits:
>
>  - Merge branch 'master' into implement-jep502
>  - Clean up exception messages and fix comments
>  - Rename field
>  - Rename method and fix comment
>  - Rework reenterant logic
>  - Use acquire semantics for reading rather than volatile semantics
>  - Add missing null check
>  - Simplify handling of sentinel, wrap, and unwrap
>  - Fix JavaDoc issues
>  - Fix members in StableEnumFunction
>  - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f

src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 74:

> 72: @Override
> 73: public boolean trySet(T value) {
> 74: if (wrappedContentAcquire() != null) {

IMHO, if our goal is to do:

    Object content = this.content;
    if (content != null) return content;
    synchronized (...) {
        content = this.content; // re-read under the lock
        if (content != null) return content;
        this.content = ...
    }

Then we might just use a volatile field and synchronized blocks. I don't see an immediate need for using acquire/release semantics -- especially when using a monitor. E.g. this should look more like a classic double checked locking idiom.
(but with a stable field to make the first volatile read more efficient in case the field is already set) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993850760 From iklam at openjdk.org Fri Mar 14 01:53:38 2025 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 14 Mar 2025 01:53:38 GMT Subject: RFR: 8351319: AOT cache support for custom class loaders broken since JDK-8348426 [v2] In-Reply-To: References: Message-ID: > Since [JDK-8348426](https://bugs.openjdk.org/browse/JDK-8348426) (Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file), the AOT cache no longer contains classes intended for custom class loaders (these are called "unregistered classes" in CDS terminology). > > The fix is simple -- we already remember the set of unregistered classes in the AOT configuration file. We just need to add them into the final AOT cache (see changes in finalImageRecipes.cpp). Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 - 8351319: AOT cache support for custom class loaders broken since JDK-8348426 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23926/files - new: https://git.openjdk.org/jdk/pull/23926/files/2cdb5f80..41ae5507 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23926&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23926&range=00-01 Stats: 48120 lines in 865 files changed: 23786 ins; 15583 del; 8751 mod Patch: https://git.openjdk.org/jdk/pull/23926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23926/head:pull/23926 PR: https://git.openjdk.org/jdk/pull/23926 From iklam at openjdk.org Fri Mar 14 05:33:04 2025 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 14 Mar 2025 05:33:04 GMT Subject: RFR: 8352001: AOT cache should not contain classes injected into built-in class loaders Message-ID: During an application's training run, it's possible to inject classes into the built-in platform/app class loaders with reflection calls. - Before [JDK-8348426](https://bugs.openjdk.org/browse/JDK-8348426), only the names of these classes were recorded in the AOT config file. When the AOT cache is generated, these classes are automatically filtered out. - Since [JDK-8348426](https://bugs.openjdk.org/browse/JDK-8348426), these classes are stored as parsed InstanceKlasses in the AOT config file, and will be transferred into the AOT cache. This new behavior may cause some applications to fail, as they may inject bytecodes that have environment dependencies. For safety, this PR filters out such injected classes from the AOT config file. 
------------- Commit messages: - 8352001: AOT cache should not contain classes injected into built-in class loaders Changes: https://git.openjdk.org/jdk/pull/24046/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24046&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352001 Stats: 238 lines in 10 files changed: 233 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24046.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24046/head:pull/24046 PR: https://git.openjdk.org/jdk/pull/24046 From iklam at openjdk.org Fri Mar 14 06:46:57 2025 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 14 Mar 2025 06:46:57 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v2] In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: On Wed, 12 Mar 2025 07:02:22 GMT, Thomas Stuefe wrote: >> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. >> >> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). >> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. 
So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). >> >> The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . >> >> The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. >> >> Tests: >> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths >> - SAP reports all tests green (they had reported errors with the previous version) >> - Oracle Tests ongoing >> - GHAs green > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > skip test if we have no COH archive LGTM ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23912#pullrequestreview-2684475218 From fyang at openjdk.org Fri Mar 14 07:23:06 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Mar 2025 07:23:06 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars In-Reply-To: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Thu, 13 Mar 2025 13:49:32 GMT, Robbin Ehn wrote: > Hi please consider. 
> > |RVWMO| Patched| > | ---------- | ---------- | > |fence iorw,iorw| fence iorw,ow| > |sw t4,120(t2) | sw t4,120(t2) | > |fence ow,ir | unnecessary_membar_volatile_rvwmo | > | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile | > |fence iorw,ow | fence iorw,ow| > |sw t5,124(t2) |sw t5,124(t2) | > > |TSO | Patched| > | ---------- | ---------- | > | lw a4,120(t2) | lw a6,120(t2) | > | sw a0,124(t2) | sw t6,124(t2) | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > | sw t4,120(t2) | sw t4,120(t2) | > | fence ow,ir | unnecessary_membar_volatile_tso | > | sw t6,128(t2) | sw t5,128(t2) | > | sw t5,124(t2) // Non-volatile| sw a1,124(t2) // Non-volatile | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > |... | ... | > | sw a3,120(t2) | sw a0,120(t2) | > | fence ow,ir | fence ow,ir | > | lw a7,124(t2) | lw a5,124(t2) | > > For the specific rvwmo volatile store + store + volatile store is around 30% faster on VF2. > > The patch do: > - Separate ztso and rvwmo in ad by using UseZtso predicate. > - Match all that requires the same membar. > - Make fence/fencei protected as they shouldn't be using directly. > - Increased cost of membars to VOLATILE_REF_COST. > - Added a real_empty pipe. > - Change to pipe_slow on TSO (as x86). > > Note that C2-rv64 is now superior to gcc/clang regrading fencing: > https://godbolt.org/z/6E3YTP15j > > Testing jcstress, tier1 and manually reading the generated assembly. > Doing additional testing, but RFR it now as it may need some consideration. > > /Robbin Hi, I am trying to understand this. What's the Java source code look like regarding the first table showing the difference in JIT code in PR description? In naming, can we use names like `_rvtso` instead of `_tso` which I think maps better to `_rvwmo`? I think it's OK as I read this from the RV spec which also mentions RVTSO: This chapter defines the "Ztso" extension for the RISC-V Total Store Ordering (RVTSO) memory consistency model. 
RVTSO is defined as a delta from RVWMO, which is defined in Section 17.1.

src/hotspot/cpu/riscv/riscv.ad line 7982:

> 7980:
> 7981: format %{ "membar_rvwmo_storestore\n\t"
> 7982: "fence rw, w" %}

Shouldn't this be `"fence w, w"`?

-------------

PR Review: https://git.openjdk.org/jdk/pull/24035#pullrequestreview-2684509713
PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r1994972391

From rehn at openjdk.org  Fri Mar 14 07:51:54 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Fri, 14 Mar 2025 07:51:54 GMT
Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars
In-Reply-To:
References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com>
Message-ID:

On Fri, 14 Mar 2025 07:19:55 GMT, Fei Yang wrote:

> Hi, I am trying to understand this. What's the Java source code look like regarding the first table showing the difference in JIT code in PR description?
>

Thank you for having a look. Does this help?

    volatile int v_a;
    volatile int v_b;
    int plain_c;

    int temp_a = v_a;
    v_b = 77;
    plain_c = 66; // Non-volatile store between two volatile stores.
    v_a = 88;
    int temp_b = v_b;

The peephole is very powerful on TSO, as we only care about store-load ordering there. But it does improve some cases on RVWMO as well, hence I enabled it on both.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24035#issuecomment-2723892647

From rehn at openjdk.org  Fri Mar 14 07:57:55 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Fri, 14 Mar 2025 07:57:55 GMT
Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars
In-Reply-To:
References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com>
Message-ID: <7cbV3zovBxsEUAa1v1E7WJObk8I0jH7mGuCkNRNOvis=.2196a36a-c26f-4dd0-ae0f-c1a2d629740c@github.com>

On Fri, 14 Mar 2025 07:19:55 GMT, Fei Yang wrote:

> In naming, can we use names like `_rvtso` instead of `_tso` which I think maps better to `_rvwmo`?
I think it's OK as I read this from the RV spec which also mentions RVTSO: Yes, sure. Also please run some jcstress on any machine you have! > src/hotspot/cpu/riscv/riscv.ad line 7982: > >> 7980: >> 7981: format %{ "membar_rvwmo_storestore\n\t" >> 7982: "fence rw, w" %} > > Shouldn't this be `"fence w, w"`? Yes, thanks. Copy paste issue :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24035#issuecomment-2723904466 PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r1995040006 From stuefe at openjdk.org Fri Mar 14 09:20:40 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 14 Mar 2025 09:20:40 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v3] In-Reply-To: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: > Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. > > JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). > Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. 
So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). > > The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . > > The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. > > Tests: > - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths > - SAP reports all tests green (they had reported errors with the previous version) > - Oracle Tests ongoing > - GHAs green Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use - skip test if we have no COH archive - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use - aix fix - test and aix exclusion - Fix windows when ArchiveRelocationMode=0 or 2 - original ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23912/files - new: https://git.openjdk.org/jdk/pull/23912/files/78894849..f7dd4f5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23912&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23912&range=01-02 Stats: 45386 lines in 700 files changed: 22680 ins; 14650 del; 8056 mod Patch: https://git.openjdk.org/jdk/pull/23912.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23912/head:pull/23912 PR: https://git.openjdk.org/jdk/pull/23912 From mdoerr at openjdk.org Fri Mar 14 10:19:40 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 14 Mar 2025 10:19:40 GMT Subject: RFR: 8351666: [PPC64] Make non-volatile VectorRegisters available for C2 register allocation Message-ID: This PR makes the non-volatile VectorRegisters available for C2's register allocation. I had to implement the VectorRegisters properly (4 VM Regs) like on other platforms. The old version has run into assertions and looked strange. The non-volatile VectorRegisters are now saved when entering Java: call_stub and upcall_stubs. I have rewritten the save and restore functions and used them for both. Then, I have removed code which has become dead. I only save and restore them if C2 uses the vector instructions (controlled by `SuperwordUseVSX`). I have moved the non-volatile spill area out of the entry_frame_locals because it has a variable size, now. 
The stack area for all non-volatile registers has become larger than the 288 Bytes which are allowed to be used below the SP (specified by the ABI). Therefore, I had to rewrite the call_stub sequence significantly. We need to push the new frame before saving the registers, now. Saving and restoring the FP registers is not needed in the slow signature handler which also uses the save and restore code for non-volatile registers. ------------- Commit messages: - C2: Specify VSR52-63 as SOE and revert commit 2. - Fix register classification. - Update Copyright headers. - Add missing alignment in upcall stub frames. - Avoid redundant nv VR spill code in CRC stubs. - 8351666: [PPC64] Make non-volatile VectorRegisters available for C2 register allocation Changes: https://git.openjdk.org/jdk/pull/23987/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23987&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351666 Stats: 829 lines in 13 files changed: 328 ins; 245 del; 256 mod Patch: https://git.openjdk.org/jdk/pull/23987.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23987/head:pull/23987 PR: https://git.openjdk.org/jdk/pull/23987 From rvansa at azul.com Fri Mar 14 10:21:04 2025 From: rvansa at azul.com (Radim Vansa) Date: Fri, 14 Mar 2025 11:21:04 +0100 Subject: Perf regression accessing fields in JDK21 Message-ID: <413c2833-f15b-4676-a30c-ab9b6b5a52eb@azul.com> Hello, I've stumbled upon a performance regression when accessing fields in a class with many fields, in interpreted mode. I believe this is caused by introduction of FieldStream [1]; another performance regression caused by this change was already found upon heap dump [2]. Field access causes iteration through all fields in InstanceKlass::find_local_field(...) [3]. After [1] this results in decoding many variable-length integers (through FieldInfoReader.read_field_info invoked by the next() method) rather than simple indexed access, which turns out to be costly. 
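
The cost difference between sequential decoding and indexed access described above can be sketched as follows. This is only an illustration under stated assumptions: it uses a LEB128-style variable-length encoding as a stand-in, which is not claimed to be HotSpot's actual field stream format, and `VarintLookupSketch`, `encode`, and `decodeAt` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;

public class VarintLookupSketch {

    // Encode non-negative ints with a LEB128-style scheme:
    // 7 payload bits per byte, high bit set when more bytes follow.
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    // Records have no fixed width, so reaching entry `index` means
    // decoding every entry before it: O(index) work per lookup.
    static int decodeAt(byte[] stream, int index) {
        int pos = 0;
        int value = 0;
        for (int i = 0; i <= index; i++) {
            value = 0;
            int shift = 0;
            int b;
            do {
                b = stream[pos++] & 0xFF;
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
        }
        return value;
    }

    public static void main(String[] args) {
        int[] fields = new int[21_000];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = i * 3;
        }
        byte[] stream = encode(fields);

        int last = fields.length - 1;
        // Indexed access is a single array load ...
        int direct = fields[last];
        // ... while the stream lookup walks all preceding entries first.
        int decoded = decodeAt(stream, last);
        System.out.println(direct == decoded);
    }
}
```

If every field access triggers a scan like `decodeAt`, the total decoding work grows roughly quadratically with the number of fields, which would be consistent with a slowdown that only becomes visible for classes with very many fields.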
Moreover, when we call fd->reinitialize(...) the indexed access [4] results in another iteration through the fields [5] rather than O(1) access. I am attaching a reproducer; this creates a class with 21,000 fields, compiles it and executes this (all that the class does is to initialize all its fields). On JDK 17 running this reports 581 ms; on JDK 21 the test takes 8017 ms on my laptop. I was able to avoid the second iteration by passing FieldInfo to reinitialize() and the execution went down to 5712 ms, but I don't see a simple solution that would make the first iteration more efficient. I was thinking that the name and signature indices might not be variable-size encoded (this would require 4 rather than 2 bytes in classes with a short constant pool) but then in reinitialize() we would need to load the other flags anyway (hence iterating through the stream with variable lengths). The stream could be also reordered (?) into several sequences of fixed length records, but this is rather complicated and does not really guarantee to fix the regression. Thank you for your thoughts on this matter. Radim Vansa [1] https://bugs.openjdk.org/browse/JDK-8292818 [2] https://bugs.openjdk.org/browse/JDK-8317692 [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/oops/instanceKlass.cpp#L1783 [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/fieldDescriptor.cpp#L97 [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/oops/instanceKlass.cpp#L1774 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ManyFieldsRegression.java Type: text/x-java Size: 2198 bytes Desc: not available URL: From rehn at openjdk.org Fri Mar 14 11:38:19 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 14 Mar 2025 11:38:19 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v2] In-Reply-To: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: > Hi please consider. > > |RVWMO| Patched| > | ---------- | ---------- | > |fence iorw,iorw| fence iorw,ow| > |sw t4,120(t2) | sw t4,120(t2) | > |fence ow,ir | unnecessary_membar_volatile_rvwmo | > | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile | > |fence iorw,ow | fence iorw,ow| > |sw t5,124(t2) |sw t5,124(t2) | > > |TSO | Patched| > | ---------- | ---------- | > | lw a4,120(t2) | lw a6,120(t2) | > | sw a0,124(t2) | sw t6,124(t2) | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > | sw t4,120(t2) | sw t4,120(t2) | > | fence ow,ir | unnecessary_membar_volatile_tso | > | sw t6,128(t2) | sw t5,128(t2) | > | sw t5,124(t2) // Non-volatile| sw a1,124(t2) // Non-volatile | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > |... | ... | > | sw a3,120(t2) | sw a0,120(t2) | > | fence ow,ir | fence ow,ir | > | lw a7,124(t2) | lw a5,124(t2) | > > For the specific rvwmo volatile store + store + volatile store is around 30% faster on VF2. > > The patch do: > - Separate ztso and rvwmo in ad by using UseZtso predicate. > - Match all that requires the same membar. > - Make fence/fencei protected as they shouldn't be using directly. > - Increased cost of membars to VOLATILE_REF_COST. > - Added a real_empty pipe. > - Change to pipe_slow on TSO (as x86). > > Note that C2-rv64 is now superior to gcc/clang regrading fencing: > https://godbolt.org/z/6E3YTP15j > > Testing jcstress, tier1 and manually reading the generated assembly. 
> Doing additional testing, but RFR it now as it may need some consideration.
>
> /Robbin

Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:

  Review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24035/files
  - new: https://git.openjdk.org/jdk/pull/24035/files/0fdda550..4279f9fc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=00-01

  Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/24035.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24035/head:pull/24035

PR: https://git.openjdk.org/jdk/pull/24035

From cnorrbin at openjdk.org Fri Mar 14 11:47:45 2025
From: cnorrbin at openjdk.org (Casper Norrbin)
Date: Fri, 14 Mar 2025 11:47:45 GMT
Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v14]
In-Reply-To: 
References: 
Message-ID: 

> Hi everyone,
>
> The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally.
>
> Two key changes enable this feature:
> 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree.
> 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values.
>
> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
>
> An example of how you could use the intrusive tree is found below:
>
> ```c++
> struct MyIntrusiveStructure {
>   Node node; // The tree node is part of an external structure
>   int data;
>
>   MyIntrusiveStructure(int data, Node node) : node(node), data(data) {}
>   Node* get_node() { return &node; }
>   static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; }
> };
>
> Tree my_intrusive_tree;
>
> Cursor insert_cursor = my_intrusive_tree.cursor_find(0);
> Node insert_node = Node(0);
>
> // Custom allocation here is just malloc
> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest);
> new (place) MyIntrusiveStructure(0, insert_node);
>
> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor);
>
> Cursor find_cursor = my_intrusive_tree.cursor_find(0);
> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data;
> ```
>
> Please let me know any feedback or concerns!

Casper Norrbin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:

 - replace_at_cursor fix + non-const prev/next
 - Merge branch 'master' into rb-tree-intrusive-v2
 - insert node intrusive fix
 - separate intrusivenode and normal node classes
 - build fix
 - separate rbnode and normal tree subclass
 - Merge branch 'master' into rb-tree-intrusive-v2
 - renamed non-value upsert to insert
 - johan feedback
 - empty base optimization reference
 - ...
and 7 more: https://git.openjdk.org/jdk/compare/cd9f1d3d...0a92e60b ------------- Changes: https://git.openjdk.org/jdk/pull/23416/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=13 Stats: 1033 lines in 3 files changed: 683 ins; 128 del; 222 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From luhenry at openjdk.org Fri Mar 14 14:01:56 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 14 Mar 2025 14:01:56 GMT Subject: RFR: 8318220: RISC-V: C2 ReverseI [v4] In-Reply-To: <8GCHwRel0a2PCahmHnwpQCAYiLiBj6dseJgyyUSn1jY=.bf93b8d1-fbe9-409b-b2e8-47a07bf9d672@github.com> References: <8GCHwRel0a2PCahmHnwpQCAYiLiBj6dseJgyyUSn1jY=.bf93b8d1-fbe9-409b-b2e8-47a07bf9d672@github.com> Message-ID: On Thu, 13 Mar 2025 08:31:30 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch to add ReverseI and ReverseIL intrinsic on riscv? >> >> Thanks! > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into reverse-I-L > - not enable Zbkb automatically > - refine tests > - use srai instead of srli > - clean test > - clean test > - clean test > - clean > - add tests > - clean > - ... and 5 more: https://git.openjdk.org/jdk/compare/a33b1f7f...61bef248 Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23963#pullrequestreview-2685658117 From jbechberger at openjdk.org Fri Mar 14 14:03:37 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Fri, 14 Mar 2025 14:03:37 GMT Subject: RFR: 8342818: Implement CPU Time Profiling for JFR [v41] In-Reply-To: References: Message-ID: > This is the code for the [JEP draft: CPU Time based profiling for JFR]. > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). 
Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: WIP: dequeue fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20752/files - new: https://git.openjdk.org/jdk/pull/20752/files/39df1939..c65d8704 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=40 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=39-40 Stats: 15 lines in 1 file changed: 11 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20752.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20752/head:pull/20752 PR: https://git.openjdk.org/jdk/pull/20752 From mli at openjdk.org Fri Mar 14 14:15:56 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 14 Mar 2025 14:15:56 GMT Subject: RFR: 8318220: RISC-V: C2 ReverseI [v4] In-Reply-To: References: <8GCHwRel0a2PCahmHnwpQCAYiLiBj6dseJgyyUSn1jY=.bf93b8d1-fbe9-409b-b2e8-47a07bf9d672@github.com> Message-ID: On Fri, 14 Mar 2025 13:59:09 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Merge branch 'master' into reverse-I-L >> - not enable Zbkb automatically >> - refine tests >> - use srai instead of srli >> - clean test >> - clean test >> - clean test >> - clean >> - add tests >> - clean >> - ... and 5 more: https://git.openjdk.org/jdk/compare/a33b1f7f...61bef248 > > Marked as reviewed by luhenry (Committer). Thank you @luhenry ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23963#issuecomment-2724825413 From tschatzl at openjdk.org Fri Mar 14 14:28:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 14 Mar 2025 14:28:57 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v17] In-Reply-To: References: <0w7seS1tIFhUxnmStxQySISWVfpBBsRmUtx7EoTy9a4=.509a3d5e-56d0-4fd8-8896-51835b14302b@github.com> Message-ID: <58jXaIS3TNN9Y9xWGSKWM7B4C0dbZ6YxRWjPMmBeFnY=.506b75a0-12a4-424c-869c-8358195947d9@github.com> On Wed, 12 Mar 2025 13:56:57 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 263: >> >>> 261: >>> 262: SuspendibleThreadSetLeaver sts_leave; >>> 263: VMThread::execute(&op); >> >> Can you elaborate what synchronization this VM op is trying to achieve? > > Memory visibility for refinement threads for the references written to the heap. Without them, they may not have received the most recent values. > This is the same as the `StoreLoad` barriers synchronization between mutator and refinement threads imo. There has been a discussion about whether this is actually needed. Initially we thought that this could be removed because it's only the refinement worker threads that would need memory synchronization, and the memory synchronization is handled by just starting up the refinement threads. However the rebuild remsets process (marking threads) also access the global card table reference to mark the to-collection-set cards and its value must be synchronized. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1995683088 From tschatzl at openjdk.org Fri Mar 14 14:37:27 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 14 Mar 2025 14:37:27 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v20] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>     // Filtering
>     if (region(@x.a) == region(y)) goto done; // same region check
>     if (y == null) goto done;                 // null value check
>     if (card(@x.a) == young_card) goto done;  // write to young gen check
>     StoreLoad;                                // synchronize
>     if (card(@x.a) == dirty_card) goto done;
>
>     *card(@x.a) = dirty
>
>     // Card tracking
>     enqueue(card-address(@x.a)) into thread-local-dcq;
>     if (thread-local-dcq is not full) goto done;
>
>     call runtime to move thread-local-dcq into dcqs
>
>     done:
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>
> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
>
> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se...
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * ayang review * re-add STS leaver for java thread handshake ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/78611173..51a9eed8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=18-19 Stats: 15 lines in 1 file changed: 5 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From shipilev at amazon.de Fri Mar 14 14:50:49 2025 From: shipilev at amazon.de (Aleksey Shipilev) Date: Fri, 14 Mar 2025 15:50:49 +0100 Subject: Perf regression accessing fields in JDK21 In-Reply-To: <413c2833-f15b-4676-a30c-ab9b6b5a52eb@azul.com> References: <413c2833-f15b-4676-a30c-ab9b6b5a52eb@azul.com> Message-ID: <54756070-c082-4ac0-87ca-16dddf06dcd1@amazon.de> Hi Radim, On 14.03.25 11:21, Radim Vansa wrote: > I am attaching a reproducer; this creates a class with 21,000 fields, > compiles it and executes this (all that the class does is to initialize > all its fields). On JDK 17 running this reports 581 ms; on JDK 21 the > test takes 8017 ms on my laptop. Right. Please submit the RFE with your reproducer. > I was able to avoid the second iteration by passing FieldInfo to > reinitialize() and the execution went down to 5712 ms, but I don't see a > simple solution that would make the first iteration more efficient. I recall looking at this when optimizing heap dumps, and I think we would not be able to avoid O(n) on initial field resolution. But I would expect incremental improvements to this code are possible. 
Thanks, -Aleksey From mli at openjdk.org Fri Mar 14 15:10:09 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 14 Mar 2025 15:10:09 GMT Subject: Integrated: 8318220: RISC-V: C2 ReverseI In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 14:26:33 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch to add ReverseI and ReverseIL intrinsic on riscv? > > Thanks! This pull request has now been integrated. Changeset: 712a70c5 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/712a70c5c44ac1fe916ceb1fff854d689b79b126 Stats: 323 lines in 9 files changed: 323 ins; 0 del; 0 mod 8318220: RISC-V: C2 ReverseI 8318221: RISC-V: C2 ReverseL Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/23963 From rvansa at azul.com Fri Mar 14 15:27:49 2025 From: rvansa at azul.com (Radim Vansa) Date: Fri, 14 Mar 2025 16:27:49 +0100 Subject: Perf regression accessing fields in JDK21 In-Reply-To: <54756070-c082-4ac0-87ca-16dddf06dcd1@amazon.de> References: <413c2833-f15b-4676-a30c-ab9b6b5a52eb@azul.com> <54756070-c082-4ac0-87ca-16dddf06dcd1@amazon.de> Message-ID: <246a4ff2-8c03-45ab-89b6-a59136bba100@azul.com> Thank you for the reply, I've created https://bugs.openjdk.org/browse/JDK-8352075 > I recall looking at this when optimizing heap dumps, and I think we > would not be able to avoid O(n) > on initial field resolution. But I would expect incremental > improvements to this code are possible. You're right, even prior to the regression the code was O(n), but apparently each iteration was significantly cheaper. We could also think about different strategies for regular 'small' classes where the overhead is bearable, vs. big classes where this builds up and it might be worth sacrificing the memory for a more efficient lookup. If we can't have both at once. 
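The cost pattern discussed in this thread (indexed access into a variable-length encoded field stream, versus trading memory for a faster lookup in big classes) can be sketched in Java. This is a minimal illustration under stated assumptions: the class `FieldStreamSketch`, its record layout (two varints per field) and the base-128 varint encoding are all hypothetical and are not HotSpot's actual field stream format.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical model of a variable-length encoded field stream.
// Not HotSpot code; the encoding is a plain base-128 varint chosen for illustration.
final class FieldStreamSketch {
    // Writes a non-negative int as a little-endian base-128 varint.
    static void writeVarint(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    // Each record is two varints: {nameIndex, signatureIndex}.
    static byte[] encode(int[][] fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int[] f : fields) {
            writeVarint(out, f[0]);
            writeVarint(out, f[1]);
        }
        return out.toByteArray();
    }

    // Returns the position just past the varint that starts at pos.
    static int skipVarint(byte[] s, int pos) {
        while ((s[pos++] & 0x80) != 0) {
            // keep skipping continuation bytes
        }
        return pos;
    }

    static int readVarint(byte[] s, int pos) {
        int result = 0;
        int shift = 0;
        int b;
        do {
            b = s[pos++] & 0xFF;
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    // O(index): every preceding record has to be decoded or skipped first,
    // so touching all n fields by index costs O(n^2) overall.
    static int nameIndexByScan(byte[] s, int index) {
        int pos = 0;
        for (int i = 0; i < index; i++) {
            pos = skipVarint(s, pos);
            pos = skipVarint(s, pos);
        }
        return readVarint(s, pos);
    }

    // One linear pass that records where each record starts (extra memory).
    static int[] buildOffsets(byte[] s, int count) {
        int[] offsets = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            offsets[i] = pos;
            pos = skipVarint(s, pos);
            pos = skipVarint(s, pos);
        }
        return offsets;
    }

    // O(1) lookup once the offset table exists.
    static int nameIndexByTable(byte[] s, int[] offsets, int index) {
        return readVarint(s, offsets[index]);
    }
}
```

The offset table is one possible form of "sacrificing the memory for a more efficient lookup"; whether it would be worthwhile only above some field-count threshold is left open here, as it is in the thread.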
Cheers,

Radim

From ayang at openjdk.org Fri Mar 14 15:27:58 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Fri, 14 Mar 2025 15:27:58 GMT
Subject: RFR: 8346194: Improve G1 pre-barrier C2 cost estimate [v2]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 6 Mar 2025 10:35:07 GMT, Thomas Schatzl wrote:

>> Hi all,
>>
>> please review this change that modifies pre-barrier node costs for loop-unrolling to only consider the fast path. The reasoning is similar to zgc (and the new costs as well): only the part of the barrier inlined into the main code stream is considered, as the slow path is laid out separately and does/should not directly affect performance (particularly if there is no marking going on).
>>
>> There are no differences/impact in performance since the post barrier cost is still very large, which will be fixed elsewhere.
>>
>> Testing: gha, perf testing standalone (neither micros nor actual benchmarks give any difference outside of variance), testing with JDK-8342382
>>
>> Hth,
>> Thomas
>
> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:
>
> Update src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp
>
> Co-authored-by: Roberto Castañeda Lozano

Marked as reviewed by ayang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/23862#pullrequestreview-2685917268

From tschatzl at openjdk.org Fri Mar 14 16:35:38 2025
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Fri, 14 Mar 2025 16:35:38 GMT
Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v21]
In-Reply-To: 
References: 
Message-ID: <1bH6bLmIYx6eVtZ4IPlFtdYpdCAwSaNB6w0uNljTSJE=.8a4a88c7-2f66-493c-91dd-6fc6c744c08f@github.com>

> Hi all,
>
> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier.
>
> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25.
>
> ### Current situation
>
> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier.
>
> The main reason for the current barrier is how g1 implements concurrent refinement:
> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads,
> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>     // Filtering
>     if (region(@x.a) == region(y)) goto done; // same region check
>     if (y == null) goto done;                 // null value check
>     if (card(@x.a) == young_card) goto done;  // write to young gen check
>     StoreLoad;                                // synchronize
>     if (card(@x.a) == dirty_card) goto done;
>
>     *card(@x.a) = dirty
>
>     // Card tracking
>     enqueue(card-address(@x.a)) into thread-local-dcq;
>     if (thread-local-dcq is not full) goto done;
>
>     call runtime to move thread-local-dcq into dcqs
>
>     done:
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
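
The contrast between the filtered barrier in the pseudo code above and an unconditional card mark can be modeled as a toy simulation. This is only a sketch under stated assumptions: `CardTableSketch`, the card values, the 512-byte card size and the use of address 0 for null are illustrative choices and do not reproduce G1's or Parallel GC's generated code. The StoreLoad fence and the queue handoff are marked in comments rather than implemented.

```java
// Toy model of card marking; addresses are plain offsets into a pretend heap.
final class CardTableSketch {
    static final int CARD_SHIFT = 9;   // assumed 512-byte cards
    static final byte CLEAN = 0;       // illustrative card values, not HotSpot's
    static final byte DIRTY = 1;
    static final byte YOUNG = 2;

    final byte[] cards;
    final int regionShift;
    int enqueued;                      // stands in for the thread-local dirty card queue

    CardTableSketch(int heapBytes, int regionShift) {
        this.cards = new byte[heapBytes >>> CARD_SHIFT];
        this.regionShift = regionShift;
    }

    // Unconditional card mark: one shift and one byte store.
    void simpleBarrier(long fieldAddr) {
        cards[(int) (fieldAddr >>> CARD_SHIFT)] = DIRTY;
    }

    // Filtered barrier following the order of checks in the pseudo code.
    void filteredBarrier(long fieldAddr, long valueAddr) {
        if ((fieldAddr >>> regionShift) == (valueAddr >>> regionShift)) return; // same region check
        if (valueAddr == 0) return;                                             // null value check
        int card = (int) (fieldAddr >>> CARD_SHIFT);
        if (cards[card] == YOUNG) return;                                       // write to young gen check
        // A StoreLoad fence would be required here in a real concurrent setting.
        if (cards[card] == DIRTY) return;                                       // already dirty
        cards[card] = DIRTY;
        enqueued++;  // real code enqueues the card address and may call into the runtime
    }
}
```

Even in this reduced form the filtered variant has four conditional exits before the store, which gives a rough feel for why the inlined sequence is so much longer than a plain card mark.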
> > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: - Merge branch 'master' into 8342381-card-table-instead-of-dcq - * ayang review * re-add STS leaver for java thread handshake - * when aborting refinement during full collection, the global card table and the per-thread card table might not be in sync. Roll forward during abort of the refinement in these situations. * additional verification * added some missing ResourceMarks in asserts * added variant of ArrayJuggle2 that crashes fairly quickly without these changes - * ayang review * remove unnecessary STSleaver * some more documentation around to_collection_card card color - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * optimized RISCV gen_write_ref_array_post_barrier() implementation contributed by @RealFYang - * fix card table verification crashes: in the first refinement phase, when switching the global card tables, we need to re-check whether we are still in the same sweep epoch or not. It might have changed due to a GC interrupting acquiring the Heap_lock. Otherwise new threads will scribble on the refinement table. Cause are last-minute changes before making the PR ready to review. Testing: without the patch, occurs fairly frequently when continuously (1 in 20) starting refinement. Does not afterward. 
 - * ayang review 3
   * comments
   * minor refactorings
 - * iwalulya review
   * renaming
   * fix some includes, forward declaration
 - * fix whitespace
   * additional whitespace between log tags
   * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename
 - ... and 18 more: https://git.openjdk.org/jdk/compare/7f428041...b0730176

-------------

Changes: https://git.openjdk.org/jdk/pull/23739/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=20
  Stats: 6761 lines in 99 files changed: 2368 ins; 3464 del; 929 mod
  Patch: https://git.openjdk.org/jdk/pull/23739.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739

PR: https://git.openjdk.org/jdk/pull/23739

From shade at openjdk.org Fri Mar 14 18:35:59 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 14 Mar 2025 18:35:59 GMT
Subject: RFR: 8351187: Add JFR monitor notification event [v4]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 6 Mar 2025 19:39:45 GMT, Markus Grönlund wrote:

> Generally, any event that cannot be enabled by default needs good motivations.

It is not like we cannot enable it by default. We just don't know yet what is the actual overhead of doing so. My initial thought was to not enable it by default to give us extra safety.

> Unsure if this event type carries enough weight. The JavaMonitorEvent already has a notified field

This event complements the `Wait` event, giving us novel capabilities. Capturing notifier thread in wait event gives us some info, but lacks the context for notification itself. For example, the stack trace for notification. Also -- and it could have been handy a few months back for actual performance work!
-- new event allows us to estimate notify->wait latencies almost directly:

    $ build/linux-x86_64-server-release/images/jdk/bin/java \
        -XX:+UnlockExperimentalVMOptions -XX:-UseFastUnorderedTimeStamps \
        -XX:StartFlightRecording=filename=latency.jfr,jdk.JavaMonitorNotify#enabled=true \
        NotifyWaitLatency.java

Sample event pair:

    "type": "jdk.JavaMonitorNotify",
    "values": {
      "startTime": "2025-03-14T19:01:50.186277259+01:00",
      "duration": "PT0.000000491S",
      "notifiedCount": 1,
      "address": 135954102752832

    "type": "jdk.JavaMonitorWait",
    "values": {
      "startTime": "2025-03-14T19:01:50.086326997+01:00",
      "duration": "PT0.099964319S",
      "address": 135954102752832

The latency estimate is: .086326997s + .099964319s - .186277259s = .000014057s = 14.1us

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23901#issuecomment-2725440464

From jbechberger at openjdk.org Fri Mar 14 18:36:31 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Fri, 14 Mar 2025 18:36:31 GMT
Subject: RFR: 8342818: Implement CPU Time Profiling for JFR [v42]
In-Reply-To: 
References: 
Message-ID: 

> This is the code for the [JEP draft: CPU Time based profiling for JFR].
>
> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests).
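
The notify-to-wait latency arithmetic from the jdk.JavaMonitorNotify / jdk.JavaMonitorWait sample pair in the JDK-8351187 comment above can be reproduced with `java.time`. This is a minimal sketch: the class and method names are hypothetical, and it assumes the timestamp and duration strings are taken verbatim from the printed events (ISO-8601 offset date-time and ISO-8601 duration).

```java
import java.time.Duration;
import java.time.OffsetDateTime;

// Hypothetical helper: estimates latency as (wait start + wait duration) - notify start.
final class NotifyWaitLatencyEstimate {
    static Duration estimate(String waitStart, String waitDuration, String notifyStart) {
        // The waiter wakes up when its wait interval ends.
        OffsetDateTime waitEnd = OffsetDateTime.parse(waitStart).plus(Duration.parse(waitDuration));
        // Time from the start of notify() until the waiter is running again.
        return Duration.between(OffsetDateTime.parse(notifyStart), waitEnd);
    }

    public static void main(String[] args) {
        Duration latency = estimate(
            "2025-03-14T19:01:50.086326997+01:00",   // jdk.JavaMonitorWait startTime
            "PT0.099964319S",                        // jdk.JavaMonitorWait duration
            "2025-03-14T19:01:50.186277259+01:00");  // jdk.JavaMonitorNotify startTime
        System.out.println(latency.toNanos() + " ns"); // 14057 ns, i.e. about 14.1us
    }
}
```

Pairing the two events by their `address` field is assumed to be done beforehand; with timestamps from different clocks or with `UseFastUnorderedTimeStamps` enabled, the subtraction could be less meaningful.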
Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: - Mark queue element as empty to prevent stalling - Remove debug output ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20752/files - new: https://git.openjdk.org/jdk/pull/20752/files/c65d8704..f1bb87f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=41 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=40-41 Stats: 13 lines in 1 file changed: 6 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20752.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20752/head:pull/20752 PR: https://git.openjdk.org/jdk/pull/20752 From tschatzl at openjdk.org Sat Mar 15 13:12:39 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Sat, 15 Mar 2025 13:12:39 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v22] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. 
> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads,
> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>     // Filtering
>     if (region(@x.a) == region(y)) goto done; // same region check
>     if (y == null) goto done;                 // null value check
>     if (card(@x.a) == young_card) goto done;  // write to young gen check
>     StoreLoad;                                // synchronize
>     if (card(@x.a) == dirty_card) goto done;
>
>     *card(@x.a) = dirty
>
>     // Card tracking
>     enqueue(card-address(@x.a)) into thread-local-dcq;
>     if (thread-local-dcq is not full) goto done;
>
>     call runtime to move thread-local-dcq into dcqs
>
>     done:
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>
> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
>
> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se...
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * more documentation on why we need to rendezvous the gc threads ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/b0730176..447fe39b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=20-21 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From liach at openjdk.org Sat Mar 15 16:02:17 2025 From: liach at openjdk.org (Chen Liang) Date: Sat, 15 Mar 2025 16:02:17 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: > > - Merge branch 'master' into implement-jep502 > - Clean up exception messages and fix comments > - Rename field > - Rename method and fix comment > - Rework reenterant logic > - Use acquire semantics for reading rather than volatile semantics > - Add missing null check > - Simplify handling of sentinel, wrap, and unwrap > - Fix JavaDoc issues > - Fix members in StableEnumFunction > - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f Removing these two tags generated from `PreviewFeatures` and `JavaUtilCollectionAccess`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2726753687 From jrose at openjdk.org Sat Mar 15 23:54:01 2025 From: jrose at openjdk.org (John R Rose) Date: Sat, 15 Mar 2025 23:54:01 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: > > - Merge branch 'master' into implement-jep502 > - Clean up exception messages and fix comments > - Rename field > - Rename method and fix comment > - Rework reenterant logic > - Use acquire semantics for reading rather than volatile semantics > - Add missing null check > - Simplify handling of sentinel, wrap, and unwrap > - Fix JavaDoc issues > - Fix members in StableEnumFunction > - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f I'm surprised to see `@ForceInline` in the offset query functions in `Unsafe`. Those are not on any fast path I'm aware of. What use case does this annotation address? If none, consider deleting; it will be a future maintenance puzzle. Or at least document in a comment why a slow path function needs such an annotation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2727063749 From jrose at openjdk.org Sun Mar 16 00:37:03 2025 From: jrose at openjdk.org (John R Rose) Date: Sun, 16 Mar 2025 00:37:03 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 246 commits:
>
> - Merge branch 'master' into implement-jep502
> - Clean up exception messages and fix comments
> - Rename field
> - Rename method and fix comment
> - Rework reenterant logic
> - Use acquire semantics for reading rather than volatile semantics
> - Add missing null check
> - Simplify handling of sentinel, wrap, and unwrap
> - Fix JavaDoc issues
> - Fix members in StableEnumFunction
> - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f

(I am reading the patch, not playing with javadoc or live API, so I might be mis-reading what's going on, so apologies in advance if this is beside the point.)

Comments on visual noise and side effects in `toString`.

`renderWrapped` is clever for a single stable value, but it makes for a very noisy display string, with confusing doubly-nested `[]`, for composite stable values. I'm talking about `StableFunction` mainly, I guess. I suggest omitting the inner `[]` for such composites. A simple boolean on `renderWrapped` will do that trick. In that case, `renderWrapped` has the job of either presenting a fixed (recognizable) sentinel string, or else forwarding, without further editorial comment, to the `toString` of the contained value.

I see that, probably due to prior `java.util` contracts, a stable list or map cannot present a `toString` with unset component values. A stable list or map uses a "canned" `toString` method that calls `get`, which must force all component values to be evaluated before the `toString` can be printed. This may greatly annoy users of IDEs, which invoke `toString` (via JVMTI) to display program states. IDE users don't expect mere observation of program states to change program states. This may be a blocker for some would-be adopters of `StableValue`.

Just as `WeakHashMap` bends the `Map` API (regarding `equals`), I think `StableValue` composites should bend the `List` and `Map` APIs, regarding `toString`.
Sometimes the contracts have to be bent for the whole design to fit together. I think the basic rule should be that the `toString` of stable-whatever should have a little "noise" around the outside, to show that it is not just a bare value, but wherever a wrapped value is, that wrapped value should be presented as directly as possible. Also, the wrapped value should not be forced, but rather a set, recognizable string (such as `` or `.unset`) should appear in place of the value string. At the very least, the presence of side effects in `toString`, an unusual condition, needs to be documented prominently, where applicable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2727094395 From jrose at openjdk.org Sun Mar 16 00:43:59 2025 From: jrose at openjdk.org (John R Rose) Date: Sun, 16 Mar 2025 00:43:59 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: <67zgwiCMxdlLGVECy3z81-p1EhUUp3BljMzuq7_lYEU=.424b64b6-ee08-41ef-98c1-88e0fc8be3d4@github.com> On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: > > - Merge branch 'master' into implement-jep502 > - Clean up exception messages and fix comments > - Rename field > - Rename method and fix comment > - Rework reenterant logic > - Use acquire semantics for reading rather than volatile semantics > - Add missing null check > - Simplify handling of sentinel, wrap, and unwrap > - Fix JavaDoc issues > - Fix members in StableEnumFunction > - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f (P.S. I do see how fixing `toString` is just the first inch or so of a long? string? of debugging issues. A JVMTI-based inspector is going to want to call `List::get` as well as `toString`.) 
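A minimal sketch of the non-forcing `toString` rule argued for above, assuming a hypothetical `LazyBox` holder (it is not the PR's `StableValueImpl`, and the `.unset` sentinel is only one of the spellings floated in this thread). The point it tries to show is that printing the holder reports its state without running the supplier:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical lazy holder, for illustration only. */
final class LazyBox<T> {
    private final Supplier<? extends T> supplier;
    private T value;               // written before 'set' is published
    private volatile boolean set;  // volatile write publishes 'value'

    LazyBox(Supplier<? extends T> supplier) {
        this.supplier = supplier;
    }

    /** Forces the value: runs the supplier at most once. */
    synchronized T get() {
        if (!set) {
            value = supplier.get();
            set = true;
        }
        return value;
    }

    /** Observation only: never calls get(), so printing has no side effects. */
    @Override
    public String toString() {
        return set ? "LazyBox[" + value + "]" : "LazyBox[.unset]";
    }
}

final class LazyBoxDemo {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        LazyBox<Integer> box = new LazyBox<>(() -> {
            calls.incrementAndGet();
            return 42;
        });
        System.out.println(box + " after " + calls.get() + " supplier calls");
        box.get();
        System.out.println(box + " after " + calls.get() + " supplier calls");
    }
}
```

A debugger that calls `toString` on such an object sees either the sentinel or the value, and the supplier call count only changes when the program itself calls `get()`.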
------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2727096865 From john.r.rose at oracle.com Sun Mar 16 01:02:06 2025 From: john.r.rose at oracle.com (John Rose) Date: Sat, 15 Mar 2025 18:02:06 -0700 Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: <4B5XTEOv_30znhklVd9ymC0F4bmdn1bpYR-fYwtpvtY=.f0a2f7d9-0345-4893-a3f1-8559496b3c56@github.com> References: <4B5XTEOv_30znhklVd9ymC0F4bmdn1bpYR-fYwtpvtY=.f0a2f7d9-0345-4893-a3f1-8559496b3c56@github.com> Message-ID: <48649453-7926-4353-8AD7-A62EFD5AAF84@oracle.com> > Are deeper cycles of concern? I was thinking of this: There are a couple of ways existing java.util code handles self-cycles. The deepToString method handles them at all levels, so it is robust. But it is tricky and expensive. (Look at the variable named "dejaVu".) If you grep for /"(this / in the java.util sources, you will see several examples of a one-level exclusion of self-references. This is what the present PR emulates, and I think it is just fine to follow those precedents. It's a 99% solution. Omitting the self-check would be a 90% solution, but the self-check is so simple that, why not do it? Adding a "dejaVu" table is not simple; don't. One thing I noticed, when doing that grep, is that the type name is usually "detuned". We should do that as well in this patch. For example, in Hashtable.java, the string says "(this Map)" not "(this Hashtable)". The toString method tilts away from TMI. I think we have a slight TMI problem in this patch, maybe, and given the precedents I would expect "(this Function)" not "(this StableEnumFunction)", etc. (TMI = Too Much Information. See also Gafter on the TMI temptation for language designers, which applies to API design as well: https://gafter.blogspot.com/2017/06/making-new-language-features-stand-out.html ) I see the PR's unit tests look at this string. My humble take on it is, let's dial back the TMI before it's too late.
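The one-level self-reference exclusion and the "detuned" type names mentioned above can be observed with existing `java.util` classes. A small sketch (the class name `SelfReferenceDemo` is made up for illustration; the collections and their `toString` behavior are the standard ones):

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;

final class SelfReferenceDemo {
    /** A list that contains itself; ArrayList inherits AbstractCollection.toString. */
    static String selfContainingList() {
        List<Object> list = new ArrayList<>();
        list.add(list);
        return list.toString();
    }

    /** A Hashtable that holds itself as a value. */
    static String selfContainingTable() {
        Map<String, Object> table = new Hashtable<>();
        table.put("me", table);
        return table.toString();
    }

    public static void main(String[] args) {
        System.out.println(selfContainingList());   // prints [(this Collection)]
        System.out.println(selfContainingTable());  // prints {me=(this Map)}
    }
}
```

Neither string names the concrete class (`ArrayList`, `Hashtable`), which is the precedent being cited for preferring "(this Function)" over "(this StableEnumFunction)".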
(My sensitivity to TMI also informs other comments I made on this PR.) From jrose at openjdk.org Sun Mar 16 03:13:00 2025 From: jrose at openjdk.org (John R Rose) Date: Sun, 16 Mar 2025 03:13:00 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: > > - Merge branch 'master' into implement-jep502 > - Clean up exception messages and fix comments > - Rename field > - Rename method and fix comment > - Rework reenterant logic > - Use acquire semantics for reading rather than volatile semantics > - Add missing null check > - Simplify handling of sentinel, wrap, and unwrap > - Fix JavaDoc issues > - Fix members in StableEnumFunction > - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f The example code `class Component` uses a `Supplier` non-static field to hold a singular stable value, in a lambda which does not capture `this`. It may be best not to claim a solution (yet) for lazy non-statics, so consider making `Component.logger` static. I'm not saying you should attempt this in the present PR! But here are some related thoughts, for later. Solving the problem of lazy statics, using stable suppliers, is a big deal. It allows single-static-holder-classes to be refactored more cheaply. Here's why I think non-static lazy fields are not fully addressed by this PR. (A) Their initializer function should probably not capture `this`, for best performance, and (B) the VM's special casing of non-static fields of type `StableValue` does not extend to fields of type `Supplier`. (Nor should it.) But it seems we either need strict fields, or a JVM lockout of instance-supplier fields, to avoid problems with reflective object patching. 
You might consider a type `StableSupplier` <: `Supplier` to address (B), but unless (A) is addressed as well there is not an efficient end-to-end solution for lazy non-statics. (B) will also be addressed (independently) by strict statics. A full solution might require an indexed version of Supplier (i.e., a restricted Function) which takes `this` whenever needed to evaluate the lazily supplied value. So there's some cross-connection with the design of stable functions in this PR. (A stable function which caches one value seems useless, but it exactly fits the use case of a non-static lazy object attribute. The argument to the function is `this`. That's the only argument the instance-supplier will ever see. Every distinct instance of an instance-supplier will only ever see one instance of `this` as its argument value.) class Component { private strict final StableInstanceSupplier instanceSpecificLogger = StableValue.instanceSupplier( self -> Logger.getLogger(Component.class, self.getName()) ); } In today's API you can get this effect but you have redundant bindings of `this` buried in the supplier structure. As other PR threads have pointed out, it's hard to drop the lambda after its invocation, so as to GC the storage of the bindings. OTOH, maybe there are more reasons we can't get rid of `this`. If so, maybe we have arrived at a final state for problem (A): the lambda has to capture `this`, so be it. But if that's true, we still want a story for problem (B) as well (non-strictness of instance fields). ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2727157600 From john.r.rose at oracle.com Sun Mar 16 03:19:38 2025 From: john.r.rose at oracle.com (John Rose) Date: Sat, 15 Mar 2025 20:19:38 -0700 Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: Message-ID: <93E21D00-2C99-4278-A6BB-0578F32295F8@oracle.com> On 13 Mar 2025, at 4:20, Per Minborg wrote: > ?
>> Reentrancy into here seems really buggy, I would endorse disallowing it >> instead. In that case, a `ReentrantLock` seems better than the native monitor as we can cheaply check `lock.isHeldByCurrentThread()` > > StableValueImpl was carefully designed to minimize memory footprint. Adding a lock would inflate memory usage substantially. +1 from me A similar level of concern with footprint was in my mind in my earlier comment, where I claimed that capturing /this/ in a lambda is suboptimal. The inefficiency is in object creation and footprint, since an extra copy of /this/ must be tracked. > >> src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 159: >> >>> 157: private boolean wrapAndCas(Object value) { >>> 158: // This upholds the invariant, a `@Stable` field is written to at most once >>> 159: return UNSAFE.compareAndSetReference(this, UNDERLYING_DATA_OFFSET, null, wrap(value)); >> >> There is no need for a cas here as all setters have to hold the lock. We should have a dedicated private `set` that asserts `Thread.holdsLock(this)`. > > This is more of a belt and suspenders solution. It is true that it is redundant. A set volatile would suffice here. There is a broad choice at the beginning of this design whether to use a mutex (as and ClassValue do) or use lock-free CAS (as condy/indy do). This API, which is more parallel to the higher-level and ClassValue, uses a mutex. The choice connects to the rules about handling races. Surely, two threads can ask concurrently for a SV state, and both may "suggest" a lambda to give it a value. Now we come to a fork in the road: Do we select at most one lambda to run? Or, do we let both lambdas run and then pick a winner? The first requires a mutex. The second is lock-free and uses CAS. It's a binary choice. I don't think we ever need the belt and suspenders. I agree that StableValue is like ClassValue and not like condy, so it should not be playing with lock free stuff.
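The fork in the road described above can be sketched with two hypothetical holders, `MutexOnce` and `CasOnce`. Neither is the PR's implementation: both treat `null` as "unset" for brevity (the PR wraps values with a sentinel instead), and the reentrancy guard in `MutexOnce` only imitates the `Thread.holdsLock(this)` check quoted elsewhere in this thread.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Mutex flavor: at most one supplier ever runs. */
final class MutexOnce<T> {
    private T value; // guarded by 'this'; null means unset (simplification)

    T orElseSet(Supplier<? extends T> supplier) {
        if (Thread.holdsLock(this)) {
            // A supplier called back into the same holder: refuse rather than overwrite.
            throw new IllegalStateException("Recursive initialization");
        }
        synchronized (this) {
            if (value == null) {
                value = Objects.requireNonNull(supplier.get());
            }
            return value;
        }
    }
}

/** Lock-free flavor: any number of suppliers may run; CAS picks one winner. */
final class CasOnce<T> {
    private final AtomicReference<T> ref = new AtomicReference<>();

    T orElseSet(Supplier<? extends T> supplier) {
        T current = ref.get();
        if (current != null) {
            return current;
        }
        T candidate = Objects.requireNonNull(supplier.get());
        // Losers discard their candidate and adopt the winner's value.
        return ref.compareAndSet(null, candidate) ? candidate : ref.get();
    }
}

final class OnceDemo {
    public static void main(String[] args) {
        MutexOnce<String> mutex = new MutexOnce<>();
        System.out.println(mutex.orElseSet(() -> "first"));
        System.out.println(mutex.orElseSet(() -> "second")); // supplier is not run

        CasOnce<String> cas = new CasOnce<>();
        // Both lambdas run; the inner one publishes first, so the outer result is dropped.
        System.out.println(cas.orElseSet(() -> {
            cas.orElseSet(() -> "inner");
            return "outer";
        }));
    }
}
```

The nested call in `OnceDemo` stands in for a second thread: with CAS the "losing" lambda still executes and its result is thrown away, which is the behavior a mutex-based design rules out.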
(Or did I forget something??) -- John From john.r.rose at oracle.com Sun Mar 16 03:29:28 2025 From: john.r.rose at oracle.com (John Rose) Date: Sat, 15 Mar 2025 20:29:28 -0700 Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: <2AEBD2FF-0816-418A-B8A9-C936D942F4D3@oracle.com> A @Stable field does not need to be volatile. Avoiding volatile is one of the uses for @Stable. That said, @Stable is not as foolproof as volatile. It's more dangerous, and cheaper. You have to do a release store to a stable variable. That's what the VM does for you automatically for a final, and a stable is like a delayed final. But the VM does not give you the release store automatically; you must do it manually. That's why @Stable is an internal feature, and StableValue is the "housebroken" version of it. StableValue has to help the VM maintain the appearance of a "final" variable whose initialization got delayed. The wrapAndSet method does this job. This might seem to contradict my previous assertion that StableValue, being mutex based, must not use lock-free idioms. That comment applies specifically to the update operation that takes a lambda. Other operations, such as reading a SV, or hopefully poking a value at a SV can be, and should be, composed of lock-free operations. Why take a lock when it's just a one-word read or write? On 13 Mar 2025, at 9:07, Maurizio Cimadamore wrote: > On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote: > >>> Implement JEP 502. >>> >>> The PR passes tier1-tier3 tests. >> >> Per Minborg has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 246 commits: >> >> - Merge branch 'master' into implement-jep502 >> - Clean up exception messages and fix comments >> - Rename field >> - Rename method and fix comment >> - Rework reenterant logic >> - Use acquire semantics for reading rather than volatile semantics >> - Add missing null check >> - Simplify handling of sentinel, wrap, and unwrap >> - Fix JavaDoc issues >> - Fix members in StableEnumFunction >> - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f > > src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 74: > >> 72: @Override >> 73: public boolean trySet(T value) { >> 74: if (wrappedContentAcquire() != null) { > > IMHO, if our goal is to do: > > Object content = this.content; > if (content != null) return content; > synchronized (...) { > if (content != null) return content; > this.content = ... > } > > > Then we might just use a volatile field and synchronized blocks. I don't see an immediate need for using acquire/release semantics -- > especially when using a monitor. E.g. this should look more like a classic double checked locking idiom. (but with a stable field to make the first volatile read more efficient in case the field is already set) > > ------------- > > PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1993850760 From alanb at openjdk.org Sun Mar 16 07:54:58 2025 From: alanb at openjdk.org (Alan Bateman) Date: Sun, 16 Mar 2025 07:54:58 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Sat, 15 Mar 2025 23:51:36 GMT, John R Rose wrote: > I'm surprised to see `@ForceInline` in the offset query functions in `Unsafe`. Those are not on any fast path I'm aware of. What use case does this annotation address? If none, consider deleting; it will be a future maintenance puzzle. Or at least document in a comment why a slow path function needs such an annotation.
Looks like it was added as part of JDK-8149159 in JDK 9 as part of work to improve the argument checking. We have touched this area a few times (for deprecations and warnings mostly) but we missed it. Part of me wonders if the additional checks in Unsafe are really needed. The first use of offset methods will print a warning to say they are going to be removed in the future. If someone is really determined to hack on a SV then they will just exploit layout and bypass the check. We don't have any deterrent (aside from the first) warning for the more likely case of someone using the offsets to modify a static final. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2727248346 From duke at openjdk.org Sun Mar 16 13:31:02 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Sun, 16 Mar 2025 13:31:02 GMT Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic In-Reply-To: References: Message-ID: On Wed, 7 Feb 2024 14:35:55 GMT, Yuri Gaevsky wrote: > Hello All, > > Please review these changes to enable the __vectorizedMismatch_ intrinsic on RISC-V platform with RVV instructions supported. > > Thank you, > -Yuri Gaevsky > > **Correctness checks:** > hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4. . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2727429464 From duke at openjdk.org Sun Mar 16 19:41:02 2025 From: duke at openjdk.org (Luca Kellermann) Date: Sun, 16 Mar 2025 19:41:02 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 246 commits: > > - Merge branch 'master' into implement-jep502 > - Clean up exception messages and fix comments > - Rename field > - Rename method and fix comment > - Rework reenterant logic > - Use acquire semantics for reading rather than volatile semantics > - Add missing null check > - Simplify handling of sentinel, wrap, and unwrap > - Fix JavaDoc issues > - Fix members in StableEnumFunction > - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f src/java.base/share/classes/java/util/ImmutableCollections.java line 798: > 796: throw new IndexOutOfBoundsException(i); > 797: } > 798: } I think `orElseSet` should be outside of the `try` block, otherwise an `ArrayIndexOutOfBoundsException` thrown by `mapper.apply` will be wrapped. src/java.base/share/classes/java/util/ImmutableCollections.java line 1488: > 1486: final K k = (K) key; > 1487: return stable.orElseSet(new Supplier() { > 1488: @Override public V get() { return mapper.apply(k); }}); This can return `null` (`StableMap` does allow `null` values), so the `getOrDefault` implementation in `AbstractImmutableMap` does not properly work for `StableMap`: var map = StableValue.map(Set.of(1), _ -> null); // should print "null", but prints "default value" System.out.println(map.getOrDefault(1, "default value")); src/java.base/share/classes/jdk/internal/lang/stable/EmptyStableFunction.java line 47: > 45: @Override > 46: public R apply(T value) { > 47: throw new IllegalArgumentException("Input not allowed: " + value); `StableEnumFunction` and `StableFunction` throw a `NullPointerException` if `value` is `null`. src/java.base/share/classes/jdk/internal/lang/stable/StableEnumFunction.java line 68: > 66: } catch (ArrayIndexOutOfBoundsException ioob) { > 67: throw new IllegalArgumentException("Input not allowed: " + value, ioob); > 68: } Same here. 
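The concern about calling the mapper inside the `try` block can be sketched with a hypothetical `IndexedLookup` helper. This is not the PR's code: the slot array is a plain, non-thread-safe cache standing in for the stable elements, and `Objects.checkIndex` is used here as the public counterpart of an explicit bounds check.

```java
import java.util.Objects;
import java.util.function.IntFunction;

final class IndexedLookup {
    /** Simplified stand-in for "read slot, else compute and store". Not thread-safe. */
    @SuppressWarnings("unchecked")
    private static <R> R getOrCompute(Object[] slots, int index, IntFunction<R> mapper) {
        Object cached = slots[index];          // throws AIOOBE for a bad index
        if (cached == null) {
            cached = mapper.apply(index);      // may throw anything, including AIOOBE
            slots[index] = cached;
        }
        return (R) cached;
    }

    /** Shape under review: the catch cannot tell a bad index from a failing mapper. */
    static <R> R insideTry(Object[] slots, int index, IntFunction<R> mapper) {
        try {
            return getOrCompute(slots, index, mapper);
        } catch (ArrayIndexOutOfBoundsException e) {
            throw new IndexOutOfBoundsException("Index out of range: " + index);
        }
    }

    /** Alternative: validate the index up front, then let mapper failures propagate as-is. */
    static <R> R checkedFirst(Object[] slots, int index, IntFunction<R> mapper) {
        Objects.checkIndex(index, slots.length);
        return getOrCompute(slots, index, mapper);
    }

    public static void main(String[] args) {
        IntFunction<String> failing = i -> {
            throw new ArrayIndexOutOfBoundsException("from mapper");
        };
        try {
            insideTry(new Object[1], 0, failing);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("insideTry:    " + e);
        }
        try {
            checkedFirst(new Object[1], 0, failing);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("checkedFirst: " + e);
        }
    }
}
```

With index 0 being valid in both calls, `insideTry` reports a misleading out-of-range error, while `checkedFirst` surfaces the mapper's own exception unchanged.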
src/java.base/share/classes/jdk/internal/lang/stable/StableIntFunction.java line 61: > 59: throw new IllegalArgumentException("Input not allowed: " + index, ioob); > 60: } > 61: } Same here. src/java.base/share/classes/jdk/internal/lang/stable/StableValueFactories.java line 77: > 75: int i = 0; > 76: for (K key : keys) { > 77: entries[i++] = Map.entry(key, StableValueImpl.newInstance()); `Map.entry` causes `null` keys to throw a `NullPointerException`, meaning there can't be stable functions/maps with a `null` input/key. They can however have `null` values. Is that intended? src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 132: > 130: // Prevent reentry > 131: if (Thread.holdsLock(this)) { > 132: throw new IllegalStateException("Recursing supplier detected: " + supplier); Is it right to include `supplier` in the message? The actual recursing supplier could be another one: var s = StableValue.of(); // throws java.lang.IllegalStateException: Recursing supplier detected: supplier 2 s.orElseSet(new Supplier<>() { public String toString() {return "supplier 1";} public Integer get() { s.orElseSet(new Supplier<>() { public String toString() {return "supplier 2";} public Integer get() {return 2;} }); return 1; } }); The exception message could also be confusing in cases like this: var s = StableValue.of(); synchronized (s) { // throws java.lang.IllegalStateException: Recursing supplier detected s.trySet(null); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1997660330 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1997685354 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1997681440 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1997660665 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1997660937 PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1997680982 PR Review Comment: 
https://git.openjdk.org/jdk/pull/23972#discussion_r1997689245 From liach at openjdk.org Mon Mar 17 00:44:09 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 17 Mar 2025 00:44:09 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Sun, 16 Mar 2025 16:50:19 GMT, Luca Kellermann wrote: >> Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: >> >> - Merge branch 'master' into implement-jep502 >> - Clean up exception messages and fix comments >> - Rename field >> - Rename method and fix comment >> - Rework reenterant logic >> - Use acquire semantics for reading rather than volatile semantics >> - Add missing null check >> - Simplify handling of sentinel, wrap, and unwrap >> - Fix JavaDoc issues >> - Fix members in StableEnumFunction >> - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f > > src/java.base/share/classes/java/util/ImmutableCollections.java line 798: > >> 796: throw new IndexOutOfBoundsException(i); >> 797: } >> 798: } > > I think `orElseSet` should be outside of the `try` block, otherwise an `ArrayIndexOutOfBoundsException` thrown by `mapper.apply` will be wrapped. Even better, we should just do a `Preconditions.checkIndex` explicitly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1997792857 From liach at openjdk.org Mon Mar 17 00:47:01 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 17 Mar 2025 00:47:01 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 246 commits: > > - Merge branch 'master' into implement-jep502 > - Clean up exception messages and fix comments > - Rename field > - Rename method and fix comment > - Rework reenterant logic > - Use acquire semantics for reading rather than volatile semantics > - Add missing null check > - Simplify handling of sentinel, wrap, and unwrap > - Fix JavaDoc issues > - Fix members in StableEnumFunction > - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f src/java.base/share/classes/jdk/internal/lang/stable/StableValueFactories.java line 71: > 69: } > 70: > 71: public static Map> map(Set keys) { I recommend choosing a different name from `map(Set, Function)` for navigation simplicity. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1997794773 From duke at openjdk.org Mon Mar 17 02:01:59 2025 From: duke at openjdk.org (Luca Kellermann) Date: Mon, 17 Mar 2025 02:01:59 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Sun, 16 Mar 2025 00:34:45 GMT, John R Rose wrote: > I see that, probably due to prior `java.util` contracts, a stable list or map cannot present a `toString` with unset component values. A stable list or map uses a "canned" `toString` method that calls `get`, which must force all component values to be evaluated before the `toString` can be printed. I also noticed this issue of `toString` eagerly setting all elements of stable collections and agree that it probably shouldn't do this. Note that all views of these collections (obtained via `List.subList`, `List.reversed`, `Map.entrySet`, `Map.values`, etc.) would also need their own `toString` implementation. > Just as `WeakHashMap` bends the `Map` API (regarding `equals`), I think `StableValue` composites should bend the `List` and `Map` APIs, regarding `toString`. Sometimes the contracts have to be bent for the whole design to fit together.
Neither `List`, `Set`, nor `Map` mention any requirements for `toString` in their interface specification. Only `AbstractCollection` and `AbstractMap` have a default implementation of `toString`. So I don't think any contract would be bent here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2727837224 From liach at openjdk.org Mon Mar 17 02:38:08 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 17 Mar 2025 02:38:08 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:36:08 GMT, Maurizio Cimadamore wrote: >> Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: >> >> - Merge branch 'master' into implement-jep502 >> - Clean up exception messages and fix comments >> - Rename field >> - Rename method and fix comment >> - Rework reenterant logic >> - Use acquire semantics for reading rather than volatile semantics >> - Add missing null check >> - Simplify handling of sentinel, wrap, and unwrap >> - Fix JavaDoc issues >> - Fix members in StableEnumFunction >> - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f > > src/hotspot/share/ci/ciField.cpp line 254: > >> 252: >> 253: static bool trust_final_non_static_fields_of_type(Symbol* signature) { >> 254: return signature == vmSymbols::java_lang_StableValue_signature(); > > Just a note that we will need to decide whether to keep this or not... 
We might change this to require stable values to be strict final instead if strict final is previewed at the same time as stable values - https://openjdk.org/jeps/8350458 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1997850924 From liach at openjdk.org Mon Mar 17 02:52:08 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 17 Mar 2025 02:52:08 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: > > - Merge branch 'master' into implement-jep502 > - Clean up exception messages and fix comments > - Rename field > - Rename method and fix comment > - Rework reenterant logic > - Use acquire semantics for reading rather than volatile semantics > - Add missing null check > - Simplify handling of sentinel, wrap, and unwrap > - Fix JavaDoc issues > - Fix members in StableEnumFunction > - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f src/java.base/share/classes/java/lang/StableValue.java line 47: > 45: * A stable value is a shallowly immutable holder of deferred content. > 46: *

> 47: * A {@linkplain StableValue {@code StableValue}} can be created using the factory This looks weird. I recommend doing `{@code StableValue}` as we are already in the StableValue class and this link won't go anywhere. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1997858327 From liach at openjdk.org Mon Mar 17 02:59:03 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 17 Mar 2025 02:59:03 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: > > - Merge branch 'master' into implement-jep502 > - Clean up exception messages and fix comments > - Rename field > - Rename method and fix comment > - Rework reenterant logic > - Use acquire semantics for reading rather than volatile semantics > - Add missing null check > - Simplify handling of sentinel, wrap, and unwrap > - Fix JavaDoc issues > - Fix members in StableEnumFunction > - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f src/java.base/share/classes/java/lang/StableValue.java line 56: > 54: * , {@linkplain #orElse(Object) orElse()}, or {@linkplain #orElseSet(Supplier) orElseSet()}. > 55: *

> 56: * A stable value that is set is treated as a constant by the JVM, enabling the Before promoting the "constant" features, I would prefer more details about the transition to set - that setting is really a racy thing and only one actor succeeds in setting, and specify the memory effect (like the final field freeze) as a hb for the set and all subsequent gets that successfully observed that set. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1997862275 From liach at openjdk.org Mon Mar 17 03:05:01 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 17 Mar 2025 03:05:01 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: > > - Merge branch 'master' into implement-jep502 > - Clean up exception messages and fix comments > - Rename field > - Rename method and fix comment > - Rework reenterant logic > - Use acquire semantics for reading rather than volatile semantics > - Add missing null check > - Simplify handling of sentinel, wrap, and unwrap > - Fix JavaDoc issues > - Fix members in StableEnumFunction > - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f src/java.base/share/classes/java/lang/StableValue.java line 358: > 356: * other hand are guaranteed not to synchronize on {@code this}. > 357: * > 358: * @implNote A {@linkplain StableValue} is mainly intended to be a non-public field in This should have been a piece of API Notes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r1997865152 From dholmes at openjdk.org Mon Mar 17 05:39:02 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 17 Mar 2025 05:39:02 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 09:49:44 GMT, Aleksey Shipilev wrote: >> This PR implements JEP 503: Remove the 32-bit x86 Port. >> >> The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. >> >> This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. >> >> The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. 
>> >> Additional testing: >> - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) >> - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Drop commented out block from deprecations > - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone > - Generic 32-bit x86 configure error supercedes Windows 32-bit x86 > - 8345169: Implement JEP 503: Remove the 32-bit x86 Port Apologies for the silence but I was out-of-action for several days and am still trying to catch up. Obviously different JEPs have modeled things differently and there is no one-right-way. A lot of follow-up tasks have been identified and no doubt there will be even more after that. I personally would have liked to see more of the known tasks counted as part of the JEP. Hopefully a bunch of them may be ready by the time the JEP is ready anyway. Hitting the Approve button to show "general consensus". ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23906#pullrequestreview-2689161146 From rehn at openjdk.org Mon Mar 17 07:58:50 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Mar 2025 07:58:50 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v3] In-Reply-To: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: > Hi please consider. 
> > |RVWMO| Patched| > | ---------- | ---------- | > |fence iorw,iorw| fence iorw,ow| > |sw t4,120(t2) | sw t4,120(t2) | > |fence ow,ir | unnecessary_membar_volatile_rvwmo | > | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile | > |fence iorw,ow | fence iorw,ow| > |sw t5,124(t2) |sw t5,124(t2) | > > |TSO | Patched| > | ---------- | ---------- | > | lw a4,120(t2) | lw a6,120(t2) | > | sw a0,124(t2) | sw t6,124(t2) | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > | sw t4,120(t2) | sw t4,120(t2) | > | fence ow,ir | unnecessary_membar_volatile_tso | > | sw t6,128(t2) | sw t5,128(t2) | > | sw t5,124(t2) // Non-volatile| sw a1,124(t2) // Non-volatile | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > |... | ... | > | sw a3,120(t2) | sw a0,120(t2) | > | fence ow,ir | fence ow,ir | > | lw a7,124(t2) | lw a5,124(t2) | > > For the specific rvwmo volatile store + store + volatile store is around 30% faster on VF2. > > The patch do: > - Separate ztso and rvwmo in ad by using UseZtso predicate. > - Match all that requires the same membar. > - Make fence/fencei protected as they shouldn't be using directly. > - Increased cost of membars to VOLATILE_REF_COST. > - Added a real_empty pipe. > - Change to pipe_slow on TSO (as x86). > > Note that C2-rv64 is now superior to gcc/clang regrading fencing: > https://godbolt.org/z/6E3YTP15j > > Testing jcstress, tier1 and manually reading the generated assembly. > Doing additional testing, but RFR it now as it may need some consideration. > > /Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Merge branch 'master' into tso-merge - Review comments - Fixed ws - Revert NC - Fixed comment - UseNewCode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24035/files - new: https://git.openjdk.org/jdk/pull/24035/files/4279f9fc..a451ade2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=01-02 Stats: 3823 lines in 130 files changed: 2953 ins; 445 del; 425 mod Patch: https://git.openjdk.org/jdk/pull/24035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24035/head:pull/24035 PR: https://git.openjdk.org/jdk/pull/24035 From tschatzl at openjdk.org Mon Mar 17 08:00:06 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 17 Mar 2025 08:00:06 GMT Subject: Integrated: 8346194: Improve G1 pre-barrier C2 cost estimate In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 12:30:23 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that modifies pre-barrier node costs for loop-unrolling to only consider the fast path. The reasoning is similar to zgc (and the new costs as well): only the part of the barrier inlined into the main code stream, as the slow path is laid out separately and does/should not directly affect performance (particularly if there is no marking going on). > > There are no differences/impact in performance since the post barrier cost is still very large, which fill be fixed elsewhere. > > Testing: gha, perf testing standalone (neither micros nor actual benchmarks give any difference outside of variance), testing with JDK-8342382 > > Hth, > Thomas This pull request has now been integrated. 
Changeset: 9f8d833f Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/9f8d833f8654cb4280d002ef86ce3ae9d709eddc Stats: 11 lines in 1 file changed: 5 ins; 5 del; 1 mod 8346194: Improve G1 pre-barrier C2 cost estimate Co-authored-by: Roberto Castañeda Lozano Reviewed-by: rcastanedalo, ayang ------------- PR: https://git.openjdk.org/jdk/pull/23862 From tschatzl at openjdk.org Mon Mar 17 08:00:05 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 17 Mar 2025 08:00:05 GMT Subject: RFR: 8346194: Improve G1 pre-barrier C2 cost estimate [v2] In-Reply-To: <9rZfd8Ncob8mKPrPNAUXYgd16GhvWF-TEBcKVa60isE=.477a43e6-3aec-4cea-b943-6c8ea157a7d1@github.com> References: <9rZfd8Ncob8mKPrPNAUXYgd16GhvWF-TEBcKVa60isE=.477a43e6-3aec-4cea-b943-6c8ea157a7d1@github.com> Message-ID: On Fri, 7 Mar 2025 08:47:13 GMT, Roberto Castañeda Lozano wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp >> >> Co-authored-by: Roberto Castañeda Lozano > > Looks good, thanks for addressing my feedback Thomas! > > Reducing the estimated GC barrier size could lead to over-unrolling, which might increase code cache pressure and ultimately affect performance. In practice, it seems that the effect of estimated GC barrier size on total code size is limited though: I studied the impact on C2-generated code size using DaCapo 23 on x64 and aarch64 and it is basically unaffected by this change. In the extreme case of estimating GC barrier size to be 0, the overall code size increase is of about 1% for x64 and 0.5% for aarch64. If each GC barrier (pre and post) is estimated to correspond to 20 nodes, the code size increase is further reduced to only 0.5% for x64 and 0.01% for aarch64. Beyond that, the code size increase becomes statistically insignificant.
Thanks @robcasloz @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/23862#issuecomment-2728496581 From jbechberger at openjdk.org Mon Mar 17 10:03:00 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 17 Mar 2025 10:03:00 GMT Subject: RFR: 8342818: Implement CPU Time Profiling for JFR [v43] In-Reply-To: References: Message-ID: <12EY0qQHtcU6A5z5VstORM7kibUWrqQNtIGfC4tqvoI=.798f782f-fa25-4640-9f92-5c77030ed2ec@github.com> > This is the code for the [JEP draft: CPU Time based profiling for JFR]. > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: - Improve placement of NoResourceMark - Add more checks for metadata_do ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20752/files - new: https://git.openjdk.org/jdk/pull/20752/files/f1bb87f1..4c4496c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=42 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=41-42 Stats: 16 lines in 2 files changed: 9 ins; 4 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20752.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20752/head:pull/20752 PR: https://git.openjdk.org/jdk/pull/20752 From tschatzl at openjdk.org Mon Mar 17 10:32:33 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 17 Mar 2025 10:32:33 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v23] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. 
> > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). 
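The barrier pseudocode quoted above can be modelled as a small, runnable C++ sketch. This is only a toy model under stated assumptions: `ToyG1Heap`, the region and card sizes, the card values, and the queue handling are all hypothetical names and constants chosen for illustration, and none of it is HotSpot code. It keeps the order of the filters from the pseudocode (same region, null value, young card, StoreLoad, already dirty) and then the card-tracking step.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: every name and constant here is an assumption made for
// illustration, not HotSpot code.
enum CardValue : uint8_t { kClean = 0, kDirty = 1, kYoung = 2 };

struct ToyG1Heap {
  static constexpr unsigned kRegionShift = 20;  // assumed 1 MiB regions
  static constexpr unsigned kCardShift = 9;     // assumed 512-byte cards

  std::vector<uint8_t> cards;            // one byte per card
  std::vector<size_t> thread_local_dcq;  // stand-in for the thread-local dirty card queue
  size_t dcq_capacity;
  int runtime_calls = 0;                 // counts simulated slow-path calls

  ToyG1Heap(size_t heap_bytes, size_t capacity)
      : cards(heap_bytes >> kCardShift, kClean), dcq_capacity(capacity) {}

  // Models the post-write barrier for `x.a = y`. Addresses are offsets into
  // the toy heap. Returns true if the card was dirtied and enqueued.
  bool post_write_barrier(uintptr_t field_addr, uintptr_t new_value) {
    // Filtering
    if ((field_addr >> kRegionShift) == (new_value >> kRegionShift)) return false;  // same region
    if (new_value == 0) return false;                                               // null value
    size_t card = field_addr >> kCardShift;
    if (cards[card] == kYoung) return false;                                        // write to young gen
    std::atomic_thread_fence(std::memory_order_seq_cst);                            // StoreLoad
    if (cards[card] == kDirty) return false;                                        // already dirty

    cards[card] = kDirty;

    // Card tracking
    thread_local_dcq.push_back(card);
    if (thread_local_dcq.size() < dcq_capacity) return true;

    // Queue is full: simulate the runtime call that hands it to the global set.
    ++runtime_calls;
    thread_local_dcq.clear();
    return true;
  }
};
```

Even in this reduced form, a single reference store goes through four filters, a full fence, a card store, and a queue operation, which gives a feel for why the inlined barrier is so much larger than a plain card mark.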
> > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * obsolete G1UpdateBufferSize G1UpdateBufferSize has previously been used to size the refinement buffers and impose a minimum limit on the number of cards per thread that need to be pending before refinement starts. The former function is now obsolete with the removal of the dirty card queues, the latter functionality has been taken over by the new diagnostic option `G1PerThreadPendingCardThreshold`. I prefer to make this a diagnostic option rather than a product option because it is something that is only necessary for some test cases to produce some otherwise unwanted behavior (continuous refinement). CSR is pending. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/447fe39b..4d0afd57 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=21-22 Stats: 16 lines in 7 files changed: 2 ins; 9 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From mgronlun at openjdk.org Mon Mar 17 10:34:58 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 17 Mar 2025 10:34:58 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v4] In-Reply-To: References: Message-ID: <4_YjJswj8ApiS3AOBpnqkpUBjMI7_yFNy268phmDQWQ=.8077086c-ecae-43c2-844a-6c2cac6f4f92@github.com> On Fri, 14 Mar 2025 18:17:40 GMT, Aleksey Shipilev wrote: > It is not like we cannot enable it by default.
We just don't know yet what is the actual overhead of doing so. My initial thought was to not enable it by default to give us extra safety. Yes, this is what I meant by not being able to enable it by default. It can depend on a lot of things: overhead in performance, too much data, too much noise, etc. In general, if we cannot enable an event by default, it is an indication there is something in the design that needs to be reconsidered - i.e. how could it be made to work by default? Throttling, caching, thresholds, others? That said, events that can help support and performance engineers to troubleshoot faster and reduce time to issue resolution have high value in and of themselves, even if it means turning on something extra. I also like your argument that we get the stacktrace of the notifier thread - that is valuable context indeed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23901#issuecomment-2728995679 From fyang at openjdk.org Mon Mar 17 11:19:57 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 17 Mar 2025 11:19:57 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v2] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Fri, 14 Mar 2025 11:38:19 GMT, Robbin Ehn wrote: >> Hi please consider.
>> >> |RVWMO| Patched| >> | ---------- | ---------- | >> |fence iorw,iorw| fence iorw,ow| >> |sw t4,120(t2) | sw t4,120(t2) | >> |fence ow,ir | unnecessary_membar_volatile_rvwmo | >> | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile | >> |fence iorw,ow | fence iorw,ow| >> |sw t5,124(t2) |sw t5,124(t2) | >> >> |TSO | Patched| >> | ---------- | ---------- | >> | lw a4,120(t2) | lw a6,120(t2) | >> | sw a0,124(t2) | sw t6,124(t2) | >> | fence iorw,iorw | unnecessary_membar_volatile_tso | >> | sw t4,120(t2) | sw t4,120(t2) | >> | fence ow,ir | unnecessary_membar_volatile_tso | >> | sw t6,128(t2) | sw t5,128(t2) | >> | sw t5,124(t2) // Non-volatile| sw a1,124(t2) // Non-volatile | >> | fence iorw,iorw | unnecessary_membar_volatile_tso | >> |... | ... | >> | sw a3,120(t2) | sw a0,120(t2) | >> | fence ow,ir | fence ow,ir | >> | lw a7,124(t2) | lw a5,124(t2) | >> >> For the specific rvwmo volatile store + store + volatile store is around 30% faster on VF2. >> >> The patch do: >> - Separate ztso and rvwmo in ad by using UseZtso predicate. >> - Match all that requires the same membar. >> - Make fence/fencei protected as they shouldn't be using directly. >> - Increased cost of membars to VOLATILE_REF_COST. >> - Added a real_empty pipe. >> - Change to pipe_slow on TSO (as x86). >> >> Note that C2-rv64 is now superior to gcc/clang regrading fencing: >> https://godbolt.org/z/6E3YTP15j >> >> Testing jcstress, tier1 and manually reading the generated assembly. >> Doing additional testing, but RFR it now as it may need some consideration. >> >> /Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Review comments src/hotspot/cpu/riscv/riscv.ad line 7888: > 7886: // ZTSO > 7887: > 7888: instruct no_membar_rvtso() %{ Can we rename this as `unnecessary_membar_rvtso` which is similar to other names like `unnecessary_membar_volatile_rvtso`? I personally don't really like the `no_` prefix. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r1997834690 From fyang at openjdk.org Mon Mar 17 11:19:55 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 17 Mar 2025 11:19:55 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v3] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: <8EaExbLjckRIJfcgCf0VcHFR17An5g4q5Q9WVEY_QAM=.34fd6917-fd0f-46d9-9906-5fcafdab705c@github.com> On Mon, 17 Mar 2025 07:58:50 GMT, Robbin Ehn wrote: >> Hi please consider. >> >> |RVWMO| Patched| >> | ---------- | ---------- | >> |fence iorw,iorw| fence iorw,ow| >> |sw t4,120(t2) | sw t4,120(t2) | >> |fence ow,ir | unnecessary_membar_volatile_rvwmo | >> | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile | >> |fence iorw,ow | fence iorw,ow| >> |sw t5,124(t2) |sw t5,124(t2) | >> >> |TSO | Patched| >> | ---------- | ---------- | >> | lw a4,120(t2) | lw a6,120(t2) | >> | sw a0,124(t2) | sw t6,124(t2) | >> | fence iorw,iorw | unnecessary_membar_volatile_tso | >> | sw t4,120(t2) | sw t4,120(t2) | >> | fence ow,ir | unnecessary_membar_volatile_tso | >> | sw t6,128(t2) | sw t5,128(t2) | >> | sw t5,124(t2) // Non-volatile| sw a1,124(t2) // Non-volatile | >> | fence iorw,iorw | unnecessary_membar_volatile_tso | >> |... | ... | >> | sw a3,120(t2) | sw a0,120(t2) | >> | fence ow,ir | fence ow,ir | >> | lw a7,124(t2) | lw a5,124(t2) | >> >> For the specific rvwmo volatile store + store + volatile store is around 30% faster on VF2. >> >> The patch do: >> - Separate ztso and rvwmo in ad by using UseZtso predicate. >> - Match all that requires the same membar. >> - Make fence/fencei protected as they shouldn't be using directly. >> - Increased cost of membars to VOLATILE_REF_COST. >> - Added a real_empty pipe. >> - Change to pipe_slow on TSO (as x86). 
>> >> Note that C2-rv64 is now superior to gcc/clang regrading fencing: >> https://godbolt.org/z/6E3YTP15j >> >> Testing jcstress, tier1 and manually reading the generated assembly. >> Doing additional testing, but RFR it now as it may need some consideration. >> >> /Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into tso-merge > - Review comments > - Fixed ws > - Revert NC > - Fixed comment > - UseNewCode Hi, Thanks for the update. I have checked the changes, seems fine to me modulo several minor comments about naming. And I having been running jcstress on two of my OoO machines over the weekend. So far so good. I will let the test continue for some more time to see. src/hotspot/cpu/riscv/riscv.ad line 7967: > 7965: size(0); > 7966: > 7967: format %{ "no_membar_rvtso elided/tso (empty encoding)" %} Here: s/no_membar_rvtso/unnecessary_membar_rvtso/ src/hotspot/cpu/riscv/riscv.ad line 7969: > 7967: format %{ "no_membar_rvtso elided/tso (empty encoding)" %} > 7968: ins_encode %{ > 7969: __ block_comment("no_membar_rvtso"); And here: s/no_membar_rvtso/unnecessary_membar_rvtso/ src/hotspot/cpu/riscv/riscv.ad line 8007: > 8005: // RVWMO > 8006: > 8007: instruct membar_rvwmo_aqcuire() %{ Maybe it's better to put `_rvwmo` as a suffix? Like `membar_aqcuire_rvwmo`. Similar for other match rules like `membar_rvwmo_release`, `membar_rvwmo_storestore`, `membar_rvwmo_lock` and `membar_rvwmo_volatile`. 
------------- PR Review: https://git.openjdk.org/jdk/pull/24035#pullrequestreview-2688965550 PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r1998503604 PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r1998503875 PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r1998510917 From duke at openjdk.org Mon Mar 17 11:38:04 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 17 Mar 2025 11:38:04 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v6] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 15:34:18 GMT, Leonid Mesnik wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Added validity test for the intrinsics. > > test/jdk/sun/security/provider/acvp/Launcher.java line 43: > >> 41: * @modules java.base/sun.security.provider >> 42: * @run main Launcher >> 43: * @run main/othervm -Xcomp Launcher > > Thank you for adding this case. Please add it as a separate testcase: > /* > * @test > * @summary Test verifies intrinsic implementation. > * @library /test/lib > * @modules java.base/sun.security.provider > * @run main/othervm -Xcomp Launcher > */ Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1998545085 From rehn at openjdk.org Mon Mar 17 12:35:42 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Mar 2025 12:35:42 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v4] In-Reply-To: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: > Hi please consider. 
>
> | RVWMO | Patched |
> | ---------- | ---------- |
> | fence iorw,iorw | fence iorw,ow |
> | sw t4,120(t2) | sw t4,120(t2) |
> | fence ow,ir | unnecessary_membar_volatile_rvwmo |
> | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile |
> | fence iorw,ow | fence iorw,ow |
> | sw t5,124(t2) | sw t5,124(t2) |
>
> | TSO | Patched |
> | ---------- | ---------- |
> | lw a4,120(t2) | lw a6,120(t2) |
> | sw a0,124(t2) | sw t6,124(t2) |
> | fence iorw,iorw | unnecessary_membar_volatile_tso |
> | sw t4,120(t2) | sw t4,120(t2) |
> | fence ow,ir | unnecessary_membar_volatile_tso |
> | sw t6,128(t2) | sw t5,128(t2) |
> | sw t5,124(t2) // Non-volatile | sw a1,124(t2) // Non-volatile |
> | fence iorw,iorw | unnecessary_membar_volatile_tso |
> | ... | ... |
> | sw a3,120(t2) | sw a0,120(t2) |
> | fence ow,ir | fence ow,ir |
> | lw a7,124(t2) | lw a5,124(t2) |
>
> The specific RVWMO sequence volatile store + store + volatile store is around 30% faster on VF2.
>
> The patch does the following:
> - Separate ztso and rvwmo in ad by using UseZtso predicate.
> - Match all that requires the same membar.
> - Make fence/fencei protected as they shouldn't be used directly.
> - Increased cost of membars to VOLATILE_REF_COST.
> - Added a real_empty pipe.
> - Change to pipe_slow on TSO (as x86).
>
> Note that C2-rv64 is now superior to gcc/clang regarding fencing:
> https://godbolt.org/z/6E3YTP15j
>
> Testing jcstress, tier1 and manually reading the generated assembly.
> Doing additional testing, but RFR it now as it may need some consideration.
> > /Robbin Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24035/files - new: https://git.openjdk.org/jdk/pull/24035/files/a451ade2..566c0a76 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=02-03 Stats: 19 lines in 1 file changed: 0 ins; 1 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/24035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24035/head:pull/24035 PR: https://git.openjdk.org/jdk/pull/24035 From rehn at openjdk.org Mon Mar 17 12:35:42 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Mar 2025 12:35:42 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v2] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Mon, 17 Mar 2025 02:06:46 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > src/hotspot/cpu/riscv/riscv.ad line 7888: > >> 7886: // ZTSO >> 7887: >> 7888: instruct no_membar_rvtso() %{ > > Can we rename this as `unnecessary_membar_rvtso` which is similar to other names like `unnecessary_membar_volatile_rvtso`? I personally don't really like the `no_` prefix. 
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r1998631550 From rehn at openjdk.org Mon Mar 17 12:35:43 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Mar 2025 12:35:43 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v3] In-Reply-To: <8EaExbLjckRIJfcgCf0VcHFR17An5g4q5Q9WVEY_QAM=.34fd6917-fd0f-46d9-9906-5fcafdab705c@github.com> References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> <8EaExbLjckRIJfcgCf0VcHFR17An5g4q5Q9WVEY_QAM=.34fd6917-fd0f-46d9-9906-5fcafdab705c@github.com> Message-ID: <3_nfwf03-Sis_sfAUFc7etPr6wjqovkkiiBl6ASchjI=.ccceb109-5261-43a6-84d0-6186c4c4090e@github.com> On Mon, 17 Mar 2025 11:07:30 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into tso-merge >> - Review comments >> - Fixed ws >> - Revert NC >> - Fixed comment >> - UseNewCode > > src/hotspot/cpu/riscv/riscv.ad line 7967: > >> 7965: size(0); >> 7966: >> 7967: format %{ "no_membar_rvtso elided/tso (empty encoding)" %} > > Here: s/no_membar_rvtso/unnecessary_membar_rvtso/ Fixed > src/hotspot/cpu/riscv/riscv.ad line 8007: > >> 8005: // RVWMO >> 8006: >> 8007: instruct membar_rvwmo_aqcuire() %{ > > Maybe it's better to put `_rvwmo` as a suffix? Like `membar_aqcuire_rvwmo`. Similar for other match rules like `membar_rvwmo_release`, `membar_rvwmo_storestore`, `membar_rvwmo_lock` and `membar_rvwmo_volatile`. 
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r1998631693 PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r1998631992 From jwaters at openjdk.org Mon Mar 17 12:51:12 2025 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 17 Mar 2025 12:51:12 GMT Subject: RFR: 8342769: HotSpot Windows/gcc port is broken [v16] In-Reply-To: References: Message-ID: On Mon, 17 Feb 2025 08:27:56 GMT, Julian Waters wrote: >> Several areas in HotSpot are broken in the gcc port. These, with the exception of 1 rather big oversight within SharedRuntime::frem and SharedRuntime::drem, are all minor correctness issues within the code. These mostly can be fixed with simple changes to the code. Note that I am not sure whether the SharedRuntime::frem and SharedRuntime::drem fix is correct. It may be that they can be removed entirely > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: > > - CAST_FROM_FN_PTR in os_windows.cpp > - Merge branch 'master' into hotspot > - Merge branch 'openjdk:master' into hotspot > - _WINDOWS && AARCH64 in sharedRuntime.hpp > - AARCH64 in sharedRuntimeRem.cpp > - Refactor sharedRuntime.cpp > - CAST_FROM_FN_PTR in os_windows.cpp > - Merge branch 'openjdk:master' into hotspot > - fmod_winarm64 in sharedRuntime.cpp > - fmod_winarm64 in sharedRuntimeRem.cpp > - ... 
and 19 more: https://git.openjdk.org/jdk/compare/5e9d72e2...3f9ca206 Keep open, will integrate soon ------------- PR Comment: https://git.openjdk.org/jdk/pull/21627#issuecomment-2729373397 From shade at openjdk.org Mon Mar 17 13:48:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 17 Mar 2025 13:48:56 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v4] In-Reply-To: <4_YjJswj8ApiS3AOBpnqkpUBjMI7_yFNy268phmDQWQ=.8077086c-ecae-43c2-844a-6c2cac6f4f92@github.com> References: <4_YjJswj8ApiS3AOBpnqkpUBjMI7_yFNy268phmDQWQ=.8077086c-ecae-43c2-844a-6c2cac6f4f92@github.com> Message-ID: On Mon, 17 Mar 2025 10:31:56 GMT, Markus Grönlund wrote: > That said, events that can help support and performance engineers to troubleshoot faster and reduce time to issue resolution have high value in and of themselves, even if it means turning on something extra. Yes. I think we are passing this bar with the new event. We don't have to enable it by default, unless we feel strongly about it. I suppose if we find it very useful in the field, we can later argue to turn it on by default and do some performance measurements for its impact. I take it you are fine with this PR then, @mgronlun? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23901#issuecomment-2729573953 From mgronlun at openjdk.org Mon Mar 17 14:18:54 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 17 Mar 2025 14:18:54 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v5] In-Reply-To: <6HSZ5L4LjfEDdiQeKftFHlEMv3iRX68H2ZouhnQyV2c=.40e3a8f1-acbf-443f-a73f-06a55b0a8929@github.com> References: <6HSZ5L4LjfEDdiQeKftFHlEMv3iRX68H2ZouhnQyV2c=.40e3a8f1-acbf-443f-a73f-06a55b0a8929@github.com> Message-ID: On Thu, 13 Mar 2025 09:44:04 GMT, Aleksey Shipilev wrote: >> We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event.
Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up. >> >> Providing this event would also replace one of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). >> >> This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Only emit event when notification happened > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Rewrite test to RecordingStream > - Drop threshold to 0ms > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Disable by default > - Fix I would like to step through it first. I will do that now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23901#issuecomment-2729681569 From cnorrbin at openjdk.org Mon Mar 17 14:47:36 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Mon, 17 Mar 2025 14:47:36 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v15] In-Reply-To: References: Message-ID: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree.
These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values.
>
> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
>
> An example of how you could use the intrusive tree is found below:
>
> ```c++
> struct MyIntrusiveStructure {
>   Node node; // The tree node is part of an external structure
>   int data;
>
>   MyIntrusiveStructure(int data, Node node) : node(node), data(data) {}
>   Node* get_node() { return &node; }
>   static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; }
> };
>
> Tree my_intrusive_tree;
>
> Cursor insert_cursor = my_intrusive_tree.cursor_find(0);
> Node insert_node = Node(0);
>
> // Custom allocation here is just malloc
> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest);
> new (place) MyIntrusiveStructure(0, insert_node);
>
> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor);
>
> Cursor find_cursor = my_intrusive_tree.cursor_find(0);
> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data;
> ```
>
> Please let me know any feedback or concerns!
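The quoted example depends on HotSpot types, so it cannot be compiled on its own. A minimal standalone sketch of the same intrusive pattern is given here, with loud caveats: `Node` and `ToyTree` are hypothetical stand-ins (a plain unbalanced binary search tree, not the red-black tree or cursor API from the PR), and the only point is to show that the caller owns the nodes and that `cast_to_self` relies on the node being the first member of a standard-layout struct.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a tree node; not the HotSpot RBTree node type.
struct Node {
  Node* left = nullptr;
  Node* right = nullptr;
  int key = 0;
};

struct MyIntrusiveStructure {
  Node node;  // must stay the first member for cast_to_self to be valid
  int data;

  MyIntrusiveStructure(int key, int data) : node(), data(data) { node.key = key; }
  Node* get_node() { return &node; }
  static MyIntrusiveStructure* cast_to_self(Node* n) {
    return reinterpret_cast<MyIntrusiveStructure*>(n);
  }
};

// The cast is only sound because the struct is standard-layout and the node
// sits at offset 0; these checks fail the build if that ever changes.
static_assert(std::is_standard_layout<MyIntrusiveStructure>::value, "needs standard layout");
static_assert(offsetof(MyIntrusiveStructure, node) == 0, "node must be the first member");

// Hypothetical unbalanced BST that only links caller-owned nodes.
struct ToyTree {
  Node* root = nullptr;

  // Links the node into the tree; never allocates or frees. Returns false on a duplicate key.
  bool insert(Node* n) {
    Node** link = &root;
    while (*link != nullptr) {
      if (n->key == (*link)->key) return false;
      link = (n->key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    *link = n;
    return true;
  }

  // Returns the node with the given key, or nullptr if there is none.
  Node* find(int key) const {
    Node* cur = root;
    while (cur != nullptr && cur->key != key) {
      cur = (key < cur->key) ? cur->left : cur->right;
    }
    return cur;
  }
};
```

Because the tree never allocates, the owning structures can live on the stack, in an arena, or in malloc'ed memory, which is the flexibility the PR description is after.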
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: Allow non-debug verify_self + comparator readability ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23416/files - new: https://git.openjdk.org/jdk/pull/23416/files/0a92e60b..ac277b42 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=13-14 Stats: 69 lines in 3 files changed: 34 ins; 16 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From mli at openjdk.org Mon Mar 17 14:48:33 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Mar 2025 14:48:33 GMT Subject: RFR: 8352159: RISC-V: add zfa support for loadConH Message-ID: Hi, Can you help to review this patch? Previously in https://github.com/openjdk/jdk/pull/23844, `loadConH` is implemented only with `flh`, but `fli_h` should be more efficient if Zfa is supported. Thanks! ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/24081/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24081&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352159 Stats: 89 lines in 4 files changed: 76 ins; 1 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24081.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24081/head:pull/24081 PR: https://git.openjdk.org/jdk/pull/24081 From mli at openjdk.org Mon Mar 17 15:13:17 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Mar 2025 15:13:17 GMT Subject: RFR: 8352159: RISC-V: add more zfa support [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > > Previously in https://github.com/openjdk/jdk/pull/23844, `loadConH` is implemented only with `flh`, but `fli_h` should be more efficient if Zfa is supported; min/max for HF could use fmin/maxm.h instead too. 
> 
> Thanks!

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  min/max

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24081/files
  - new: https://git.openjdk.org/jdk/pull/24081/files/e4636ecb..2b244afc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24081&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24081&range=00-01

  Stats: 40 lines in 2 files changed: 39 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/24081.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24081/head:pull/24081

PR: https://git.openjdk.org/jdk/pull/24081

From lmesnik at openjdk.org  Mon Mar 17 16:10:25 2025
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Mon, 17 Mar 2025 16:10:25 GMT
Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v7]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 12 Mar 2025 19:19:08 GMT, Ferenc Rakoczi wrote:

>> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
> 
> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Made the intrinsics test separate from the pure java test.

Test changes look good.

-------------

Marked as reviewed by lmesnik (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23860#pullrequestreview-2691165965

From mli at openjdk.org  Mon Mar 17 18:58:18 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 17 Mar 2025 18:58:18 GMT
Subject: RFR: 8352159: RISC-V: add more zfa support [v3]
In-Reply-To: 
References: 
Message-ID: 

> Hi,
> Can you help to review this patch?
> 
> Previously in https://github.com/openjdk/jdk/pull/23844, `loadConH` is implemented only with `flh`, but `fli_h` should be more efficient if Zfa is supported; min/max for HF could use fmin/maxm.h instead too.
> 
> Thanks!
Hamlin Li has updated the pull request incrementally with three additional commits since the last revision:

 - fix fli_h
 - fix rFlagsReg
 - refactor min/max

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24081/files
  - new: https://git.openjdk.org/jdk/pull/24081/files/2b244afc..ec9f4725

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24081&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24081&range=01-02

  Stats: 42 lines in 2 files changed: 17 ins; 10 del; 15 mod
  Patch: https://git.openjdk.org/jdk/pull/24081.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24081/head:pull/24081

PR: https://git.openjdk.org/jdk/pull/24081

From jiangli at openjdk.org  Mon Mar 17 19:43:04 2025
From: jiangli at openjdk.org (Jiangli Zhou)
Date: Mon, 17 Mar 2025 19:43:04 GMT
Subject: RFR: 8352098: -Xrunjdwp fails on static JDK
Message-ID: <7AZV4Zx94USyWGyA2H3pu6-8Vo-ZM5or6AinLiC5MtM=.b770bcf8-00c0-4f73-a204-8009b6c70e26@github.com>

Please review this fix, which prevents `JvmtiAgent::convert_xrun_agent` from prematurely exiting the VM if `lookup_On_Load_entry_point` cannot load the agent using the `JVM_OnLoad` symbol. Thanks.

`lookup_On_Load_entry_point` first tries to load the builtin agent from the executable by checking the requested symbol (`JVM_OnLoad`). If no builtin agent is found, it then tries to load the agent shared library (e.g. `libjdwp.so`) by calling `load_library`. The issue is that `load_library` is called with `vm_exit_on_error` set to `true`, which causes the VM to exit immediately if the agent shared library is not loaded. Therefore, `JvmtiAgent::convert_xrun_agent` has no chance to try loading the agent using the `Agent_OnLoad` symbol (https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/prims/jvmtiAgent.cpp#L352).
This is a hidden issue on a regular JDK, since `load_library` can successfully find the agent shared library when `JvmtiAgent::convert_xrun_agent` first tries to load the agent using the `JVM_OnLoad` symbol. The issue is noticed on a static JDK because there is no `libjdwp.so` in a static JDK. It can be reproduced with the jtreg `runtime/6294277/SourceDebugExtension.java` test.

As part of the fix, I cleaned up the following in `invoke_JVM_OnLoad` and `invoke_Agent_OnLoad`. If there's an error, the VM should already have exited during `lookup__OnLoad_entry_point` in those cases.

    if (on_load_entry == nullptr) {
      vm_exit_during_initialization("Could not find ... function in -Xrun library", agent->name());
    }

-------------

Commit messages:
 - Add 'assert(on_load_entry != nullptr, "invariant");'
 - Cleanup
 - Return nullptr if lookup_On_Load_entry_point does not load agent from executable or library.
 - When called from JvmtiAgent::convert_xrun_agent, don't report any error and bail out too early in lookup_JVM_OnLoad_entry_point if it does not succeed, since we want to try lookup_Agent_OnLoad_entry_point for Agent_OnLoad as well.

Changes: https://git.openjdk.org/jdk/pull/24086/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24086&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8352098
  Stats: 34 lines in 1 file changed: 12 ins; 5 del; 17 mod
  Patch: https://git.openjdk.org/jdk/pull/24086.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24086/head:pull/24086

PR: https://git.openjdk.org/jdk/pull/24086

From vpaprotski at openjdk.org  Mon Mar 17 21:49:12 2025
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Mon, 17 Mar 2025 21:49:12 GMT
Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v7]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 12 Mar 2025 19:19:08 GMT, Ferenc Rakoczi wrote:

>> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
> 
> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Made the intrinsics test separate from the pure java test.

Partial review, just didn't want to sit on comments for this long. (Spent quite a bit of time catching up on the papers and math required)

The biggest roadblock I have in following the code is the raw register numbers. (And more comments? Perhaps I need more math knowledge, but comments would help too). Also, 'hidden variables' (xmm30). Can't complain, because this is exactly what Vladimir Ivanov told me to do on my first PR https://github.com/openjdk/jdk/pull/10582#discussion_r1022185591 Perhaps that discussion applies here too.

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 45:

> 43: // Constants
> 44: //
> 45: ATTRIBUTE_ALIGNED(64) static const uint32_t dilithiumAvx512Consts[] = {

This is really nitpicking.. but could the constants have been loaded inline with `movl` without requiring an ExternalAddress()? Nice to have constants together; my only complaint is that we have 'magic offsets' in ASM to reach in for a particular one.. This one isn't too bad, an offset of 32 bits is easy to inspect visually (`dilithiumAvx512ConstsAddr()` could take a parameter perhaps)

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 58:

> 56: 
> 57: ATTRIBUTE_ALIGNED(64) static const uint32_t dilithiumAvx512Perms[] = {
> 58:      // collect montmul results into the destination register

Same as `dilithiumAvx512Consts()`, 'magic offsets'; except here they are harder to count (e.g. it is not clear visually what the offset of `ntt inverse` is). Could be split into three constant arrays to make the compiler count for us

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 127:

> 125:   for (int i = 0; i < parCnt; i++) {
> 126:     __ evpsubd(xmm(i + outputReg), k0, xmm(i + scratchReg1), xmm(i + scratchReg2), false, Assembler::AVX_512bit);
> 127:   }

This is such a deceptively brilliant function!!!
Took me a while to understand (and map to the Java `montMul` function). Perhaps it needs more comments. The comment on line 99 does provide good hints, but I still had some trouble. I ended up annotating a copy quite a bit. I do think all 'clever code' needs comments. Here is my annotated version, if you want to copy out anything:

    static void montmulEven2(XMMRegister outputReg, XMMRegister inputReg1, XMMRegister inputReg2,
                             XMMRegister scratchReg1, XMMRegister scratchReg2,
                             XMMRegister montQInvModR, XMMRegister dilithium_q,
                             int parCnt, MacroAssembler* _masm) {
      int output = outputReg->encoding();
      int input1 = inputReg1->encoding();
      int input2 = inputReg2->encoding();
      int scratch1 = scratchReg1->encoding();
      int scratch2 = scratchReg2->encoding();

      for (int i = 0; i < parCnt; i++) {
        // scratch1 = (int64)input1_even*input2_even
        // Java: long a = (long) b * (long) c;
        __ vpmuldq(xmm(i + scratch1), xmm(i + input1), xmm((input2 == 29) ? 29 : input2 + i), Assembler::AVX_512bit);
      }

      for (int i = 0; i < parCnt; i++) {
        // scratch2 = int32(montQInvModR*(int32)scratch1)
        // Java: int aLow = (int) a;
        // Java: int m = MONT_Q_INV_MOD_R * aLow; // signed low product
        __ vpmulld(xmm(i + scratch2), xmm(i + scratch1), montQInvModR, Assembler::AVX_512bit);
      }

      for (int i = 0; i < parCnt; i++) {
        // scratch2 = (int64)scratch2_even*dilithium_q_even
        // Java: ((long)m * MONT_Q)
        __ vpmuldq(xmm(i + scratch2), xmm(i + scratch2), dilithium_q, Assembler::AVX_512bit);
      }

      for (int i = 0; i < parCnt; i++) {
        // output_odd = scratch1_odd - scratch2_odd
        // Java: (aHigh - (int) (("scratch2") >> MONT_R_BITS))
        __ evpsubd(xmm(i + output), k0, xmm(i + scratch1), xmm(i + scratch2), false, Assembler::AVX_512bit);
      }
    }

- add a comment that input2 can be xmm29, treated as constants, not consecutive (i.e. zetas)
- Candidate for ascii art, even/odd columns, implicit int/long casts (or more 'math' comments on what happens)
- use XMMRegisters instead of numbers (improve callsite readability)
  - can use either `inputReg1 = inputReg1->successor()`
  - or get `encoding()` and keep the current style
- could be a static (local) function (hide from header), then pass _masm
- pass all registers used (helps seeing register allocation, confirm no overlaps)

False trails (i.e. nothing to do, but I thought about it already, so other reviewers don't have to?)
- (ignore: worse performance) squash into a single for loop, let the cpu do out-of-order (and improve readability)
- xmm30/xmm31 (montQInvModR/dilithium_q) are constant. At a glance, it looks like they should be combined into one precomputed one. And paper 039.pdf suggests merging constants to precompute the product; but.. different constants and, looking at Java, there are several implicit casts

For reductions of products inside the NTT this is not a problem because one has to multiply by the roots of unity which are compile-time constants. So one can just precompute them with an additional factor of ?
mod q so that the results after Montgomery reduction are in fact congruent to the desired value a

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 140:

> 138:     __ vpmuldq(xmm(scratchReg1 + 1), xmm(inputReg12), xmm(inputReg2 + 1), Assembler::AVX_512bit);
> 139:     __ vpmuldq(xmm(scratchReg1 + 2), xmm(inputReg13), xmm(inputReg2 + 2), Assembler::AVX_512bit);
> 140:     __ vpmuldq(xmm(scratchReg1 + 3), xmm(inputReg14), xmm(inputReg2 + 3), Assembler::AVX_512bit);

Another option for these four lines, to keep the style of the rest of the function:

    int inputReg1[] = {inputReg11, inputReg12, inputReg13, inputReg14};
    for (int i = 0; i < parCnt; i++) {
      __ vpmuldq(xmm(scratchReg1 + i), xmm(inputReg1[i]), xmm(inputReg2 + i), Assembler::AVX_512bit);
    }

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 197:

> 195: 
> 196:   // level 0
> 197:   montmulEven(20, 8, 29, 20, 16, 4);

It would improve readability to know which parameter is a register, and which is a count.. i.e. `montmulEven(xmm20, xmm8, xmm29, xmm20, xmm16, 4);` (it's not _that_ bad, once I remember that it's always the last parameter.. but it does add to the 'mental load' one has to carry, and this code is already interesting enough)

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 980:

> 978: // Dilithium multiply polynomials in the NTT domain.
> 979: // Implements
> 980: // static int implDilithiumNttMult(

I suppose there are no java changes in this PR, but I notice that the inputs are all assumed to have a fixed size. Most/all intrinsics I worked with had some sort of guard (e.g. `Objects.checkFromIndexSize`) right before the intrinsic java call. (It usually looks like it can be optimized away). But I notice no such guard here on the java side.
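For reference, the scalar computation that the `// Java:` annotations in `montmulEven2` earlier in this review map onto can be sketched as below. This is a hedged sketch and not code from the PR: the names `mont_mul`, `to_canonical`, `Q`, `Q_INV_MOD_R`, `R_SQUARE_MOD_Q` and `R` are made up for illustration, and the constants are the standard ML-DSA values (q = 2^23 - 2^13 + 1 with R = 2^32), not values copied from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed constants: the ML-DSA modulus and the derived Montgomery values.
constexpr int32_t Q = 8380417;               // q = 2^23 - 2^13 + 1
constexpr int32_t Q_INV_MOD_R = 58728449;    // q^-1 mod 2^32
constexpr int32_t R_SQUARE_MOD_Q = 2365951;  // 2^64 mod q
constexpr int64_t R = INT64_C(1) << 32;      // Montgomery radix

// Returns a value congruent to b * c * R^-1 (mod q), with magnitude below q.
int32_t mont_mul(int32_t b, int32_t c) {
  // Full 64-bit product (the first vpmuldq in the annotated code).
  int64_t a = static_cast<int64_t>(b) * c;
  // Low 32 bits of a times q^-1, keeping only the low 32 bits (vpmulld).
  uint32_t m_bits = static_cast<uint32_t>(a) * static_cast<uint32_t>(Q_INV_MOD_R);
  int32_t m = static_cast<int32_t>(m_bits);
  // a and m * q agree in their low 32 bits, so the difference is a multiple of R.
  int64_t diff = a - static_cast<int64_t>(m) * Q;
  assert(diff % R == 0);
  // Dividing by R is what the vector code gets by subtracting the high halves.
  return static_cast<int32_t>(diff / R);
}

// Maps any representative to the range [0, q).
int32_t to_canonical(int32_t x) {
  int32_t r = x % Q;
  return r < 0 ? r + Q : r;
}
```

Multiplying by `R_SQUARE_MOD_Q` first moves one operand into the Montgomery domain, so `mont_mul(mont_mul(x, R_SQUARE_MOD_Q), y)` is congruent to `x * y` modulo q, which matches the "poly2 to montgomery domain" step described in the review.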
src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1010:

> 1008:   __ vpbroadcastd(xmm31, Address(dilithiumConsts, 4), Assembler::AVX_512bit); // q
> 1009:   __ vpbroadcastd(xmm29, Address(dilithiumConsts, 12), Assembler::AVX_512bit); // 2^64 mod q
> 1010:   __ evmovdqul(xmm28, Address(perms, 0), Assembler::AVX_512bit);

- use of `c_rarg3` is 'clever' so it probably should have a comment (i.e. 'no 3rd parameter, free register')
- Alternatively, load directly into the vector with `ExternalAddress()`; you need a scratch register (use r10) but the address is close enough that it won't actually be used. Here is the disassembly I got:

```
StubRoutines::dilithiumNttMult [0x00007f414fb68280, 0x00007f414fb68548] (712 bytes)
--------------------------------------------------------------------------------
  add %al,(%rax)
  0x00007f414fb68280: push %rbp
  0x00007f414fb68281: mov %rsp,%rbp
  0x00007f414fb68284: vpbroadcastd 0x18f9fe32(%rip),%zmm30 # 0x00007f4168b080c0
  0x00007f414fb6828e: vpbroadcastd 0x18f9fe2c(%rip),%zmm31 # 0x00007f4168b080c4
  0x00007f414fb68298: vpbroadcastd 0x18f9fe2a(%rip),%zmm29 # 0x00007f4168b080cc
  0x00007f414fb682a2: vmovdqu32 0x18f9f8d4(%rip),%zmm28 # 0x00007f4168b07b80
```

The `ExternalAddress()` calls for the above assembler:

```
const Register scratch = r10;
const XMMRegister montRSquareModQ = xmm29;
const XMMRegister montQInvModR = xmm30;
const XMMRegister dilithium_q = xmm31;
const XMMRegister perms = xmm28;

__ vpbroadcastd(montQInvModR, ExternalAddress(dilithiumAvx512ConstsAddr()), Assembler::AVX_512bit, scratch); // q^-1 mod 2^32
__ vpbroadcastd(dilithium_q, ExternalAddress(dilithiumAvx512ConstsAddr() + 4), Assembler::AVX_512bit, scratch); // q
__ vpbroadcastd(montRSquareModQ, ExternalAddress(dilithiumAvx512ConstsAddr() + 12), Assembler::AVX_512bit, scratch); // 2^64 mod q
__ evmovdqul(perms, k0, ExternalAddress(dilithiumAvx512PermsAddr()), false, Assembler::AVX_512bit, scratch);
```

(and `dilithiumAvx512ConstsAddr(offset)` could take an int parameter too)
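One way the "address function takes a parameter" idea could look is sketched below. Everything here is hypothetical and only illustrates replacing byte offsets with named indices: `kConsts`, `ConstIndex` and `consts_addr` are invented names, slot 2 is a placeholder because this thread does not say what it holds, and the three numeric values are the standard ML-DSA constants, not values copied from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the constants table discussed in the review.
alignas(64) static const uint32_t kConsts[] = {
  58728449u,  // q^-1 mod 2^32 (byte offset 0)
  8380417u,   // q             (byte offset 4)
  0u,         // placeholder: contents not described in this thread
  2365951u,   // 2^64 mod q    (byte offset 12)
};

// Named slots, so call sites do not need to count bytes.
enum ConstIndex : std::size_t {
  MONT_Q_INV_MOD_R = 0,
  DILITHIUM_Q = 1,
  MONT_R_SQUARE_MOD_Q = 3,
};

// Let the compiler confirm the byte offsets quoted in the review.
static_assert(sizeof(uint32_t) * DILITHIUM_Q == 4, "q is expected at byte offset 4");
static_assert(sizeof(uint32_t) * MONT_R_SQUARE_MOD_Q == 12, "2^64 mod q is expected at byte offset 12");

// Returns the address of one named constant.
static const uint32_t* consts_addr(ConstIndex idx) {
  assert(idx < sizeof(kConsts) / sizeof(kConsts[0]));
  return &kConsts[idx];
}
```

Whether an index or a byte offset is the better parameter is a matter of taste; the `static_assert` lines are there so that a reordered table fails at compile time instead of silently loading the wrong constant.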
src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1012:

> 1010:   __ evmovdqul(xmm28, Address(perms, 0), Assembler::AVX_512bit);
> 1011: 
> 1012:   __ movl(len, 4);

Compile-time constant, why not 'unroll at compile time'? i.e. wrap this loop with `for (int len=0; len<4; len++)` instead?

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1041:

> 1039:   for (int i = 0; i < 4; i++) {
> 1040:     __ evmovdqul(Address(result, i * 64), xmm(i), Assembler::AVX_512bit);
> 1041:   }

This is nice, compact and clean. The biggest issue I have with following this code is really with all the 'raw' registers. I would much rather prefer symbolic names, but it is up to you to decide the style.

I ended up 'annotating' this snippet, so I could understand it and confirm everything.. as with montmulEven, I hope some of it can be useful to you to copy out.

    XMMRegister POLY1[] = {xmm0, xmm1, xmm2, xmm3};
    XMMRegister POLY2[] = {xmm4, xmm5, xmm6, xmm7};
    XMMRegister SCRATCH1[] = {xmm12, xmm13, xmm14, xmm15};
    XMMRegister SCRATCH2[] = {xmm16, xmm17, xmm18, xmm19};
    XMMRegister SCRATCH3[] = {xmm8, xmm9, xmm10, xmm11};
    for (int i = 0; i < 4; i++) {
      __ evmovdqul(POLY1[i], Address(poly1, i * 64), Assembler::AVX_512bit);
      __ evmovdqul(POLY2[i], Address(poly2, i * 64), Assembler::AVX_512bit);
    }

    // montmulEven: inputs are in even columns and output is in odd columns
    // scratch3_even = poly2_even*montRSquareModQ // poly2 to montgomery domain
    montmulEven2(SCRATCH3[0], POLY2[0], montRSquareModQ, SCRATCH1[0], SCRATCH2[0], montQInvModR, dilithium_q, 4, _masm);
    for (int i = 0; i < 4; i++) {
      // swap even/odd; 0xB1 == 2-3-0-1
      __ vpshufd(SCRATCH3[i], SCRATCH3[i], 0xB1, Assembler::AVX_512bit);
    }

    // scratch3_odd = poly1_even*scratch3_even = poly1_even*poly2_even*montRSquareModQ
    montmulEven2(SCRATCH3[0], POLY1[0], SCRATCH3[0], SCRATCH1[0], SCRATCH2[0], montQInvModR, dilithium_q, 4, _masm);
    for (int i = 0; i < 4; i++) {
      __ vpshufd(POLY1[i], POLY1[i], 0xB1, Assembler::AVX_512bit);
      __ vpshufd(POLY2[i], POLY2[i], 0xB1, Assembler::AVX_512bit);
    }

    // poly2_even = poly2_odd*montRSquareModQ // poly2 to montgomery domain
    montmulEven2(POLY2[0], POLY2[0], montRSquareModQ, SCRATCH1[0], SCRATCH2[0], montQInvModR, dilithium_q, 4, _masm);
    for (int i = 0; i < 4; i++) {
      __ vpshufd(POLY2[i], POLY2[i], 0xB1, Assembler::AVX_512bit);
    }

    // poly1_odd = poly1_even*poly2_even
    montmulEven2(POLY1[0], POLY1[0], POLY2[0], SCRATCH1[0], SCRATCH2[0], montQInvModR, dilithium_q, 4, _masm);
    for (int i = 0; i < 4; i++) {
      // result is scrambled between scratch3_odd and poly1_odd; unscramble
      __ evpermt2d(POLY1[i], perms, SCRATCH3[i], Assembler::AVX_512bit);
    }
    for (int i = 0; i < 4; i++) {
      __ evmovdqul(Address(result, i * 64), POLY1[i], Assembler::AVX_512bit);
    }

With symbolic variable names, the code was much easier to follow conceptually. It also has the side benefit of making it obvious which XMM registers are used and that there are no conflicts

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1090:

> 1088:   __ evpbroadcastd(xmm29, constant, Assembler::AVX_512bit); // constant multiplier
> 1089: 
> 1090:   __ movl(len, 2);

Same comments here as for `generate_dilithiumNttMult_avx512`:
- constants can be loaded directly into XMM
- len can be removed by unrolling at compile time
- symbolic names could be used for registers
- comments could be added

-------------

PR Review: https://git.openjdk.org/jdk/pull/23860#pullrequestreview-2665370975
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1999468929
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1999471763
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1999625933
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1992230295
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1992235625
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1999712200
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1999413007
PR Review
Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1999367607
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1999683384
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1999686631

From vpaprotski at openjdk.org  Mon Mar 17 21:49:14 2025
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Mon, 17 Mar 2025 21:49:14 GMT
Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v5]
In-Reply-To: <3bphXKLpIpxAZP-FEOeob6AaHbv0BAoEceJka64vMW8=.3e4f74e0-9479-4926-b365-b08d8d702692@github.com>
References: <3bphXKLpIpxAZP-FEOeob6AaHbv0BAoEceJka64vMW8=.3e4f74e0-9479-4926-b365-b08d8d702692@github.com>
Message-ID: 

On Thu, 6 Mar 2025 17:37:33 GMT, Ferenc Rakoczi wrote:

>> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
> 
> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Accepted review comments.

src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 494:

> 492:   address generate_sha3_implCompress(StubGenStubId stub_id);
> 493: 
> 494:   address generate_double_keccak();

You can hide internal helper functions (i.e. `montmulEven(*)`) if you wish. The trick is to add `MacroAssembler* _masm` as a parameter to the static (local) function. It's a trick I use to keep the header clean, but still have plenty of helpers

src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 409:

> 407:   __ evmovdquq(xmm29, Address(permsAndRots, 768), Assembler::AVX_512bit);
> 408:   __ evmovdquq(xmm30, Address(permsAndRots, 832), Assembler::AVX_512bit);
> 409:   __ evmovdquq(xmm31, Address(permsAndRots, 896), Assembler::AVX_512bit);

Matter of taste, but I liked the compactness of montmulEven; i.e.

    for (int i = 0; i < 15; i++)
      __ evmovdquq(xmm(17 + i), Address(permsAndRots, 64 * i), Assembler::AVX_512bit);

src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 426:

> 424:   __ subl( roundsLeft, 1);
> 425: 
> 426:   __ evmovdquw(xmm5, xmm0, Assembler::AVX_512bit);

Is there a pattern here that can be 'compacted' into a loop?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983903347
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983935964
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983937154

From mgronlun at openjdk.org  Mon Mar 17 22:03:08 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Mon, 17 Mar 2025 22:03:08 GMT
Subject: RFR: 8351187: Add JFR monitor notification event [v5]
In-Reply-To: <6HSZ5L4LjfEDdiQeKftFHlEMv3iRX68H2ZouhnQyV2c=.40e3a8f1-acbf-443f-a73f-06a55b0a8929@github.com>
References: <6HSZ5L4LjfEDdiQeKftFHlEMv3iRX68H2ZouhnQyV2c=.40e3a8f1-acbf-443f-a73f-06a55b0a8929@github.com>
Message-ID: <5EgflENGi9Z1QuHn6OsaXWlJWYIG5vK8muSMqLNIPMA=.614710a3-1cde-4c88-bef1-971294c7f937@github.com>

On Thu, 13 Mar 2025 09:44:04 GMT, Aleksey Shipilev wrote:

>> We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up.
>> 
>> Providing this event would also replace one of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829).
>> 
>> This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later.
>> 
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `jdk_jfr`
> 
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains eight commits:
> 
>  - Merge branch 'master' into JDK-8351187-jfr-monitor-notify
>  - Only emit event when notification happened
>  - Merge branch 'master' into JDK-8351187-jfr-monitor-notify
>  - Rewrite test to RecordingStream
>  - Drop threshold to 0ms
>  - Merge branch 'master' into JDK-8351187-jfr-monitor-notify
>  - Disable by default
>  - Fix

Marked as reviewed by mgronlun (Reviewer).

Okay, Aleksey, it looks good. It was good to look at this stuff again. I realized some things we can do to improve all the monitor-related events, but I will attempt that after you put this in. Cheers.

-------------

PR Review: https://git.openjdk.org/jdk/pull/23901#pullrequestreview-2692185809
PR Comment: https://git.openjdk.org/jdk/pull/23901#issuecomment-2731030090

From vpaprotski at openjdk.org  Mon Mar 17 22:27:08 2025
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Mon, 17 Mar 2025 22:27:08 GMT
Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v5]
In-Reply-To: 
References: 
Message-ID: 

> Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain)
> 
> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain.
> 
> Before (no AVX512)
> 
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40  3720.589 ± 17.879  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40  3605.940 ± 15.807  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40  1076.502 ±  4.190  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40  1069.624 ±  2.484  ops/s
> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score   Error  Units
> KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt   40   830.448 ± 2.285  ops/s
> 
> After (with AVX2)
> 
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40  6000.496 ± 39.923  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40  5739.878 ± 34.838  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40  1942.437 ± 12.179  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40  1921.770 ±  8.992  ops/s
> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score   Error  Units
> KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt   40  1399.761 ± 6.238  ops/s
> 
> 
> Before (with AVX512):
> 
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40  9621.950 ± 27.260  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40  8975.654 ± 26.707  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA 102...

Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:

  improve test comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23719/files
  - new: https://git.openjdk.org/jdk/pull/23719/files/9d13cefa..72650cd3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=03-04

  Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/23719.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23719/head:pull/23719

PR: https://git.openjdk.org/jdk/pull/23719

From vpaprotski at openjdk.org  Mon Mar 17 22:32:01 2025
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Mon, 17 Mar 2025 22:32:01 GMT
Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v6]
In-Reply-To: 
References: 
Message-ID: 

> Add AVX2 montgomery multiplication intrinsic.
(About 60-80% gain)
> 
> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain.
> 
> Before (no AVX512)
> 
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40  3720.589 ± 17.879  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40  3605.940 ± 15.807  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40  1076.502 ±  4.190  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40  1069.624 ±  2.484  ops/s
> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score   Error  Units
> KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt   40   830.448 ± 2.285  ops/s
> 
> After (with AVX2)
> 
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40  6000.496 ± 39.923  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40  5739.878 ± 34.838  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40  1942.437 ± 12.179  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40  1921.770 ±  8.992  ops/s
> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score   Error  Units
> KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt   40  1399.761 ± 6.238  ops/s
> 
> 
> Before (with AVX512):
> 
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40  9621.950 ± 27.260  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40  8975.654 ± 26.707  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA 102...
Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23719/files - new: https://git.openjdk.org/jdk/pull/23719/files/72650cd3..56fd168d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23719/head:pull/23719 PR: https://git.openjdk.org/jdk/pull/23719 From vpaprotski at openjdk.org Mon Mar 17 22:32:02 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 17 Mar 2025 22:32:02 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 23:07:45 GMT, Volodymyr Paprotski wrote: >> test/jdk/com/sun/security/util/math/intpoly/MontgomeryPolynomialFuzzTest.java line 30: >> >>> 28: import sun.security.util.math.intpoly.*; >>> 29: >>> 30: /* >> >> It is strange that there are two copies of the `@test` block. Can you please remove one of them, unless you are seeing a difference that I do not > > -XX:+/-UseIntPolyIntrinsics (test Java vs BigInt and intrinsic vs BigInt) > > Though I think I did this before I knew much about junit.. I think I can just have two @run commands (to make it clearer)? Will give that a try Turns out I do need both `@test`; (otherwise `make test TEST=...MontgomeryPolynomialFuzzTest.java` runs fewer tests). Seems other tests do the same. I did add a (better?) comment to the summary tag. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1999760302

From fyang at openjdk.org  Tue Mar 18 00:02:07 2025
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 18 Mar 2025 00:02:07 GMT
Subject: RFR: 8352159: RISC-V: add more zfa support [v3]
In-Reply-To: 
References: 
Message-ID: <0oKP5eyxooH57KTNQBrBOyh8d1NwadLU8vfCQMghB_o=.45591ff5-e1b7-4a23-ba7f-f3862900d325@github.com>

On Mon, 17 Mar 2025 18:58:18 GMT, Hamlin Li wrote:

>> Hi,
>> Can you help to review this patch?
>> 
>> Previously in https://github.com/openjdk/jdk/pull/23844, `loadConH` is implemented only with `flh`, but `fli_h` should be more efficient if Zfa is supported; min/max for HF could use fmin/maxm.h instead too.
>> 
>> Thanks!
> 
> Hamlin Li has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - fix fli_h
>  - fix rFlagsReg
>  - refactor min/max

src/hotspot/cpu/riscv/riscv.ad line 8390:

> 8388: %}
> 8389: 
> 8390: instruct min_HF_reg(fRegF dst, fRegF src1, fRegF src2, rFlagsReg cr)

Ah, the max/min_HF part of the work seems to duplicate https://github.com/openjdk/jdk/pull/24047?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24081#discussion_r1999874502

From qpzhang at openjdk.org  Tue Mar 18 02:51:07 2025
From: qpzhang at openjdk.org (Patrick Zhang)
Date: Tue, 18 Mar 2025 02:51:07 GMT
Subject: RFR: 8350663: AArch64: Enable UseSignumIntrinsic by default
In-Reply-To: 
References: 
Message-ID: 

On Tue, 4 Mar 2025 11:33:05 GMT, Patrick Zhang wrote:

> According to tests on Arm CPUs Neoverse-N1/N2/V1/V2 and Ampere-Altra/AmpereOne, `-XX:+UseSignumIntrinsic` provides a consistent performance boost on the signum microbenchmarks (1-4,5,7 in the list below) and no obvious regression (ops/s change <0.1%) on other relevant tests (6,9-12). In addition, "_Apple M1 shows no regression with signum intrinsics_" (verified by @theRealAph). So it may be time to enable the UseSignumIntrinsic flag by default for the aarch64 port.
By the way, x86 and riscv ports have already configured it on by default. > > Tests: passed JTReg tier1 tests on Ampere-1A, no regression found, and particularly checked test results of two signum cases (13,14), both are in good state. > > > 1. org.openjdk.bench.java.lang.MathBench.signumDouble > 2. org.openjdk.bench.java.lang.MathBench.signumFloat > 3. org.openjdk.bench.java.lang.StrictMathBench.sigNumDouble > 4. org.openjdk.bench.java.lang.StrictMathBench.signumFloat > 5. org.openjdk.bench.vm.compiler.Signum._1_signumFloatTest > 6. org.openjdk.bench.vm.compiler.Signum._2_overheadFloat > 7. org.openjdk.bench.vm.compiler.Signum._3_signumDoubleTest > 8. org.openjdk.bench.vm.compiler.Signum._4_overheadDouble > 9. org.openjdk.bench.vm.compiler.Signum._5_copySignFloatTest > 10. org.openjdk.bench.vm.compiler.Signum._6_overheadCopySignFloat > 11. org.openjdk.bench.vm.compiler.Signum._7_copySignDoubleTest > 12. org.openjdk.bench.vm.compiler.Signum._8_overheadCopySignDouble > 13. JTReg: compiler/vectorization/TestSignumVector.java > 14. JTReg: compiler/intrinsics/math/TestSignumIntrinsic.java Hi @theRealAph Could you please help review this? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23893#issuecomment-2731461554 From dholmes at openjdk.org Tue Mar 18 03:39:12 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 18 Mar 2025 03:39:12 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v5] In-Reply-To: <6HSZ5L4LjfEDdiQeKftFHlEMv3iRX68H2ZouhnQyV2c=.40e3a8f1-acbf-443f-a73f-06a55b0a8929@github.com> References: <6HSZ5L4LjfEDdiQeKftFHlEMv3iRX68H2ZouhnQyV2c=.40e3a8f1-acbf-443f-a73f-06a55b0a8929@github.com> Message-ID: On Thu, 13 Mar 2025 09:44:04 GMT, Aleksey Shipilev wrote: >> We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up. 
>> >> Providing this event would also replace one of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). >> >> This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Only emit event when notification happened > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Rewrite test to RecordingStream > - Drop threshold to 0ms > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Disable by default > - Fix LGTM2. One nit. Thanks src/hotspot/share/runtime/objectMonitor.cpp line 2005: > 2003: static void post_monitor_notify_event(EventJavaMonitorNotify* event, > 2004: ObjectMonitor* monitor, > 2005: int notified_count) { ```suggestion - indent is off static void post_monitor_notify_event(EventJavaMonitorNotify* event, ObjectMonitor* monitor, int notified_count) { ------------- Marked as reviewed by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23901#pullrequestreview-2692804394 PR Review Comment: https://git.openjdk.org/jdk/pull/23901#discussion_r2000128558 From dholmes at openjdk.org Tue Mar 18 04:01:21 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 18 Mar 2025 04:01:21 GMT Subject: RFR: 8352098: -Xrunjdwp fails on static JDK In-Reply-To: <7AZV4Zx94USyWGyA2H3pu6-8Vo-ZM5or6AinLiC5MtM=.b770bcf8-00c0-4f73-a204-8009b6c70e26@github.com> References: <7AZV4Zx94USyWGyA2H3pu6-8Vo-ZM5or6AinLiC5MtM=.b770bcf8-00c0-4f73-a204-8009b6c70e26@github.com> Message-ID: <1t0ksTv8cjRsnQ756fJahmEOVBDWfenL0m05RE3iUN0=.3222c934-ba11-4931-8393-a42925d082d4@github.com> On Mon, 17 Mar 2025 19:37:48 GMT, Jiangli Zhou wrote: > Please review this fix that avoids `JvmtiAgent::convert_xrun_agent` from prematurely exiting VM if `lookup_On_Load_entry_point` cannot load the agent using `JVM_OnLoad` symbol. Thanks > > `lookup_On_Load_entry_point` first tries to load the builtin agent from the executable by checking the requested symbol (`JVM_OnLoad`). If no builtin agent is found, it then tries to load the agent shared library (e.g. `libjdwp.so`) by calling `load_library`. The issue is that `load_library` is called with `vm_exit_on_error` set to `true`, which causes the VM to exit immediately if the agent shared library is not loaded. Therefore, `JvmtiAgent::convert_xrun_agent` has no chance to try loading the agent using `Agent_OnLoad` symbol (https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/prims/jvmtiAgent.cpp#L352). This is a hidden issue on regular JDK, since the `load_library` can successfully find the agent shared library when `JvmtiAgent::convert_xrun_agent` first tries to load the agent using `JVM_OnLoad` symbol. The issue is noticed on static JDK as there is no `libjdwp.so` in static JDK. It can be reproduced with jtreg `runtime/6294277/SourceDebugExtension.java` test.
> > As part of the fix, I cleaned up following in `invoke_JVM_OnLoad` and `invoke_Agent_OnLoad`. If there's an error, the VM should already have exited during `lookup__OnLoad_entry_point` in those cases. > > > if (on_load_entry == nullptr) { > vm_exit_during_initialization("Could not find ... function in -Xrun library", agent->name()); > } This seems reasonable to me. A couple of nits. Thanks src/hotspot/share/prims/jvmtiAgent.cpp line 360: > 358: // to try lookup_Agent_OnLoad_entry_point for Agent_OnLoad as well. > 359: OnLoadEntry_t on_load_entry = lookup_JVM_OnLoad_entry_point( > 360: this, /* vm exit on error */ false); Suggestion: OnLoadEntry_t on_load_entry = lookup_JVM_OnLoad_entry_point(this, /* vm exit on error */ false); src/hotspot/share/prims/jvmtiAgent.cpp line 365: > 363: if (on_load_entry == nullptr) { > 364: on_load_entry = lookup_Agent_OnLoad_entry_point( > 365: this, /* vm exit on error */ true); Suggestion: on_load_entry = lookup_Agent_OnLoad_entry_point(this, /* vm exit on error */ true); ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24086#pullrequestreview-2692846277 PR Review Comment: https://git.openjdk.org/jdk/pull/24086#discussion_r2000148624 PR Review Comment: https://git.openjdk.org/jdk/pull/24086#discussion_r2000148978 From david.holmes at oracle.com Tue Mar 18 04:14:35 2025 From: david.holmes at oracle.com (David Holmes) Date: Tue, 18 Mar 2025 14:14:35 +1000 Subject: [External] : Re: Verification in agent transformers In-Reply-To: References: <3AE3C86F-D811-4788-844A-CF3F13013444@iernst.net> <4d1ccef8-c837-482b-abb3-72f28593d08a@oracle.com> <37F06AA7-E883-4BCF-8E0E-6B2CF1A81FBD@iernst.net> <12AA447F-DFEC-4863-8E16-8D0265EC3CAE@iernst.net> Message-ID: On 11/03/2025 6:44 am, coleen.phillimore at oracle.com wrote: > On 3/10/25 4:14 PM, Alan Bateman wrote: >> On 10/03/2025 18:25, Ryan Ernst wrote: >>> Again, the VerifyError is correct, it's what we expect (we created >>> bad bytecode in a transform), but it doesn't always occur. >>> >> Classes loaded from modules mapped to the boot loader, or classes on >> the boot loader's class path, are not verified if modified at class >> load time. They are verified if redefined at runtime. Developers of >> agents are not infallible so there may be an argument to enable >> BytecodeVerificationLocal when an agent enables one of the >> can_generate_XXX_class_hook_events capabilities. > > Yes, I just checked the code and we don't verify classes loaded via CFLH > and we should fix that. Do we verify classes provided via --patch-modules? If you control the command-line you can disable all verification, so enabling it by default just adds overhead not security.
David ----- > Coleen > >> >> -Alan > From fyang at openjdk.org Tue Mar 18 07:17:09 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 18 Mar 2025 07:17:09 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v4] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Mon, 17 Mar 2025 12:35:42 GMT, Robbin Ehn wrote: >> Hi please consider. >> >> |RVWMO| Patched| >> | ---------- | ---------- | >> |fence iorw,iorw| fence iorw,ow| >> |sw t4,120(t2) | sw t4,120(t2) | >> |fence ow,ir | unnecessary_membar_volatile_rvwmo | >> | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile | >> |fence iorw,ow | fence iorw,ow| >> |sw t5,124(t2) |sw t5,124(t2) | >> >> |TSO | Patched| >> | ---------- | ---------- | >> | lw a4,120(t2) | lw a6,120(t2) | >> | sw a0,124(t2) | sw t6,124(t2) | >> | fence iorw,iorw | unnecessary_membar_volatile_tso | >> | sw t4,120(t2) | sw t4,120(t2) | >> | fence ow,ir | unnecessary_membar_volatile_tso | >> | sw t6,128(t2) | sw t5,128(t2) | >> | sw t5,124(t2) // Non-volatile| sw a1,124(t2) // Non-volatile | >> | fence iorw,iorw | unnecessary_membar_volatile_tso | >> |... | ... | >> | sw a3,120(t2) | sw a0,120(t2) | >> | fence ow,ir | fence ow,ir | >> | lw a7,124(t2) | lw a5,124(t2) | >> >> The specific rvwmo case of volatile store + store + volatile store is around 30% faster on VF2. >> >> The patch does the following: >> - Separate ztso and rvwmo in ad by using UseZtso predicate. >> - Match all that requires the same membar. >> - Make fence/fencei protected as they shouldn't be used directly. >> - Increased cost of membars to VOLATILE_REF_COST. >> - Added a real_empty pipe. >> - Change to pipe_slow on TSO (as x86). >> >> Note that C2-rv64 is now superior to gcc/clang regarding fencing: >> https://godbolt.org/z/6E3YTP15j >> >> Testing jcstress, tier1 and manually reading the generated assembly.
>> Doing additional testing, but RFR it now as it may need some consideration. >> >> /Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Looks good. My local tests are still good. src/hotspot/cpu/riscv/riscv.ad line 7979: > 7977: ins_cost(VOLATILE_REF_COST); > 7978: > 7979: format %{ "membar_volatile_rvtso\n\t" It's a bit confusing to me to see this `membar_volatile_rvtso` followed by a `fence w, r`. Seems better if we put a `#@` prefix making it a code comment for this `fence w, r`. I mean: `#@membar_volatile_rvtso` src/hotspot/cpu/riscv/riscv.ad line 8012: > 8010: ins_cost(VOLATILE_REF_COST); > 8011: > 8012: format %{ "membar_aqcuire_rvwmo\n\t" Similar here. src/hotspot/cpu/riscv/riscv.ad line 8028: > 8026: ins_cost(VOLATILE_REF_COST); > 8027: > 8028: format %{ "membar_release_rvwmo\n\t" Here. src/hotspot/cpu/riscv/riscv.ad line 8044: > 8042: ins_cost(VOLATILE_REF_COST); > 8043: > 8044: format %{ "membar_storestore_rvwmo\n\t" Here. src/hotspot/cpu/riscv/riscv.ad line 8058: > 8056: ins_cost(VOLATILE_REF_COST); > 8057: > 8058: format %{ "membar_volatile_rvwmo\n\t" And here. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24035#pullrequestreview-2693245825 PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2000350037 PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2000351489 PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2000351934 PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2000352205 PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2000352466 From shade at openjdk.org Tue Mar 18 07:20:27 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Mar 2025 07:20:27 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v6] In-Reply-To: References: Message-ID: > We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up. > > Providing this event would also replace one of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). > > This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `jdk_jfr` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 10 commits: - Indenting - Merge branch 'master' into JDK-8351187-jfr-monitor-notify - Merge branch 'master' into JDK-8351187-jfr-monitor-notify - Only emit event when notification happened - Merge branch 'master' into JDK-8351187-jfr-monitor-notify - Rewrite test to RecordingStream - Drop threshold to 0ms - Merge branch 'master' into JDK-8351187-jfr-monitor-notify - Disable by default - Fix ------------- Changes: https://git.openjdk.org/jdk/pull/23901/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23901&range=05 Stats: 168 lines in 7 files changed: 162 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23901.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23901/head:pull/23901 PR: https://git.openjdk.org/jdk/pull/23901 From shade at openjdk.org Tue Mar 18 07:20:27 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Mar 2025 07:20:27 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v5] In-Reply-To: References: <6HSZ5L4LjfEDdiQeKftFHlEMv3iRX68H2ZouhnQyV2c=.40e3a8f1-acbf-443f-a73f-06a55b0a8929@github.com> Message-ID: On Tue, 18 Mar 2025 03:32:43 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: >> >> - Merge branch 'master' into JDK-8351187-jfr-monitor-notify >> - Only emit event when notification happened >> - Merge branch 'master' into JDK-8351187-jfr-monitor-notify >> - Rewrite test to RecordingStream >> - Drop threshold to 0ms >> - Merge branch 'master' into JDK-8351187-jfr-monitor-notify >> - Disable by default >> - Fix > > src/hotspot/share/runtime/objectMonitor.cpp line 2005: > >> 2003: static void post_monitor_notify_event(EventJavaMonitorNotify* event, >> 2004: ObjectMonitor* monitor, >> 2005: int notified_count) { > > ```suggestion - indent is off > static void post_monitor_notify_event(EventJavaMonitorNotify* event, > ObjectMonitor* monitor, > int notified_count) { Right. Should be fixed in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23901#discussion_r2000353198 From sroy at openjdk.org Tue Mar 18 08:15:24 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Tue, 18 Mar 2025 08:15:24 GMT Subject: RFR: JDK-8331859 : [PPC64] Remove support for Power7 and older Message-ID: JBS Issue: [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859) Linux PPC64le requires Power8 since the beginning. AIX requires Power8 with the new OpenXL based build ([JDK-8307520](https://bugs.openjdk.org/browse/JDK-8307520)). The old build has been removed in JDK 23 ([JDK-8327701](https://bugs.openjdk.org/browse/JDK-8327701)). Linux PPC64 Big Endian is no longer officially supported (only kept alive for development, debugging and testing purposes). The following checks for old processors are no longer needed: 8: VM_Version::has_lqarx() 7: VM_Version::has_popcntw() 6: VM_Version::has_cmpb() 5: VM_Version::has_popcntb() These ones and some more checks for old instructions are no longer needed. All code which is no longer reachable when removing them should also get removed. Checks like "PowerArchitecturePPC64 >= 8" (or older) can be removed. 
Atomic::PlatformCmpxchg<1>::operator() can be simplified by using sub-word instructions (lharx, lbarx). Temp registers can be removed from cmpxchgb and cmpxchgh. Build flags "-mcpu=powerpc64 -mtune=power5" for Big Endian linux should get replaced by "-mcpu=power8 -mtune=power8" as already used for linux PPC64le. ------------- Commit messages: - multiple loads causing issues - Merge branch 'master' into power8 - Merge branch 'master' into power8 - spaces - aix changes - adapt changes - Merge remote-tracking branch 'origin' into power8 - Merge remote-tracking branch 'origin' into power8 - Merge branch 'master' into power8 - spaces and comments - ... and 7 more: https://git.openjdk.org/jdk/compare/ac76d8d6...a40ad3b3 Changes: https://git.openjdk.org/jdk/pull/20262/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20262&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331859 Stats: 404 lines in 9 files changed: 0 ins; 342 del; 62 mod Patch: https://git.openjdk.org/jdk/pull/20262.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20262/head:pull/20262 PR: https://git.openjdk.org/jdk/pull/20262 From rehn at openjdk.org Tue Mar 18 08:40:10 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 18 Mar 2025 08:40:10 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v4] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Tue, 18 Mar 2025 07:10:35 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > src/hotspot/cpu/riscv/riscv.ad line 7979: > >> 7977: ins_cost(VOLATILE_REF_COST); >> 7978: >> 7979: format %{ "membar_volatile_rvtso\n\t" > > It's a bit confusing to me to see this `membar_volatile_rvtso` followed by a `fence w, r`. > Seems better if we put a `#@` prefix making it a code comment for this `fence w, r` in the opto assembly. 
> I mean: `#@membar_volatile_rvtso` No other platform uses '@' in their ad files? E.g. format %{ "membar_acquire\n\t" "dmb ishld" %} It just looked weird that riscv has its own special way to format things. I can agree with '#', but why add a @ ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2000474785 From rehn at openjdk.org Tue Mar 18 09:10:12 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 18 Mar 2025 09:10:12 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV Message-ID: Hi please consider. Added case to turn off UseZvfh when no RVV. Which is the cause of the test issues, zvfh on but no rvv. Also made all case identical and added no warning when default. Move them to the common init, as the "UseExtension" is not C2 specific. Manual tested and some random compiler tests. Thanks, Robbin ------------- Commit messages: - Moved to common - Disable UseZvfh when no RVV Changes: https://git.openjdk.org/jdk/pull/24094/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24094&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352218 Stats: 51 lines in 1 file changed: 32 ins; 19 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24094.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24094/head:pull/24094 PR: https://git.openjdk.org/jdk/pull/24094 From sroy at openjdk.org Tue Mar 18 09:11:14 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Tue, 18 Mar 2025 09:11:14 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28] In-Reply-To: <1wMuCBIwYPaPM-bbsnFHi8hnkq-IL5Q_kCmaa1AdDpM=.1240fd83-db6d-489a-bbb3-48891daac064@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> <7rVbCbWDqrib9Jyj7_hkD-r9rkaAOIXuwOGAqImrxoY=.a55e9572-b4e6-4cc2-aa0e-c23deb9961ce@github.com> <1wMuCBIwYPaPM-bbsnFHi8hnkq-IL5Q_kCmaa1AdDpM=.1240fd83-db6d-489a-bbb3-48891daac064@github.com> Message-ID:
<0DSwCsm5yp2be9s-cgkZP4HCo4ppGD_SkDq4KyjfMEw=.0d74a4c8-e155-4186-884f-2575924f9d03@github.com> On Sat, 8 Mar 2025 18:14:48 GMT, Martin Doerr wrote: >> @TheRealMDoerr Yes. The tests do not pass with this. >> Trying to find a scope to reduce instructions. >> masm->vsldoi(vLowProduct, vLowProduct, vLowProduct, 8); // Swap >> masm->vxor(vLowProduct, vLowProduct, vReducedLow); // Reduction using constant >> masm->vsldoi(vCombinedResult, vLowProduct, vLowProduct, 8); // Swap >> >> >> can be brought down to 2 instructions. >> Still looking for scope to reduce. Let me know your inputs > > I still find it hard to read. Can you describe the algorithm in pseudo code or mathematical equations? We can try to map it to a shorter instruction sequence. > Btw. the comment looks wrong here: vxor(vLowProduct, vLowProduct, vReducedLow); // Reduction using constant @TheRealMDoerr https://www.researchgate.net/publication/285612706_Implementing_GCM_on_ARMv8 I think the same algorithm is used for polynomial reduction (Section 4.3) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r2000541492 From adinn at openjdk.org Tue Mar 18 09:17:11 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 18 Mar 2025 09:17:11 GMT Subject: RFR: 8350663: AArch64: Enable UseSignumIntrinsic by default In-Reply-To: References: Message-ID: <-whVVFIhbWD9A3jHm7pWo1T-Fdhefae9nzfEDao_-yA=.db1f0a3a-13e5-44f4-a909-c3303adcde45@github.com> On Tue, 4 Mar 2025 11:33:05 GMT, Patrick Zhang wrote: > According to tests on Arm CPUs Neoverse-N1/N2/V1/V2 and Ampere-Altra/AmpereOne, `-XX:+UseSignumIntrinsic` can provide a consistent positive performance boost on signum microbenchmarks (1-4,5,7 in the list below) and no obvious regression (ops/s change <0.1%) on other relevant tests (6,9-12). In addition, "_Apple M1 shows no regression with signum intrinsics_" (verified by @theRealAph). So, it may be time to enable this UseSignumIntrinsic flag by default for aarch64-port.
By the way, x86 and riscv ports have already configured it on by default. > > Tests: passed JTReg tier1 tests on Ampere-1A, no regression found, and particularly checked test results of two signum cases (13,14), both are in good state. > > > 1. org.openjdk.bench.java.lang.MathBench.signumDouble > 2. org.openjdk.bench.java.lang.MathBench.signumFloat > 3. org.openjdk.bench.java.lang.StrictMathBench.sigNumDouble > 4. org.openjdk.bench.java.lang.StrictMathBench.signumFloat > 5. org.openjdk.bench.vm.compiler.Signum._1_signumFloatTest > 6. org.openjdk.bench.vm.compiler.Signum._2_overheadFloat > 7. org.openjdk.bench.vm.compiler.Signum._3_signumDoubleTest > 8. org.openjdk.bench.vm.compiler.Signum._4_overheadDouble > 9. org.openjdk.bench.vm.compiler.Signum._5_copySignFloatTest > 10. org.openjdk.bench.vm.compiler.Signum._6_overheadCopySignFloat > 11. org.openjdk.bench.vm.compiler.Signum._7_copySignDoubleTest > 12. org.openjdk.bench.vm.compiler.Signum._8_overheadCopySignDouble > 13. JTReg: compiler/vectorization/TestSignumVector.java > 14. JTReg: compiler/intrinsics/math/TestSignumIntrinsic.java The change is good. ------------- Marked as reviewed by adinn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23893#pullrequestreview-2693607304 From mli at openjdk.org Tue Mar 18 09:29:10 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Mar 2025 09:29:10 GMT Subject: RFR: 8352159: RISC-V: add more zfa support [v3] In-Reply-To: <0oKP5eyxooH57KTNQBrBOyh8d1NwadLU8vfCQMghB_o=.45591ff5-e1b7-4a23-ba7f-f3862900d325@github.com> References: <0oKP5eyxooH57KTNQBrBOyh8d1NwadLU8vfCQMghB_o=.45591ff5-e1b7-4a23-ba7f-f3862900d325@github.com> Message-ID: On Mon, 17 Mar 2025 23:59:21 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with three additional commits since the last revision: >> >> - fix fli_h >> - fix rFlagsReg >> - refactor min/max > > src/hotspot/cpu/riscv/riscv.ad line 8390: > >> 8388: %} >> 8389: >> 8390: instruct min_HF_reg(fRegF dst, fRegF src1, fRegF src2, rFlagsReg cr) > > Ah, the max/min_HF part of the work seems to duplicate https://github.com/openjdk/jdk/pull/24047? Ah, I'll remove the part. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24081#discussion_r2000575031 From dholmes at openjdk.org Tue Mar 18 10:07:09 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 18 Mar 2025 10:07:09 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v6] In-Reply-To: References: Message-ID: <5ytORlANOCyyUM-fgrUTWZA6t1C97HIZhlb5iI4lzCM=.00fff6e5-6b2d-4fdd-9f2b-d54cd43c3100@github.com> On Tue, 18 Mar 2025 07:20:27 GMT, Aleksey Shipilev wrote: >> We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up. >> >> Providing this event would also replace one of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). >> >> This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later.
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - Indenting > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Only emit event when notification happened > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Rewrite test to RecordingStream > - Drop threshold to 0ms > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Disable by default > - Fix Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23901#pullrequestreview-2693781270 From fyang at openjdk.org Tue Mar 18 10:14:07 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 18 Mar 2025 10:14:07 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v4] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Tue, 18 Mar 2025 08:37:24 GMT, Robbin Ehn wrote: > I can agree with '#', but why add a @ ? I can't recall the history. Maybe just to mark that this is the start of a specific match rule thus making the opto asm more readable. Opto output of some of the match rules are kind of complex like the compare ones (`CmpF3` etc.). But a single `#` also works for me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2000663209 From shade at openjdk.org Tue Mar 18 10:20:18 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Mar 2025 10:20:18 GMT Subject: RFR: 8351187: Add JFR monitor notification event [v6] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 07:20:27 GMT, Aleksey Shipilev wrote: >> We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. 
Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up. >> >> Providing this event would also replace one of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829). >> >> This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - Indenting > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Only emit event when notification happened > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Rewrite test to RecordingStream > - Drop threshold to 0ms > - Merge branch 'master' into JDK-8351187-jfr-monitor-notify > - Disable by default > - Fix Thanks all! I re-tested and it looks green. So I am integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23901#issuecomment-2732544197 From shade at openjdk.org Tue Mar 18 10:20:19 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Mar 2025 10:20:19 GMT Subject: Integrated: 8351187: Add JFR monitor notification event In-Reply-To: References: Message-ID: <1BznQn-y3nKbNUn3ZUlkwyA-MhZeRdThEz7gNiGqv8c=.e668a01c-ff15-4fcf-8ee5-c5ae2ea1fe1c@github.com> On Tue, 4 Mar 2025 16:05:36 GMT, Aleksey Shipilev wrote: > We have `JavaMonitorWait` event, but no symmetric `JavaMonitorNotify` event. Notifications are important/interesting to track as well, for example to correlate the delay between notification and eventual wake up. > > Providing this event would also replace one of the RT counters that are going away in [JDK-8348829](https://bugs.openjdk.org/browse/JDK-8348829).
> > This counter is disabled by default to keep any potential impact low. We can consider flipping it to enabled by default later. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `jdk_jfr` This pull request has now been integrated. Changeset: 20f1bca0 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/20f1bca0770b6b4d935b068e7f6a742cef4f5449 Stats: 168 lines in 7 files changed: 162 ins; 0 del; 6 mod 8351187: Add JFR monitor notification event Reviewed-by: dholmes, lmesnik, mgronlun ------------- PR: https://git.openjdk.org/jdk/pull/23901 From mli at openjdk.org Tue Mar 18 10:56:11 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Mar 2025 10:56:11 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 09:04:07 GMT, Robbin Ehn wrote: > Hi please consider. > > Added case to turn off UseZvfh when no RVV. > Which is the cause of the test issues, zvfh on but no rvv. > > Also made all case identical and added no warning when default. > Move them to the common init, as the "UseExtension" is not C2 specific. > > Manual tested and some random compiler tests. > > Thanks, Robbin Looks good. Thanks! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24094#pullrequestreview-2693979228 From mli at openjdk.org Tue Mar 18 11:25:19 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Mar 2025 11:25:19 GMT Subject: RFR: 8352159: RISC-V: add more zfa support [v4] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > > Previously in https://github.com/openjdk/jdk/pull/23844, `loadConH` is implemented only with `flh`, but `fli_h` should be more efficient if Zfa is supported; min/max for HF could use fmin/maxm.h instead too. > > Thanks! Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: - merge master - fix fli_h - fix rFlagsReg - refactor min/max - min/max - initial commit ------------- Changes: https://git.openjdk.org/jdk/pull/24081/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24081&range=03 Stats: 123 lines in 4 files changed: 93 ins; 9 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/24081.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24081/head:pull/24081 PR: https://git.openjdk.org/jdk/pull/24081 From fyang at openjdk.org Tue Mar 18 11:37:08 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 18 Mar 2025 11:37:08 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 09:04:07 GMT, Robbin Ehn wrote: > Hi please consider. > > Added case to turn off UseZvfh when no RVV. > Which is the cause of the test issues, zvfh on but no rvv. > > Also made all case identical and added no warning when default. > Move them to the common init, as the "UseExtension" is not C2 specific. > > Manual tested and some random compiler tests. > > Thanks, Robbin src/hotspot/cpu/riscv/vm_version_riscv.cpp line 222: > 220: // UseZvbb (depends on RVV). > 221: if (UseZvbb && !UseRVV) { > 222: if (!FLAG_IS_DEFAULT(UseZvbb)) { So we have two code paths to enable this flag: 1. through the command line; 2. through hwprobe. I think the issue here is related to case 2. I wonder if that could be handled in that code path, that is when we call `UPDATE_DEFAULT` [1]. Then we could only consider case 1 here and simplify the code removing this `FLAG_IS_DEFAULT` check. It's a bit confusing to me as people might think that a true value of `UseZvbb` will mean that `FLAG_IS_DEFAULT` is false. What do you think? 
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/vm_version_riscv.hpp#L171 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24094#discussion_r2000841570 From duke at openjdk.org Tue Mar 18 14:14:08 2025 From: duke at openjdk.org (duke) Date: Tue, 18 Mar 2025 14:14:08 GMT Subject: RFR: 8350663: AArch64: Enable UseSignumIntrinsic by default In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 11:33:05 GMT, Patrick Zhang wrote: > According to tests on Arm CPUs Neoverse-N1/N2/V1/V2 and Ampere-Altra/AmpereOne, `-XX:+UseSignumIntrinsic` can provide a consistent positive performance boost on signum microbenchmarks (1-4,5,7 in the list below) and no obvious regression (ops/s change <0.1%) on other relevant tests (6,9-12). In addition, "_Apple M1 shows no regression with signum intrinsics_" (verified by @theRealAph). So, it may be time to enable this UseSignumIntrinsic flag by default for aarch64-port. By the way, x86 and riscv ports have already configured it on by default. > > Tests: passed JTReg tier1 tests on Ampere-1A, no regression found, and particularly checked test results of two signum cases (13,14), both are in good state. > > > 1. org.openjdk.bench.java.lang.MathBench.signumDouble > 2. org.openjdk.bench.java.lang.MathBench.signumFloat > 3. org.openjdk.bench.java.lang.StrictMathBench.sigNumDouble > 4. org.openjdk.bench.java.lang.StrictMathBench.signumFloat > 5. org.openjdk.bench.vm.compiler.Signum._1_signumFloatTest > 6. org.openjdk.bench.vm.compiler.Signum._2_overheadFloat > 7. org.openjdk.bench.vm.compiler.Signum._3_signumDoubleTest > 8. org.openjdk.bench.vm.compiler.Signum._4_overheadDouble > 9. org.openjdk.bench.vm.compiler.Signum._5_copySignFloatTest > 10. org.openjdk.bench.vm.compiler.Signum._6_overheadCopySignFloat > 11. org.openjdk.bench.vm.compiler.Signum._7_copySignDoubleTest > 12. org.openjdk.bench.vm.compiler.Signum._8_overheadCopySignDouble > 13. JTReg: compiler/vectorization/TestSignumVector.java > 14.
JTReg: compiler/intrinsics/math/TestSignumIntrinsic.java @cnqpzhang Your change (at version 2e012df341ca96dbf416fbeef887d6dfd2a9e052) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23893#issuecomment-2733406201 From qpzhang at openjdk.org Tue Mar 18 15:47:13 2025 From: qpzhang at openjdk.org (Patrick Zhang) Date: Tue, 18 Mar 2025 15:47:13 GMT Subject: Integrated: 8350663: AArch64: Enable UseSignumIntrinsic by default In-Reply-To: References: Message-ID: <1ITYpobfjQt1vIQYuefWlpnIqQdTL7bAY4nqO_H__NQ=.4fade816-d609-4491-8310-d730d415d4de@github.com> On Tue, 4 Mar 2025 11:33:05 GMT, Patrick Zhang wrote: > According to tests on Arm CPUs Neoverse-N1/N2/V1/V2 and Ampere-Altra/AmpereOne, `-XX:+UseSignumIntrinsic` can provide a consistent positive performance boost on signum microbenchmarks (1-4,5,7 in the list below) and no obvious regression (ops/s change <0.1%) on other relevant tests (6,9-12). In addition, "_Apple M1 shows no regression with signum intrinsics_" (verified by @theRealAph). So it may be time to enable the UseSignumIntrinsic flag by default for the aarch64 port. By the way, the x86 and riscv ports already enable it by default. > > Tests: passed JTReg tier1 tests on Ampere-1A with no regression found; in particular, the results of the two signum cases (13,14) were checked and both are in a good state. > > > 1. org.openjdk.bench.java.lang.MathBench.signumDouble > 2. org.openjdk.bench.java.lang.MathBench.signumFloat > 3. org.openjdk.bench.java.lang.StrictMathBench.sigNumDouble > 4. org.openjdk.bench.java.lang.StrictMathBench.signumFloat > 5. org.openjdk.bench.vm.compiler.Signum._1_signumFloatTest > 6. org.openjdk.bench.vm.compiler.Signum._2_overheadFloat > 7. org.openjdk.bench.vm.compiler.Signum._3_signumDoubleTest > 8. org.openjdk.bench.vm.compiler.Signum._4_overheadDouble > 9. org.openjdk.bench.vm.compiler.Signum._5_copySignFloatTest > 10. org.openjdk.bench.vm.compiler.Signum._6_overheadCopySignFloat > 11. 
org.openjdk.bench.vm.compiler.Signum._7_copySignDoubleTest > 12. org.openjdk.bench.vm.compiler.Signum._8_overheadCopySignDouble > 13. JTReg: compiler/vectorization/TestSignumVector.java > 14. JTReg: compiler/intrinsics/math/TestSignumIntrinsic.java This pull request has now been integrated. Changeset: b025d8c2 Author: Patrick Zhang Committer: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/b025d8c2e062210b6148da43f11517666b0b4932 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod 8350663: AArch64: Enable UseSignumIntrinsic by default Reviewed-by: adinn ------------- PR: https://git.openjdk.org/jdk/pull/23893 From tschatzl at openjdk.org Tue Mar 18 16:24:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 18 Mar 2025 16:24:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v24] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. 
> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 32 commits: - * factor out card table and refinement table merging into a single method - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 - * obsolete G1UpdateBufferSize G1UpdateBufferSize has previously been used to size the refinement buffers and impose a minimum limit on the number of cards per thread that need to be pending before refinement starts. The former function is now obsolete with the removal of the dirty card queues; the latter functionality has been taken over by the new diagnostic option `G1PerThreadPendingCardThreshold`. I prefer to make this a diagnostic option rather than a product option because it is only necessary for some test cases to produce some otherwise unwanted behavior (continuous refinement). CSR is pending. - * more documentation on why we need to rendezvous the gc threads - Merge branch 'master' into 8342381-card-table-instead-of-dcq - * ayang review * re-add STS leaver for java thread handshake - * when aborting refinement during full collection, the global card table and the per-thread card table might not be in sync. Roll forward during abort of the refinement in these situations. * additional verification * added some missing ResourceMarks in asserts * added variant of ArrayJuggle2 that crashes fairly quickly without these changes - * ayang review * remove unnecessary STSleaver * some more documentation around to_collection_card card color - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * optimized RISCV gen_write_ref_array_post_barrier() implementation contributed by @RealFYang - ... 
and 22 more: https://git.openjdk.org/jdk/compare/b025d8c2...c833bc83 ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=23 Stats: 6788 lines in 104 files changed: 2382 ins; 3476 del; 930 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From jiangli at openjdk.org Tue Mar 18 16:54:51 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 18 Mar 2025 16:54:51 GMT Subject: RFR: 8352098: -Xrunjdwp fails on static JDK [v2] In-Reply-To: <7AZV4Zx94USyWGyA2H3pu6-8Vo-ZM5or6AinLiC5MtM=.b770bcf8-00c0-4f73-a204-8009b6c70e26@github.com> References: <7AZV4Zx94USyWGyA2H3pu6-8Vo-ZM5or6AinLiC5MtM=.b770bcf8-00c0-4f73-a204-8009b6c70e26@github.com> Message-ID: > Please review this fix that avoids `JvmtiAgent::convert_xrun_agent` from prematurely exiting VM if `lookup_On_Load_entry_point` cannot load the agent using `JVM_OnLoad` symbol. Thanks > > `lookup_On_Load_entry_point` first tries to load the builtin agent from the executable by checking the requested symbol (`JVM_OnLoad`). If no builtin agent is found, it then tries to load the agent shared library (e.g. `libjdwp.so`) by calling `load_library`. The issue is that `load_library` is called with `vm_exit_on_error` set to `true`, which causes the VM to exit immediately if the agent shared library is not loaded. Therefore, `JvmtiAgent::convert_xrun_agent` has no chance to try loading the agent using `Agent_OnLoad` symbol (https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/prims/jvmtiAgent.cpp#L352). This is a hidden issue on regular JDK, since the `load_library` can successfully find the agent shared library when `JvmtiAgent::convert_xrun_agent` first tries to load the agent using `JVM_OnLoad` symbol. The issue is noticed on static JDK as there is no `libjdwp.so` in static JDK. 
It can be reproduced with jtreg `runtime/6294277/SourceDebugExtension.java` test. > > As part of the fix, I cleaned up the following in `invoke_JVM_OnLoad` and `invoke_Agent_OnLoad`. If there's an error, the VM should already have exited during `lookup__OnLoad_entry_point` in those cases. > > > if (on_load_entry == nullptr) { > vm_exit_during_initialization("Could not find ... function in -Xrun library", agent->name()); > } Jiangli Zhou has updated the pull request incrementally with two additional commits since the last revision: - Apply @dholmes-ora's edit suggestion. Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> - Apply @dholmes-ora's suggestion to use single line. Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24086/files - new: https://git.openjdk.org/jdk/pull/24086/files/b165b86f..745d4c0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24086&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24086&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24086.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24086/head:pull/24086 PR: https://git.openjdk.org/jdk/pull/24086 From jiangli at openjdk.org Tue Mar 18 16:59:08 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 18 Mar 2025 16:59:08 GMT Subject: RFR: 8352098: -Xrunjdwp fails on static JDK [v2] In-Reply-To: <1t0ksTv8cjRsnQ756fJahmEOVBDWfenL0m05RE3iUN0=.3222c934-ba11-4931-8393-a42925d082d4@github.com> References: <7AZV4Zx94USyWGyA2H3pu6-8Vo-ZM5or6AinLiC5MtM=.b770bcf8-00c0-4f73-a204-8009b6c70e26@github.com> <1t0ksTv8cjRsnQ756fJahmEOVBDWfenL0m05RE3iUN0=.3222c934-ba11-4931-8393-a42925d082d4@github.com> Message-ID: On Tue, 18 Mar 2025 03:58:51 GMT, David Holmes wrote: > This seems reasonable to me. A couple of nits. > > Thanks Thanks for the quick review, @dholmes-ora! 
Could anyone also help review as a second reviewer? Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/24086#issuecomment-2734020892 From cjplummer at openjdk.org Tue Mar 18 17:46:09 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 18 Mar 2025 17:46:09 GMT Subject: RFR: 8352098: -Xrunjdwp fails on static JDK [v2] In-Reply-To: References: <7AZV4Zx94USyWGyA2H3pu6-8Vo-ZM5or6AinLiC5MtM=.b770bcf8-00c0-4f73-a204-8009b6c70e26@github.com> Message-ID: On Tue, 18 Mar 2025 16:54:51 GMT, Jiangli Zhou wrote: >> Please review this fix that avoids `JvmtiAgent::convert_xrun_agent` from prematurely exiting VM if `lookup_On_Load_entry_point` cannot load the agent using `JVM_OnLoad` symbol. Thanks >> >> `lookup_On_Load_entry_point` first tries to load the builtin agent from the executable by checking the requested symbol (`JVM_OnLoad`). If no builtin agent is found, it then tries to load the agent shared library (e.g. `libjdwp.so`) by calling `load_library`. The issue is that `load_library` is called with `vm_exit_on_error` set to `true`, which causes the VM to exit immediately if the agent shared library is not loaded. Therefore, `JvmtiAgent::convert_xrun_agent` has no chance to try loading the agent using `Agent_OnLoad` symbol (https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/prims/jvmtiAgent.cpp#L352). This is a hidden issue on regular JDK, since the `load_library` can successfully find the agent shared library when `JvmtiAgent::convert_xrun_agent` first tries to load the agent using `JVM_OnLoad` symbol. The issue is noticed on static JDK as there is no `libjdwp.so` in static JDK. It can be reproduced with jtreg `runtime/6294277/SourceDebugExtension.java` test. >> >> As part of the fix, I cleaned up the following in `invoke_JVM_OnLoad` and `invoke_Agent_OnLoad`. If there's an error, the VM should already have exited during `lookup__OnLoad_entry_point` in those cases. 
>> >> >> if (on_load_entry == nullptr) { >> vm_exit_during_initialization("Could not find ... function in -Xrun library", agent->name()); >> } > > Jiangli Zhou has updated the pull request incrementally with two additional commits since the last revision: > > - Apply @dholmes-ora's edit suggestion. > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Apply @dholmes-ora's suggestion to use single line. > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> I had a look yesterday and it looked good, and your latest changes look good also. ------------- Marked as reviewed by cjplummer (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24086#pullrequestreview-2695605225 From jiangli at openjdk.org Tue Mar 18 17:59:09 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 18 Mar 2025 17:59:09 GMT Subject: RFR: 8352098: -Xrunjdwp fails on static JDK [v2] In-Reply-To: References: <7AZV4Zx94USyWGyA2H3pu6-8Vo-ZM5or6AinLiC5MtM=.b770bcf8-00c0-4f73-a204-8009b6c70e26@github.com> Message-ID: On Tue, 18 Mar 2025 17:43:07 GMT, Chris Plummer wrote: > I had a look yesterday and it looked good, and your latest changes look good also. Thanks for reviewing, @plummercj! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24086#issuecomment-2734232834 From jiangli at openjdk.org Tue Mar 18 19:07:12 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 18 Mar 2025 19:07:12 GMT Subject: Integrated: 8352098: -Xrunjdwp fails on static JDK In-Reply-To: <7AZV4Zx94USyWGyA2H3pu6-8Vo-ZM5or6AinLiC5MtM=.b770bcf8-00c0-4f73-a204-8009b6c70e26@github.com> References: <7AZV4Zx94USyWGyA2H3pu6-8Vo-ZM5or6AinLiC5MtM=.b770bcf8-00c0-4f73-a204-8009b6c70e26@github.com> Message-ID: On Mon, 17 Mar 2025 19:37:48 GMT, Jiangli Zhou wrote: > Please review this fix that avoids `JvmtiAgent::convert_xrun_agent` from prematurely exiting VM if `lookup_On_Load_entry_point` cannot load the agent using `JVM_OnLoad` symbol. 
Thanks > > `lookup_On_Load_entry_point` first tries to load the builtin agent from the executable by checking the requested symbol (`JVM_OnLoad`). If no builtin agent is found, it then tries to load the agent shared library (e.g. `libjdwp.so`) by calling `load_library`. The issue is that `load_library` is called with `vm_exit_on_error` set to `true`, which causes the VM to exit immediately if the agent shared library is not loaded. Therefore, `JvmtiAgent::convert_xrun_agent` has no chance to try loading the agent using `Agent_OnLoad` symbol (https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/prims/jvmtiAgent.cpp#L352). This is a hidden issue on regular JDK, since the `load_library` can successfully find the agent shared library when `JvmtiAgent::convert_xrun_agent` first tries to load the agent using `JVM_OnLoad` symbol. The issue is noticed on static JDK as there is no `libjdwp.so` in static JDK. It can be reproduced with jtreg `runtime/6294277/SourceDebugExtension.java` test. > > As part of the fix, I cleaned up the following in `invoke_JVM_OnLoad` and `invoke_Agent_OnLoad`. If there's an error, the VM should already have exited during `lookup__OnLoad_entry_point` in those cases. > > > if (on_load_entry == nullptr) { > vm_exit_during_initialization("Could not find ... function in -Xrun library", agent->name()); > } This pull request has now been integrated. 
Changeset: 4a02de82 Author: Jiangli Zhou URL: https://git.openjdk.org/jdk/commit/4a02de82923545f18590f8509c55129a4aa20842 Stats: 33 lines in 1 file changed: 11 ins; 6 del; 16 mod 8352098: -Xrunjdwp fails on static JDK Reviewed-by: cjplummer, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/24086 From jrose at openjdk.org Wed Mar 19 01:28:16 2025 From: jrose at openjdk.org (John R Rose) Date: Wed, 19 Mar 2025 01:28:16 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:22:43 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: > > - Merge branch 'master' into implement-jep502 > - Clean up exception messages and fix comments > - Rename field > - Rename method and fix comment > - Rework reenterant logic > - Use acquire semantics for reading rather than volatile semantics > - Add missing null check > - Simplify handling of sentinel, wrap, and unwrap > - Fix JavaDoc issues > - Fix members in StableEnumFunction > - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f Hi again Per! Here are some brief notes from our face-to-face chat at JavaOne. Debuggers want/need a "hook" for tentative evaluation of stables. It is an error for a debugger to trigger stable value decisions. This applies mainly to stable lists because of `toString`. Just how "mutable" is a stable list? How "eager to decide"? Which methods (if any) are tentative: `toString` / `equals` / `hashCode` ? Currently in the PR, all are decisive. This might be a case of the "wrong default". Maybe refactor composites to expose systematically "tentative" access API: - Less universal: SV.list(My::compute) => List<T> - More universal: SV.stableList(My::compute) => List<StableValue<T>> BTW, it's easy to understand a stable-list as a list of stables. 
But let's be sure to leave room for a more compact data structure. A compact stable-list is a list of stable views into a backing array. The backing array looks like `@Stable private T[] resolvedValues`. Not `private final List<StableValue<T>> stableValues`. For the record: I think this is sufficient for correctness: Use `getAcquire` (resp. `releaseSet`) for all stable reads (resp. writes). Do the `releaseSet` inside a mutex that serializes computation. Add a re-entrancy check in the mutex and throw on vicious cycles. I do NOT think `volatile` is necessary; it has too many fences. It is a safe default for a naked variable. But the stable variables are encapsulated, and do not need aggressive fences. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2735079141 From rehn at openjdk.org Wed Mar 19 06:50:25 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 19 Mar 2025 06:50:25 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v2] In-Reply-To: References: Message-ID: > Hi please consider. > > Added case to turn off UseZvfh when no RVV. > Which is the cause of the test issues, zvfh on but no rvv. > > Also made all case identical and added no warning when default. > Move them to the common init, as the "UseExtension" is not C2 specific. > > Manual tested and some random compiler tests. > > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - hwprobe deps - Merge branch 'master' into maxvector_0 - Moved to common - Disable UseZvfh when no RVV ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24094/files - new: https://git.openjdk.org/jdk/pull/24094/files/5ccda800..2357c157 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24094&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24094&range=00-01 Stats: 768 lines in 43 files changed: 397 ins; 241 del; 130 mod Patch: https://git.openjdk.org/jdk/pull/24094.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24094/head:pull/24094 PR: https://git.openjdk.org/jdk/pull/24094 From rehn at openjdk.org Wed Mar 19 06:50:25 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 19 Mar 2025 06:50:25 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v2] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 11:34:02 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - hwprobe deps >> - Merge branch 'master' into maxvector_0 >> - Moved to common >> - Disable UseZvfh when no RVV > > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 222: > >> 220: // UseZvbb (depends on RVV). >> 221: if (UseZvbb && !UseRVV) { >> 222: if (!FLAG_IS_DEFAULT(UseZvbb)) { > > So we have two code paths to enable this flag: 1. through the command line; 2. through hwprobe. I think the issue > here is related to case 2. I wonder if that could be handled in that code path, that is when we call `UPDATE_DEFAULT` [1]. > > Then we could only consider case 1 here and simplify the code removing this `FLAG_IS_DEFAULT` check. It's a bit confusing to me as people might think that a true value of `UseZvbb` will mean that `FLAG_IS_DEFAULT` is false. What do you think? 
> > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/vm_version_riscv.hpp#L171 I'm not sure exactly what you had in mind, but I pushed what I think you were thinking. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24094#discussion_r2002543443 From rehn at openjdk.org Wed Mar 19 06:52:52 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 19 Mar 2025 06:52:52 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v5] In-Reply-To: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: > Hi please consider. > > |RVWMO| Patched| > | ---------- | ---------- | > |fence iorw,iorw| fence iorw,ow| > |sw t4,120(t2) | sw t4,120(t2) | > |fence ow,ir | unnecessary_membar_volatile_rvwmo | > | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile | > |fence iorw,ow | fence iorw,ow| > |sw t5,124(t2) |sw t5,124(t2) | > > |TSO | Patched| > | ---------- | ---------- | > | lw a4,120(t2) | lw a6,120(t2) | > | sw a0,124(t2) | sw t6,124(t2) | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > | sw t4,120(t2) | sw t4,120(t2) | > | fence ow,ir | unnecessary_membar_volatile_tso | > | sw t6,128(t2) | sw t5,128(t2) | > | sw t5,124(t2) // Non-volatile| sw a1,124(t2) // Non-volatile | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > |... | ... | > | sw a3,120(t2) | sw a0,120(t2) | > | fence ow,ir | fence ow,ir | > | lw a7,124(t2) | lw a5,124(t2) | > > For the specific rvwmo volatile store + store + volatile store is around 30% faster on VF2. > > The patch do: > - Separate ztso and rvwmo in ad by using UseZtso predicate. > - Match all that requires the same membar. > - Make fence/fencei protected as they shouldn't be using directly. > - Increased cost of membars to VOLATILE_REF_COST. > - Added a real_empty pipe. 
> - Change to pipe_slow on TSO (as x86). > > Note that C2-rv64 is now superior to gcc/clang regarding fencing: > https://godbolt.org/z/6E3YTP15j > > Testing jcstress, tier1 and manually reading the generated assembly. > Doing additional testing, but RFR it now as it may need some consideration. > > /Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Merge branch 'master' into tso-merge - Review comments - Merge branch 'master' into tso-merge - Review comments - Fixed ws - Revert NC - Fixed comment - UseNewCode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24035/files - new: https://git.openjdk.org/jdk/pull/24035/files/566c0a76..e0e4fff3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=03-04 Stats: 9574 lines in 175 files changed: 4650 ins; 3304 del; 1620 mod Patch: https://git.openjdk.org/jdk/pull/24035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24035/head:pull/24035 PR: https://git.openjdk.org/jdk/pull/24035 From rehn at openjdk.org Wed Mar 19 07:06:44 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 19 Mar 2025 07:06:44 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v6] In-Reply-To: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: > Hi please consider. 
> > |RVWMO| Patched| > | ---------- | ---------- | > |fence iorw,iorw| fence iorw,ow| > |sw t4,120(t2) | sw t4,120(t2) | > |fence ow,ir | unnecessary_membar_volatile_rvwmo | > | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile | > |fence iorw,ow | fence iorw,ow| > |sw t5,124(t2) |sw t5,124(t2) | > > |TSO | Patched| > | ---------- | ---------- | > | lw a4,120(t2) | lw a6,120(t2) | > | sw a0,124(t2) | sw t6,124(t2) | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > | sw t4,120(t2) | sw t4,120(t2) | > | fence ow,ir | unnecessary_membar_volatile_tso | > | sw t6,128(t2) | sw t5,128(t2) | > | sw t5,124(t2) // Non-volatile| sw a1,124(t2) // Non-volatile | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > |... | ... | > | sw a3,120(t2) | sw a0,120(t2) | > | fence ow,ir | fence ow,ir | > | lw a7,124(t2) | lw a5,124(t2) | > > For RVWMO, the specific sequence volatile store + store + volatile store is around 30% faster on VF2. > > The patch does: > - Separate ztso and rvwmo in ad by using UseZtso predicate. > - Match all that requires the same membar. > - Make fence/fencei protected as they shouldn't be used directly. > - Increased cost of membars to VOLATILE_REF_COST. > - Added a real_empty pipe. > - Change to pipe_slow on TSO (as x86). > > Note that C2-rv64 is now superior to gcc/clang regarding fencing: > https://godbolt.org/z/6E3YTP15j > > Testing jcstress, tier1 and manually reading the generated assembly. > Doing additional testing, but RFR it now as it may need some consideration. 
> > /Robbin Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: format comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24035/files - new: https://git.openjdk.org/jdk/pull/24035/files/e0e4fff3..cb184209 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=04-05 Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24035/head:pull/24035 PR: https://git.openjdk.org/jdk/pull/24035 From rehn at openjdk.org Wed Mar 19 07:06:44 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 19 Mar 2025 07:06:44 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v4] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Tue, 18 Mar 2025 10:10:38 GMT, Fei Yang wrote: >> No other platform uses '@' in their ad files? >> E.g. >> >> format %{ "membar_acquire\n\t" >> "dmb ishld" %} >> >> >> It just looked weird that riscv have it's special way to format things. >> >> I can agree with '#', but why add a @ ? > >> I can agree with '#', but why add a @ ? > > I can't recall the history. Maybe just to mark that this is the start of a specific match rule thus making the opto asm more readable (and easier for the reader to map which instructions to which match rule). Opto output of some of the match rules are kind of complex like the compare ones (`CmpF3` etc.). But a single `#` also works for me. Ok, reverted to original style for now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2002561056 From fyang at openjdk.org Wed Mar 19 07:13:07 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 19 Mar 2025 07:13:07 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v6] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: <5tjPbHyqkLc_l1qe_AqQTR1NEREAsvGmGjEB_laDL_8=.8fc22027-3aa8-4df7-8221-120026426c1a@github.com> On Wed, 19 Mar 2025 07:06:44 GMT, Robbin Ehn wrote: >> Hi please consider. >> >> |RVWMO| Patched| >> | ---------- | ---------- | >> |fence iorw,iorw| fence iorw,ow| >> |sw t4,120(t2) | sw t4,120(t2) | >> |fence ow,ir | unnecessary_membar_volatile_rvwmo | >> | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile | >> |fence iorw,ow | fence iorw,ow| >> |sw t5,124(t2) |sw t5,124(t2) | >> >> |TSO | Patched| >> | ---------- | ---------- | >> | lw a4,120(t2) | lw a6,120(t2) | >> | sw a0,124(t2) | sw t6,124(t2) | >> | fence iorw,iorw | unnecessary_membar_volatile_tso | >> | sw t4,120(t2) | sw t4,120(t2) | >> | fence ow,ir | unnecessary_membar_volatile_tso | >> | sw t6,128(t2) | sw t5,128(t2) | >> | sw t5,124(t2) // Non-volatile| sw a1,124(t2) // Non-volatile | >> | fence iorw,iorw | unnecessary_membar_volatile_tso | >> |... | ... | >> | sw a3,120(t2) | sw a0,120(t2) | >> | fence ow,ir | fence ow,ir | >> | lw a7,124(t2) | lw a5,124(t2) | >> >> For the specific rvwmo volatile store + store + volatile store is around 30% faster on VF2. >> >> The patch do: >> - Separate ztso and rvwmo in ad by using UseZtso predicate. >> - Match all that requires the same membar. >> - Make fence/fencei protected as they shouldn't be using directly. >> - Increased cost of membars to VOLATILE_REF_COST. >> - Added a real_empty pipe. >> - Change to pipe_slow on TSO (as x86). 
>> >> Note that C2-rv64 is now superior to gcc/clang regarding fencing: >> https://godbolt.org/z/6E3YTP15j >> >> Testing jcstress, tier1 and manually reading the generated assembly. >> Doing additional testing, but RFR it now as it may need some consideration. >> >> /Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > format comment Looks great to me now! Thanks for the updates! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24035#pullrequestreview-2697179062 From fyang at openjdk.org Wed Mar 19 07:27:09 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 19 Mar 2025 07:27:09 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v2] In-Reply-To: References: Message-ID: <-yKfvCL7LPBtpTZGPatxRHwOXvWV-KobswjuvxzBpMU=.45845ddd-dd97-4d26-b722-d392c468e56f@github.com> On Wed, 19 Mar 2025 06:50:25 GMT, Robbin Ehn wrote: >> Hi please consider. >> >> Added case to turn off UseZvfh when no RVV. >> Which is the cause of the test issues, zvfh on but no rvv. >> >> Also made all case identical and added no warning when default. >> Move them to the common init, as the "UseExtension" is not C2 specific. >> >> Manual tested and some random compiler tests. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - hwprobe deps > - Merge branch 'master' into maxvector_0 > - Moved to common > - Disable UseZvfh when no RVV Looks good. I like the latest version which makes dependencies between extensions explicit in the hwprobe code path. Thanks! ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24094#pullrequestreview-2697232079 From fyang at openjdk.org Wed Mar 19 07:47:14 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 19 Mar 2025 07:47:14 GMT Subject: RFR: 8352159: RISC-V: add more zfa support [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 11:25:19 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Previously in https://github.com/openjdk/jdk/pull/23844, `loadConH` is implemented only with `flh`, but `fli_h` should be more efficient if Zfa is supported; min/max for HF could use fmin/maxm.h instead too. >> >> Thanks! > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - merge master > - fix fli_h > - fix rFlagsReg > - refactor min/max > - min/max > - initial commit Looks good to me. I have a suggestion about the value searching. We can consider that in the future. src/hotspot/cpu/riscv/assembler_riscv.hpp line 446: > 444: case 0x7c00 : return 29; > 445: // case 0x7c00 : return 30; // redundant with 29 > 446: case 0x7e00 : return 31; This switch-case seems nontrivial (Same for for all S & D variants). One possible enhancement to speed up might be putting the values in a table and doing a binary search. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24081#pullrequestreview-2697267136 PR Review Comment: https://git.openjdk.org/jdk/pull/24081#discussion_r2002647512 From sroy at openjdk.org Wed Mar 19 08:26:55 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Wed, 19 Mar 2025 08:26:55 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v29] In-Reply-To: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: > JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) > > Currently acceleration code for GHASH is missing for PPC64. > > The current implementation utlilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. Suchismith Roy has updated the pull request incrementally with three additional commits since the last revision: - comments - comments - comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20235/files - new: https://git.openjdk.org/jdk/pull/20235/files/3bca30f6..a41fdc27 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=27-28 Stats: 11 lines in 1 file changed: 0 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/20235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20235/head:pull/20235 PR: https://git.openjdk.org/jdk/pull/20235 From sroy at openjdk.org Wed Mar 19 09:02:14 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Wed, 19 Mar 2025 09:02:14 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28] In-Reply-To: <0DSwCsm5yp2be9s-cgkZP4HCo4ppGD_SkDq4KyjfMEw=.0d74a4c8-e155-4186-884f-2575924f9d03@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> 
<7rVbCbWDqrib9Jyj7_hkD-r9rkaAOIXuwOGAqImrxoY=.a55e9572-b4e6-4cc2-aa0e-c23deb9961ce@github.com> <1wMuCBIwYPaPM-bbsnFHi8hnkq-IL5Q_kCmaa1AdDpM=.1240fd83-db6d-489a-bbb3-48891daac064@github.com> <0DSwCsm5yp2be9s-cgkZP4HCo4ppGD_SkDq4KyjfMEw=.0d74a4c8-e155-4186-884f-2575924f9d03@github.com> Message-ID: On Tue, 18 Mar 2025 09:08:34 GMT, Suchismith Roy wrote: >> I still find it hard to read. Can you describe the algorithm in pseudo code or mathematical equations? We can try to map it to a shorter instruction sequence. >> Btw. the comment looks wrong here: vxor(vLowProduct, vLowProduct, vReducedLow); // Reduction using constant > > @TheRealMDoerr > https://www.researchgate.net/publication/285612706_Implementing_GCM_on_ARMv8 > > I think the same algorithm used for polynomial reduction -Section 4.3 Hi @theRealAph Do you see a scope to reduce these swaps in the algorithm , for the above mentioned instructions. I feel there is a similar set of instructions used to perform reduction in https://www.researchgate.net/publication/285612706_Implementing_GCM_on_ARMv8 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r2002800508 From lucy at openjdk.org Wed Mar 19 09:17:15 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 19 Mar 2025 09:17:15 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v10] In-Reply-To: References: Message-ID: <9Ay-X2pALP4vMgZfh_Uk5DmLir-FUMxI9k3D1N-EiyM=.0b6e12e8-0338-4d3a-a108-8ee4b66db55a@github.com> On Tue, 25 Feb 2025 15:39:21 GMT, Amit Kumar wrote: >> s390x implementation for Class.isInstance intrinsic. >> >> Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. >> >> Benchmark results will be updated soon. 
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/s390/c1_Runtime1_s390.cpp > > Co-authored-by: Andrew Haley There is one minor add'l improvement, if you like. Does not block integration. Looks good otherwise. src/hotspot/cpu/s390/c1_Runtime1_s390.cpp line 603: > 601: Register temp0 = Z_ARG4, temp1 = Z_ARG5, temp2 = Z_R10, temp3 = Z_R11; > 602: > 603: __ z_lg(klass, Address(Z_ARG1, java_lang_Class::klass_offset())); You could combine this with the LTGR below into a LTG. The clear_reg() call will preserve CC. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23535#pullrequestreview-2697552814 PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r2002823312 From rehn at openjdk.org Wed Mar 19 09:29:08 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 19 Mar 2025 09:29:08 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v2] In-Reply-To: <-yKfvCL7LPBtpTZGPatxRHwOXvWV-KobswjuvxzBpMU=.45845ddd-dd97-4d26-b722-d392c468e56f@github.com> References: <-yKfvCL7LPBtpTZGPatxRHwOXvWV-KobswjuvxzBpMU=.45845ddd-dd97-4d26-b722-d392c468e56f@github.com> Message-ID: On Wed, 19 Mar 2025 07:24:37 GMT, Fei Yang wrote: > Looks good. I like the latest version which makes dependencies between extensions explict in the hwprobe code path. Thanks! Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24094#issuecomment-2735880844 From mli at openjdk.org Wed Mar 19 09:30:08 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 19 Mar 2025 09:30:08 GMT Subject: RFR: 8352159: RISC-V: add more zfa support [v4] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 07:41:02 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: >> >> - merge master >> - fix fli_h >> - fix rFlagsReg >> - refactor min/max >> - min/max >> - initial commit > > src/hotspot/cpu/riscv/assembler_riscv.hpp line 446: > >> 444: case 0x7c00 : return 29; >> 445: // case 0x7c00 : return 30; // redundant with 29 >> 446: case 0x7e00 : return 31; > > This switch-case seems nontrivial (Same for all S & D variants). One possible enhancement to speed up might be putting the values in a table and doing a binary search. In this case, I think the compiler (gcc or clang) will be able to automatically generate a binary search rather than a linear search, so we don't have to write the code manually. Keeping the code as it is is also more readable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24081#discussion_r2002867954 From rehn at openjdk.org Wed Mar 19 09:37:14 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 19 Mar 2025 09:37:14 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v6] In-Reply-To: <5tjPbHyqkLc_l1qe_AqQTR1NEREAsvGmGjEB_laDL_8=.8fc22027-3aa8-4df7-8221-120026426c1a@github.com> References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> <5tjPbHyqkLc_l1qe_AqQTR1NEREAsvGmGjEB_laDL_8=.8fc22027-3aa8-4df7-8221-120026426c1a@github.com> Message-ID: <3idnK7O7RUNoI7Gf-CxUKqv2vyqaOEfaomOUN9YvsfA=.f8f6dbb7-c82f-4be4-b8dc-d7cebcaddfb0@github.com> On Wed, 19 Mar 2025 07:10:23 GMT, Fei Yang wrote: > Looks great to me now! Thanks for the updates! Yet another, thank you! :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24035#issuecomment-2735914367 From amitkumar at openjdk.org Wed Mar 19 10:17:56 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 19 Mar 2025 10:17:56 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v11] In-Reply-To: References: Message-ID: > s390x implementation for Class.isInstance intrinsic. 
> > Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. > > Benchmark results will be updated soon. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: suggestion from Lutz ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23535/files - new: https://git.openjdk.org/jdk/pull/23535/files/e7269045..a7d3de08 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=09-10 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23535/head:pull/23535 PR: https://git.openjdk.org/jdk/pull/23535 From amitkumar at openjdk.org Wed Mar 19 10:17:57 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 19 Mar 2025 10:17:57 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v10] In-Reply-To: <9Ay-X2pALP4vMgZfh_Uk5DmLir-FUMxI9k3D1N-EiyM=.0b6e12e8-0338-4d3a-a108-8ee4b66db55a@github.com> References: <9Ay-X2pALP4vMgZfh_Uk5DmLir-FUMxI9k3D1N-EiyM=.0b6e12e8-0338-4d3a-a108-8ee4b66db55a@github.com> Message-ID: On Wed, 19 Mar 2025 09:07:11 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/cpu/s390/c1_Runtime1_s390.cpp >> >> Co-authored-by: Andrew Haley > > src/hotspot/cpu/s390/c1_Runtime1_s390.cpp line 603: > >> 601: Register temp0 = Z_ARG4, temp1 = Z_ARG5, temp2 = Z_R10, temp3 = Z_R11; >> 602: >> 603: __ z_lg(klass, Address(Z_ARG1, java_lang_Class::klass_offset())); > > You could combine this with the LTGR below into a LTG. The clear_reg() call will preserve CC. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r2002955505 From fyang at openjdk.org Wed Mar 19 10:27:08 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 19 Mar 2025 10:27:08 GMT Subject: RFR: 8352159: RISC-V: add more zfa support [v4] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 09:27:44 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/assembler_riscv.hpp line 446: >> >>> 444: case 0x7c00 : return 29; >>> 445: // case 0x7c00 : return 30; // redundant with 29 >>> 446: case 0x7e00 : return 31; >> >> This switch-case seems nontrivial (Same for all S & D variants). One possible enhancement to speed up might be putting the values in a table and doing a binary search. > > In this case, I think the compiler (gcc or clang) will be able to automatically generate a binary search rather than a linear search, so we don't have to write the code manually. Keeping the code as it is is also more readable. Interesting! I tried several recent GCC versions and I see it is doing a binary search. Then I think we are OK with the current shape. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24081#discussion_r2002980368 From mli at openjdk.org Wed Mar 19 10:30:09 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 19 Mar 2025 10:30:09 GMT Subject: RFR: 8352159: RISC-V: add more zfa support [v4] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 07:43:27 GMT, Fei Yang wrote: > Looks good to me. I have a suggestion about the value searching. We can consider that in the future. Thank you! 
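For reference, the table-plus-binary-search alternative raised in the 8352159 thread could look roughly like the sketch below. It is not code from the PR: `FliEntry`, `kFliHTable` and `fli_h_index` are hypothetical names, and the table is an illustrative subset. Only the `0x7c00 -> 29` and `0x7e00 -> 31` rows are taken from the quoted switch; the `0x3c00` and `0x4000` rows are assumptions and should be checked against the Zfa specification. As the thread concludes, recent GCC versions already lower the switch to a binary search, so this form only shows what the manual version would involve.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Hypothetical pair: raw half-precision bit pattern and the fli.h
// immediate index that would encode it.
struct FliEntry {
  uint16_t bits;
  int index;
};

// Illustrative subset, kept sorted by bit pattern so binary search is valid.
static const FliEntry kFliHTable[] = {
  {0x3c00, 16},  // 1.0 (assumed row, not from the quoted code)
  {0x4000, 20},  // 2.0 (assumed row, not from the quoted code)
  {0x7c00, 29},  // from the quoted switch (index 30 maps to the same bits)
  {0x7e00, 31},  // from the quoted switch
};

// Returns the table index for the given bits, or -1 when the value has no
// entry in the table.
int fli_h_index(uint16_t bits) {
  const FliEntry* first = std::begin(kFliHTable);
  const FliEntry* last = std::end(kFliHTable);
  const FliEntry* it = std::lower_bound(
      first, last, bits,
      [](const FliEntry& e, uint16_t key) { return e.bits < key; });
  return (it != last && it->bits == bits) ? it->index : -1;
}
```

The table must stay sorted by `bits`, which is an extra invariant the switch form does not need; that is one more argument for keeping the switch when the compiler produces equivalent code.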
------------- PR Comment: https://git.openjdk.org/jdk/pull/24081#issuecomment-2736073657 From dholmes at openjdk.org Wed Mar 19 12:19:20 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Mar 2025 12:19:20 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v5] In-Reply-To: References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: On Wed, 12 Mar 2025 16:12:12 GMT, Aleksey Shipilev wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Drop peak count completely > > Thank you for reviews, appreciated! I'll integrate shortly. @shipilev we missed the fact the obj may be null when deflating. A bug is being filed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23900#issuecomment-2736428115 From tschatzl at openjdk.org Wed Mar 19 13:17:19 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 19 Mar 2025 13:17:19 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v25] In-Reply-To: References: Message-ID: <5Q9-MERAD4KIP-fzgw7JVAtC9u4L1fEFGcNkdHBvkg4=.1917bd58-a5f8-4c5c-b1f9-27b7457c6262@github.com> > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. 
> > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * fix IR code generation tests that change due to barrier cost changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/c833bc83..f419556e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=23-24 Stats: 5 lines in 2 files changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Wed Mar 19 13:27:17 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 19 Mar 2025 13:27:17 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v25] In-Reply-To: <5Q9-MERAD4KIP-fzgw7JVAtC9u4L1fEFGcNkdHBvkg4=.1917bd58-a5f8-4c5c-b1f9-27b7457c6262@github.com> References: <5Q9-MERAD4KIP-fzgw7JVAtC9u4L1fEFGcNkdHBvkg4=.1917bd58-a5f8-4c5c-b1f9-27b7457c6262@github.com> Message-ID: On Wed, 19 Mar 2025 13:17:19 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. 
>> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... 
> > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * fix IR code generation tests that change due to barrier cost changes Commit https://github.com/openjdk/jdk/pull/23739/commits/f419556e9177ecf9fbf22e606dd6c1b850f4330f fixes the failing compiler tests that check whether the compiler emits the correct object graph. Occurs after merging with mainline that significantly reduces total barrier cost calculation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2736639357 From shade at openjdk.org Wed Mar 19 13:50:46 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Mar 2025 13:50:46 GMT Subject: RFR: 8352415: x86: Tighten up template interpreter method entry code Message-ID: Interpreter performance is the still important for faster startup, since it would carry application until compilers kick in. After looking at Leyden scenarios in Xint mode, I believe incremental improvements are possible in template interpreter to make it faster. One of those improvements is tightening up method entry code. Profiling shows the hottest path in the whole ordeal for non-native methods is resolving the Java mirror to store the GC root for currently executing Method*. It involves 4-5 chained memory accesses, which incurs significant latency. We can massage the code to reuse some memory accesses and also spread them out to allow more latency-hiding hardware mechanisms to kick in. 
Additional testing: - [x] Ad-hoc `-Xint` benchmarks - [ ] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/24114/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24114&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352415 Stats: 27 lines in 1 file changed: 11 ins; 1 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/24114.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24114/head:pull/24114 PR: https://git.openjdk.org/jdk/pull/24114 From shade at openjdk.org Wed Mar 19 13:50:46 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Mar 2025 13:50:46 GMT Subject: RFR: 8352415: x86: Tighten up template interpreter method entry code In-Reply-To: References: Message-ID: <8fSNdYhT-zv27XrKb-oYK4jQulPO4phB3MIw0kRvV3E=.e3fd018e-a2cb-4db7-b9ee-b17cea5f17d3@github.com> On Wed, 19 Mar 2025 13:44:40 GMT, Aleksey Shipilev wrote: > Interpreter performance is still important for faster startup, since it would carry the application until compilers kick in. After looking at Leyden scenarios in Xint mode, I believe incremental improvements are possible in the template interpreter to make it faster. > > One of those improvements is tightening up method entry code. Profiling shows the hottest path in the whole ordeal for non-native methods is resolving the Java mirror to store the GC root for the currently executing Method*. It involves 4-5 chained memory accesses, which incur significant latency. > > We can massage the code to reuse some memory accesses and also spread them out to allow more latency-hiding hardware mechanisms to kick in. > > Additional testing: > - [x] Ad-hoc `-Xint` benchmarks > - [ ] Linux x86_64 server fastdebug, `all` Motivational improvements on 5950X, about 1.8% faster interpreted code. Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xms1g -Xmx1g -XX:+UseSerialGC \ -Xint -cp JavacBenchApp.jar JavacBenchApp 1 # Before Time (mean ± σ): 1.533 s ± 
0.013 s [User: 1.479 s, System: 0.051 s] Range (min … max): 1.517 s … 1.551 s 10 runs # After Time (mean ± σ): 1.506 s ± 0.012 s [User: 1.451 s, System: 0.051 s] Range (min … max): 1.493 s … 1.528 s 10 runs ------------- PR Comment: https://git.openjdk.org/jdk/pull/24114#issuecomment-2736705387 From adinn at openjdk.org Wed Mar 19 15:10:08 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 19 Mar 2025 15:10:08 GMT Subject: RFR: 8352415: x86: Tighten up template interpreter method entry code In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 13:44:40 GMT, Aleksey Shipilev wrote: > Interpreter performance is still important for faster startup, since it would carry the application until compilers kick in. After looking at Leyden scenarios in Xint mode, I believe incremental improvements are possible in the template interpreter to make it faster. > > One of those improvements is tightening up method entry code. Profiling shows the hottest path in the whole ordeal for non-native methods is resolving the Java mirror to store the GC root for the currently executing Method*. It involves 4-5 chained memory accesses, which incur significant latency. > > We can massage the code to reuse some memory accesses and also spread them out to allow more latency-hiding hardware mechanisms to kick in. > > Additional testing: > - [x] Ad-hoc `-Xint` benchmarks > - [ ] Linux x86_64 server fastdebug, `all` src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp line 656: > 654: // Resolve ConstMethod* -> ConstantPool*. > 655: // Get codebase, while we still have ConstMethod*. > 656: // Save ConstantPool* in rax for later use. Oh nice! 
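The idea behind the quoted comments in the 8352415 thread (load `ConstMethod*` once, take the codebase while it is at hand, keep `ConstantPool*` for the later mirror lookup) can be modelled in plain C++ as below. This is a simplified sketch, not HotSpot code: the structs, field names and the `EntryState`, `entry_naive` and `entry_shared` names are hypothetical stand-ins, and the real mirror load also resolves an oop handle. A C++ compiler would merge the repeated loads on its own; the hand-written interpreter assembly gets no such help, which is why the PR restructures it manually.

```cpp
#include <cstdint>

// Hypothetical, heavily simplified stand-ins for the metadata chain
// Method* -> ConstMethod* -> ConstantPool* -> holder klass -> mirror.
struct Mirror { int id; };
struct Klass { Mirror* java_mirror; };
struct ConstantPool { Klass* pool_holder; };
struct ConstMethod { ConstantPool* constants; const uint8_t* codes; };
struct Method { ConstMethod* const_method; };

// What the method entry needs to set up in this model.
struct EntryState {
  const uint8_t* codebase;
  ConstantPool* cpool;
  Mirror* mirror;
};

// Each item is resolved by walking the chain from Method* again,
// mirroring separate helper sequences that repeat the same loads.
EntryState entry_naive(const Method* m) {
  EntryState s;
  s.codebase = m->const_method->codes;
  s.cpool = m->const_method->constants;
  s.mirror = m->const_method->constants->pool_holder->java_mirror;
  return s;
}

// Intermediate pointers are loaded once and reused.
EntryState entry_shared(const Method* m) {
  ConstMethod* cm = m->const_method;  // single load of ConstMethod*
  EntryState s;
  s.codebase = cm->codes;             // taken while ConstMethod* is at hand
  s.cpool = cm->constants;            // kept for later use
  s.mirror = s.cpool->pool_holder->java_mirror;  // reuses the saved pool
  return s;
}
```

Both functions return the same state; the second one just makes the shared prefix of the dependent-load chain explicit.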
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24114#discussion_r2003577225 From adinn at openjdk.org Wed Mar 19 15:18:08 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 19 Mar 2025 15:18:08 GMT Subject: RFR: 8352415: x86: Tighten up template interpreter method entry code In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 13:44:40 GMT, Aleksey Shipilev wrote: > Interpreter performance is the still important for faster startup, since it would carry application until compilers kick in. After looking at Leyden scenarios in Xint mode, I believe incremental improvements are possible in template interpreter to make it faster. > > One of those improvements is tightening up method entry code. Profiling shows the hottest path in the whole ordeal for non-native methods is resolving the Java mirror to store the GC root for currently executing Method*. It involves 4-5 chained memory accesses, which incurs significant latency. > > We can massage the code to reuse some memory accesses and also spread them out to allow more latency-hiding hardware mechanisms to kick in. > > Additional testing: > - [x] Ad-hoc `-Xint` benchmarks > - [ ] Linux x86_64 server fastdebug, `all` Looks good to me. Have you looked at aarch64 to see if this also a bottleneck there? It could use a similar trick to avoid calling load_mirror (which repeats the first two of three loads that intitialize rcpool). ------------- Marked as reviewed by adinn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24114#pullrequestreview-2698928874 From shade at openjdk.org Wed Mar 19 15:28:20 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Mar 2025 15:28:20 GMT Subject: RFR: 8351142: Add JFR monitor deflation and statistics events [v5] In-Reply-To: References: <6RGbbx9nSurXCZzjs88q4GO2TmB91qbv3R6xQyNsSuw=.f311736f-0401-4e9b-8ff3-f6d17bf0ca1b@github.com> Message-ID: <7f-_mN0_z__Yt2HCk9N8aYIjsaLv5Z_5mCwJcGMRogM=.76e46a20-54fb-457e-b5c6-95123841544d@github.com> On Wed, 12 Mar 2025 16:12:12 GMT, Aleksey Shipilev wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Drop peak count completely > > Thank you for reviews, appreciated! I'll integrate shortly. > @shipilev we missed the fact the obj may be null when deflating. A bug is being filed. On it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23900#issuecomment-2737061907 From shade at openjdk.org Wed Mar 19 15:30:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Mar 2025 15:30:10 GMT Subject: RFR: 8352415: x86: Tighten up template interpreter method entry code In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 15:15:46 GMT, Andrew Dinn wrote: > Have you looked at aarch64 to see if this also a bottleneck there? It could use a similar trick to avoid calling load_mirror (which repeats the first two of three loads that intitialize rcpool). Only briefly. AArch64 is clunkier, since it does things in pairs, but I think improvements there are possible as well. 
As usual, we do arch-specific optimizations in separate PRs, with one architecture being a pilot :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24114#issuecomment-2737069271 From mli at openjdk.org Wed Mar 19 15:48:49 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 19 Mar 2025 15:48:49 GMT Subject: RFR: 8352423: RISC-V: simplify DivI/L ModI/L Message-ID: <3oq1Fgqt-St5FpdVKDixDPfiI4dHLdMosA9dyhupwpA=.c782c45f-dab1-4bb6-a3d9-8d7ae6c56a8c@github.com> Hi, Can you help to review this patch? Currently, implementation of DivI/L and ModI/L are overcomplicated, could and should be simplified. And, also enable some DivI/L and ModI/L related tests. Thanks! ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/24119/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24119&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352423 Stats: 172 lines in 6 files changed: 20 ins; 133 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/24119.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24119/head:pull/24119 PR: https://git.openjdk.org/jdk/pull/24119 From mli at openjdk.org Wed Mar 19 16:13:32 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 19 Mar 2025 16:13:32 GMT Subject: RFR: 8352423: RISC-V: simplify DivI/L ModI/L [v2] In-Reply-To: <3oq1Fgqt-St5FpdVKDixDPfiI4dHLdMosA9dyhupwpA=.c782c45f-dab1-4bb6-a3d9-8d7ae6c56a8c@github.com> References: <3oq1Fgqt-St5FpdVKDixDPfiI4dHLdMosA9dyhupwpA=.c782c45f-dab1-4bb6-a3d9-8d7ae6c56a8c@github.com> Message-ID: > Hi, > Can you help to review this patch? > > Currently, implementation of DivI/L and ModI/L are overcomplicated, could and should be simplified. > And, also enable some DivI/L and ModI/L related tests. > > Thanks! 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24119/files - new: https://git.openjdk.org/jdk/pull/24119/files/85b880be..64d1ad47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24119&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24119&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24119.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24119/head:pull/24119 PR: https://git.openjdk.org/jdk/pull/24119 From mcimadamore at openjdk.org Wed Mar 19 16:20:21 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 19 Mar 2025 16:20:21 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: <2AEBD2FF-0816-418A-B8A9-C936D942F4D3@oracle.com> References: <2AEBD2FF-0816-418A-B8A9-C936D942F4D3@oracle.com> Message-ID: On Sun, 16 Mar 2025 03:30:46 GMT, John Rose wrote: > This might seem to contradict my previous assertion > that StableValue, being mutex based, must not > use lock-free idioms. That comment applies > specifically to the update operation that takes > a lambda. Other operations, such as reading > a SV, or hopefully poking a value at a SV can be, > and should be, composed of lock-free operations. > Why take a lock when it's just a one-word read > or write? The important thing is that the lambda-accepting operation uses a mutex, so that the lambda (which might be a potentially expensive operation) is only invoked once. While other operations might in principle be lock-free, I'm not completely against pushing a very simple first iteration that is clearly and obviously correct. Once all the tests etc. are in place, it should be possible to improve the implementation further and remove locking where we can? 
(I think the interplay between the lambda-accepting set and a regular set will make it a bit tricky to go partially lock free) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2737258754 From shade at openjdk.org Wed Mar 19 17:52:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Mar 2025 17:52:54 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead Message-ID: Little regression crept in with [JDK-8351142](https://bugs.openjdk.org/browse/JDK-8351142): on the deflation path, object associated with monitor can be already dead. The event wants to record it, touches the dead object and crashes. The fix is simple: since we cannot infer any useful information from the event, we just skip the event emit. A new stress test fails within seconds without a fix. It also covers other monitor events, so we have extra coverage there as well. Additional testing: - [x] Linux x86_64 server fastdebug, new stress test now passes - [x] Linux x86_64 server fastdebug, `jdk_jfr` ------------- Commit messages: - Stress test - Blind fix Changes: https://git.openjdk.org/jdk/pull/24121/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24121&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352414 Stats: 158 lines in 2 files changed: 157 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24121.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24121/head:pull/24121 PR: https://git.openjdk.org/jdk/pull/24121 From varadam at openjdk.org Wed Mar 19 18:17:48 2025 From: varadam at openjdk.org (Varada M) Date: Wed, 19 Mar 2025 18:17:48 GMT Subject: RFR: 8352393: AIX: Problem list serviceability/attach/AttachAPIv2/StreamingOutputTest.java Message-ID: Excluding the test serviceability/attach/AttachAPIv2/StreamingOutputTest.java JBS Issue : [JDK-8352393](https://bugs.openjdk.org/browse/JDK-8352393) ------------- Commit messages: - AIX: Problem list 
serviceability/attach/AttachAPIv2/StreamingOutputTest.java Changes: https://git.openjdk.org/jdk/pull/24116/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24116&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352393 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24116.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24116/head:pull/24116 PR: https://git.openjdk.org/jdk/pull/24116 From ascarpino at openjdk.org Wed Mar 19 19:03:17 2025 From: ascarpino at openjdk.org (Anthony Scarpino) Date: Wed, 19 Mar 2025 19:03:17 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 22:25:14 GMT, Volodymyr Paprotski wrote: >> -XX:+/-UseIntPolyIntrinsics (test Java vs BigInt and intrinsic vs BigInt) >> >> Though I think I did this before I knew much about junit.. I think I can just have two @run commands (to make it clearer)? Will give that a try > > Turns out I do need both `@test`; (otherwise `make test TEST=...MontgomeryPolynomialFuzzTest.java` runs fewer tests). Seems other tests do the same. > > I did add a (better?) comment to the summary tag. Oh.. I didn't notice the -/+. Thanks for adding the comment that helps explain it better ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r2004049802 From ascarpino at openjdk.org Wed Mar 19 19:03:18 2025 From: ascarpino at openjdk.org (Anthony Scarpino) Date: Wed, 19 Mar 2025 19:03:18 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 23:05:10 GMT, Volodymyr Paprotski wrote: >> test/jdk/com/sun/security/util/math/intpoly/MontgomeryPolynomialFuzzTest.java line 123: >> >>> 121: } >>> 122: >>> 123: if (rnd.nextBoolean()) { >> >> Why is this done randomly? Wouldn't we want to check these situations every time? 
> > I was mostly attempting to test 'random paths' through the code, and this was a way to pseudo-randomly accomplish that. (i.e. a product of a difference, a product of a product.. and so on..) > > Since this is looping, we got 50% chance of getting both, without me having to write/think-through all the many permutations of what input/outputs to each operations can be. > > (Extend the loop count to run for several hours during development.. and it does wonders to testing corner cases. Have been following this 'template' in most my PRs) Randomness isn't ideal for reproducibility. If a failure occurs, is it obvious what operations were done? I don't see any stdout or stderr messages to know what operations happen to bring about a possible failure. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r2004074368 From dholmes at openjdk.org Wed Mar 19 20:34:08 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Mar 2025 20:34:08 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 17:47:00 GMT, Aleksey Shipilev wrote: > Little regression crept in with [JDK-8351142](https://bugs.openjdk.org/browse/JDK-8351142): on the deflation path, object associated with monitor can be already dead. The event wants to record it, touches the dead object and crashes. The fix is simple: since we cannot infer any useful information from the event, we just skip the event emit. > > A new stress test fails within seconds without a fix. It also covers other monitor events, so we have extra coverage there as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, new stress test now passes > - [x] Linux x86_64 server fastdebug, `jdk_jfr` src/hotspot/share/runtime/objectMonitor.cpp line 844: > 842: } > 843: > 844: if (obj != nullptr && event.should_commit()) { So you decided to just drop the event completely ... 
but is that what people tracking these events would want? Do we need to update anything on the event definition so people know this is the case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24121#discussion_r2004248441 From dholmes at openjdk.org Wed Mar 19 21:57:07 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Mar 2025 21:57:07 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 20:31:24 GMT, David Holmes wrote: >> Little regression crept in with [JDK-8351142](https://bugs.openjdk.org/browse/JDK-8351142): on the deflation path, object associated with monitor can be already dead. The event wants to record it, touches the dead object and crashes. The fix is simple: since we cannot infer any useful information from the event, we just skip the event emit. >> >> A new stress test fails within seconds without a fix. It also covers other monitor events, so we have extra coverage there as well. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, new stress test now passes >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > src/hotspot/share/runtime/objectMonitor.cpp line 844: > >> 842: } >> 843: >> 844: if (obj != nullptr && event.should_commit()) { > > So you decided to just drop the event completely ... but is that what people tracking these events would want? Do we need to update anything on the event definition so people know this is the case? Being a replacement for the `_sync_Deflations` counter it should probably count these null ones too. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24121#discussion_r2004369814 From vpaprotski at openjdk.org Wed Mar 19 23:03:08 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 19 Mar 2025 23:03:08 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 19:00:37 GMT, Anthony Scarpino wrote: >> I was mostly attempting to test 'random paths' through the code, and this was a way to pseudo-randomly accomplish that. (i.e. a product of a difference, a product of a product.. and so on..) >> >> Since this is looping, we got 50% chance of getting both, without me having to write/think-through all the many permutations of what input/outputs to each operations can be. >> >> (Extend the loop count to run for several hours during development.. and it does wonders to testing corner cases. Have been following this 'template' in most my PRs) > > Randomness isn't ideal for reproducibility. If a failure occurs, is it obvious what operations were done? I don't see any stdout or stderr messages to know what operations happen to bring about a possible failure. I used this testcase for development (and figured I should also check it in..) so what might be 'obvious' to me, might not be for anyone else? Typically, when a test failed, I grabbed the SEED from the test output, reran the test with that seed fixed and I went to the exception and printed the hex values of the inputs; (then debug from there. Typically, I would write another test, so I could GDB into the intrinsic, with just those input values). It was pretty much always the case that once I got the inputs, I could reproduce the error i.e. not a type of bug that happens silently then discovered somewhere else. Luckily. 
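The workflow described above (log the seed, rerun with it fixed, print the failing inputs in hex) can be sketched as below. This is a hedged illustration, not the actual `MontgomeryPolynomialFuzzTest`: the class name, `underTest`, `reference`, and the per-iteration trace are assumptions made for the example, and plain `BigInteger` arithmetic stands in for both the code under test and the reference.

```java
import java.math.BigInteger;
import java.util.Random;

// Hypothetical fuzz loop for illustration; not the test from the PR.
final class SeededFuzzSketch {
    // The P-256 field prime: 2^256 - 2^224 + 2^192 + 2^96 - 1.
    static final BigInteger P = BigInteger.ONE.shiftLeft(256)
            .subtract(BigInteger.ONE.shiftLeft(224))
            .add(BigInteger.ONE.shiftLeft(192))
            .add(BigInteger.ONE.shiftLeft(96))
            .subtract(BigInteger.ONE);

    // Stand-in for the implementation under test (assumption).
    static BigInteger underTest(BigInteger a, BigInteger b) {
        return a.multiply(b).mod(P);
    }

    // Stand-in for the reference computation (assumption).
    static BigInteger reference(BigInteger a, BigInteger b) {
        return a.mod(P).multiply(b.mod(P)).mod(P);
    }

    // Runs a pseudo-random chain of operations and returns a trace of the path
    // taken ('S' = square the accumulator, 'M' = multiply by a fresh value),
    // so a failing run can be replayed and inspected from the seed alone.
    static String run(long seed, int iterations) {
        Random rnd = new Random(seed);
        StringBuilder trace = new StringBuilder();
        BigInteger acc = new BigInteger(256, rnd).mod(P);
        for (int i = 0; i < iterations; i++) {
            BigInteger fresh = new BigInteger(256, rnd).mod(P);
            boolean square = rnd.nextBoolean();
            BigInteger a = acc;
            BigInteger b = square ? acc : fresh;
            trace.append(square ? 'S' : 'M');
            BigInteger got = underTest(a, b);
            BigInteger want = reference(a, b);
            if (!got.equals(want)) {
                // Everything needed to reproduce: seed, step, path, hex inputs.
                throw new AssertionError("seed=" + seed + " iter=" + i
                        + " path=" + trace
                        + " a=" + a.toString(16) + " b=" + b.toString(16));
            }
            acc = got; // feed the result into the next step
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        // A seed passed on the command line replays an earlier run.
        long seed = (args.length > 0) ? Long.parseLong(args[0]) : System.nanoTime();
        System.out.println("SEED=" + seed);
        System.out.println("path=" + run(seed, 64));
    }
}
```

Because the whole path is derived from one `Random`, rerunning with the printed seed repeats the same sequence of operations, and the path string in the failure message shows which operations led up to the mismatch.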
All this crypto code is constant-time -no-branches-; so the 'test coverage' here is not 'all-branches-taken' but really 'did you remember to collect all the carries'. like 53-bit limb needs to be propagated back down to 52. That's what the test here is 'searching' for, some input that could trip up computation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r2004440368 From dholmes at openjdk.org Thu Mar 20 01:16:07 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Mar 2025 01:16:07 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 21:54:34 GMT, David Holmes wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 844: >> >>> 842: } >>> 843: >>> 844: if (obj != nullptr && event.should_commit()) { >> >> So you decided to just drop the event completely ... but is that what people tracking these events would want? Do we need to update anything on the event definition so people know this is the case? > > Being a replacement for the `_sync_Deflations` counter it should probably count these null ones too. With regards to the event definition @egahlin suggests: > It might be worth mentioning it in the field description, e.g. "If null or N/A, the object has been garbage collected", but that is all. (Null is represented as 0 in the file format, which the parser will interpret as a missing value. 
Both 'jfr print' and JMC will display "N/A") ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24121#discussion_r2004590832 From egahlin at openjdk.org Thu Mar 20 01:24:06 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Thu, 20 Mar 2025 01:24:06 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 01:13:09 GMT, David Holmes wrote: >> Being a replacement for the `_sync_Deflations` counter it should probably count these null ones too. > > With regards to the event definition @egahlin suggests: >> It might be worth mentioning it in the field description, e.g. "If null or N/A, the object has been garbage collected", but that is all. (Null is represented as 0 in the file format, which the parser will interpret as a missing value. Both 'jfr print' and JMC will display "N/A") That is if null is used instead of dropping the event. I don't have an opinion on what is the best approach as it depends on how the event is being used. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24121#discussion_r2004597567 From pminborg at openjdk.org Thu Mar 20 01:35:52 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 20 Mar 2025 01:35:52 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v7] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 249 commits: - Fix comments on doc issues - Create separate reentry prevention method and add tests - Merge branch 'master' into implement-jep502 - Merge branch 'master' into implement-jep502 - Clean up exception messages and fix comments - Rename field - Rename method and fix comment - Rework reenterant logic - Use acquire semantics for reading rather than volatile semantics - Add missing null check - ... 
and 239 more: https://git.openjdk.org/jdk/compare/fcc2a242...4c0dadfb ------------- Changes: https://git.openjdk.org/jdk/pull/23972/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=06 Stats: 4050 lines in 30 files changed: 4019 ins; 18 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From pminborg at openjdk.org Thu Mar 20 01:35:52 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 20 Mar 2025 01:35:52 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:49:37 GMT, Maurizio Cimadamore wrote: >> src/java.base/share/classes/java/lang/StableValue.java line 45: >> >>> 43: >>> 44: /** >>> 45: * A stable value is a shallowly immutable holder of deferred content. >> >> Is this terminology a leftover from previous JEP iterations? The JEP now says: >>> stable values, which are objects that hold immutable data. > > Maybe: `A stable value in an holder for shallowly immutable content`. I've updated the text. Let me know what you think ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2004617953 From pminborg at openjdk.org Thu Mar 20 01:35:53 2025 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 20 Mar 2025 01:35:53 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 15:44:37 GMT, Maurizio Cimadamore wrote: >> Per Minborg has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 246 commits: >> >> - Merge branch 'master' into implement-jep502 >> - Clean up exception messages and fix comments >> - Rename field >> - Rename method and fix comment >> - Rework reenterant logic >> - Use acquire semantics for reading rather than volatile semantics >> - Add missing null check >> - Simplify handling of sentinel, wrap, and unwrap >> - Fix JavaDoc issues >> - Fix members in StableEnumFunction >> - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f > > src/java.base/share/classes/java/lang/StableValue.java line 339: > >> 337: * which would introduce security vulnerabilities. >> 338: *

>> 339: * As objects can be set via stable values but never removed, this can be a source > > It feels like this could probably be expanded upon -- also covering stable functions (and morphed into a new section) I do not understand the comment. Each factory has a note on `Serializable` and now there is no general comment about security issues as per comments made earlier. Can you elaborate, please? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2004612892 From iklam at openjdk.org Thu Mar 20 04:54:01 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 20 Mar 2025 04:54:01 GMT Subject: RFR: 8352437: -XX:+AOTClassLinking is not compatible with --add-export Message-ID: `-XX:+AOTClassLinking` requires the CDS archived full module graph (FMG). - Before this PR, when `--add-export` is specified, FMG is disabled, so AOT caches created with `-XX:+AOTClassLinking` cannot be loaded. - After this PR, if the exact same `--add-export` flags are specified across the training/assembly/production phases, the FMG can be used, so we can use AOT caches created with `-XX:+AOTClassLinking`. The change itself is straight-forward: just remember the `--add-export` flags specified during AOT cache creation, and check the exact same ones are used during the production run. I did a fair amount of refactoring to change the "exact options specified" checks in modules.cpp, so more such options can be easily added in the future (we need to handle `--add-reads` and `--add-opens` in future RFEs). 
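The "remember the flags at cache creation, check the exact same ones at production" idea can be sketched as follows. The real change lives in HotSpot's C++ (`modules.cpp`); this Java sketch only illustrates the shape of the check. The class and method names are hypothetical, and treating the flags as a sorted, de-duplicated set (so that ordering on the command line does not matter) is an assumption made here, not something stated in the PR.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of an "exact options specified" check;
// not the logic in modules.cpp.
final class ExportFlagCheckSketch {
    // Canonical form recorded when the cache is created.
    // Assumption: sort and de-duplicate, one flag value per line.
    static String canonical(List<String> exportFlagValues) {
        return String.join("\n", new TreeSet<>(exportFlagValues));
    }

    // Production-run check: the full module graph is usable only if the
    // runtime flags canonicalize to exactly what was recorded.
    static boolean sameAsRecorded(String recorded, List<String> runtimeFlagValues) {
        return recorded.equals(canonical(runtimeFlagValues));
    }

    public static void main(String[] args) {
        String recorded = canonical(List.of(
                "java.base/jdk.internal.misc=ALL-UNNAMED",
                "java.base/sun.nio.ch=ALL-UNNAMED"));

        // Same flags in a different order: accepted under the assumption above.
        System.out.println(sameAsRecorded(recorded, List.of(
                "java.base/sun.nio.ch=ALL-UNNAMED",
                "java.base/jdk.internal.misc=ALL-UNNAMED")));

        // A flag missing at runtime: rejected, so the archived graph is not used.
        System.out.println(sameAsRecorded(recorded, List.of(
                "java.base/sun.nio.ch=ALL-UNNAMED")));
    }
}
```

Keeping the comparison behind one canonicalizing helper is what would let further options be added later with the same check.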
(Note: this PR depends on #24122 ) ------------- Depends on: https://git.openjdk.org/jdk/pull/24122 Commit messages: - Fixed whitespaces - clean up - 8352437: -XX:+AOTClassLinking is not compatible with --add-export Changes: https://git.openjdk.org/jdk/pull/24124/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24124&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352437 Stats: 575 lines in 16 files changed: 457 ins; 65 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/24124.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24124/head:pull/24124 PR: https://git.openjdk.org/jdk/pull/24124 From fyang at openjdk.org Thu Mar 20 06:43:09 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 20 Mar 2025 06:43:09 GMT Subject: RFR: 8352423: RISC-V: simplify DivI/L ModI/L [v2] In-Reply-To: References: <3oq1Fgqt-St5FpdVKDixDPfiI4dHLdMosA9dyhupwpA=.c782c45f-dab1-4bb6-a3d9-8d7ae6c56a8c@github.com> Message-ID: On Wed, 19 Mar 2025 16:13:32 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Currently, implementation of DivI/L and ModI/L are overcomplicated, could and should be simplified. >> And, also enable some DivI/L and ModI/L related tests. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > copyright Thanks for the cleanup! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24119#pullrequestreview-2701443443 From shade at openjdk.org Thu Mar 20 07:06:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Mar 2025 07:06:25 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2] In-Reply-To: References: Message-ID: > Little regression crept in with [JDK-8351142](https://bugs.openjdk.org/browse/JDK-8351142): on the deflation path, object associated with monitor can be already dead. > > A new stress test fails within seconds without a fix. 
It also covers other monitor events, so we have extra coverage there as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, new stress test now passes > - [x] Linux x86_64 server fastdebug, `jdk_jfr` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Emit the event anyway ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24121/files - new: https://git.openjdk.org/jdk/pull/24121/files/2b1d1b14..862a0176 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24121&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24121&range=00-01 Stats: 22 lines in 3 files changed: 12 ins; 3 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24121.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24121/head:pull/24121 PR: https://git.openjdk.org/jdk/pull/24121 From shade at openjdk.org Thu Mar 20 07:06:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Mar 2025 07:06:25 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 01:21:06 GMT, Erik Gahlin wrote: >> With regards to the event definition @egahlin suggests: >>> It might be worth mentioning it in the field description, e.g. "If null or N/A, the object has been garbage collected", but that is all. (Null is represented as 0 in the file format, which the parser will interpret as a missing value. Both 'jfr print' and JMC will display "N/A") > > That is if null is used instead of dropping the event. I don't have an opinion on what is the best approach as it depends on how the event is being used. I don't have a strong preference for either dropping event or emitting it with N/A values. I get the argument for replacing `_sync_Deflations` to count the actual deflations, so lets emit the event with N/A values then. Amended in new commit. 
Unfortunately, I see no reliable way to test exactly the case for emitting the N/A event: we need to sequence monitor deflation and GC precisely to get to that point reliably. So the targeted event test just tests the always-reachable case, while the new stress test covers the corner case of dead object, among other things. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24121#discussion_r2004941954 From rehn at openjdk.org Thu Mar 20 07:22:10 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 20 Mar 2025 07:22:10 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v2] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 10:53:03 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - hwprobe deps >> - Merge branch 'master' into maxvector_0 >> - Moved to common >> - Disable UseZvfh when no RVV > > Looks good. Thanks! @Hamlin-Li are you still good with this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24094#issuecomment-2739433003 From mli at openjdk.org Thu Mar 20 08:39:13 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 20 Mar 2025 08:39:13 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v2] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 06:50:25 GMT, Robbin Ehn wrote: >> Hi please consider. >> >> Added case to turn off UseZvfh when no RVV. >> Which is the cause of the test issues, zvfh on but no rvv. >> >> Also made all case identical and added no warning when default. >> Move them to the common init, as the "UseExtension" is not C2 specific. >> >> Manual tested and some random compiler tests. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - hwprobe deps > - Merge branch 'master' into maxvector_0 > - Moved to common > - Disable UseZvfh when no RVV Interesting suggestion and change. Thanks for asking, I was thinking the modification based on Fei's suggestion is a simple one. Have several questions in my mind: 1. in the while loop of `VM_Version::setup_cpu_available_features`, is there an order guarantee in cpu features? consider a situation, A depends on B, but B is after A in the while loop. 2. what if there are more than 2 cpu features in the dependency chain? 3. For consistency, should we disable the cpu feature in RVFeatureValue? e.g. in vm_version_riscv.cpp when `(UseZvbb && !UseRVV)` we should also call `RVFeatureValue::disable_feature`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24094#issuecomment-2739589571 From tschatzl at openjdk.org Thu Mar 20 09:44:07 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 20 Mar 2025 09:44:07 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v26] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. 
> > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * make young gen length revising independent of refinement thread * use a service task * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/f419556e..5e76a516 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=24-25 Stats: 337 lines in 12 files changed: 237 ins; 90 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Thu Mar 20 09:49:13 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 20 Mar 2025 09:49:13 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v26] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 09:44:07 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. 
>> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... 
> > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * make young gen length revising independent of refinement thread > * use a service task > * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update Commit https://github.com/openjdk/jdk/pull/23739/commits/5e76a516c848e75f56e966a1ffe4115b1dce786c implements the change to make young gen length revising independent of the refinement control thread. Infrastructure to determine currently available number of bytes for allocation and determining the next time the particular task should be redone is shared. It may be distributed across a bit more methods than I would prefer, but particularly the refinement control thread wants to reuse and keep some intermediate results (to not be required to get the `Heap_lock` again basically). I did not have a good reason to make the heuristic to determine the time to the next action different for both, so they are basically the same. There is some pre-existing problem that the minimum time for re-doing the work is ~50ms. That might be too short in some cases, but then again, if you have that short of a GC interval it may not be very useful to e.g. revise young gen length anyway. I think with this change all current concerns are addressed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2739766880 From duke at openjdk.org Thu Mar 20 11:29:57 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 20 Mar 2025 11:29:57 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v8] In-Reply-To: References: Message-ID: > By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. 
Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: responding to review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23860/files - new: https://git.openjdk.org/jdk/pull/23860/files/aa2fdf2d..2438fb5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=06-07 Stats: 750 lines in 3 files changed: 174 ins; 447 del; 129 mod Patch: https://git.openjdk.org/jdk/pull/23860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23860/head:pull/23860 PR: https://git.openjdk.org/jdk/pull/23860 From mli at openjdk.org Thu Mar 20 12:31:16 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 20 Mar 2025 12:31:16 GMT Subject: RFR: 8352423: RISC-V: simplify DivI/L ModI/L [v2] In-Reply-To: References: <3oq1Fgqt-St5FpdVKDixDPfiI4dHLdMosA9dyhupwpA=.c782c45f-dab1-4bb6-a3d9-8d7ae6c56a8c@github.com> Message-ID: On Thu, 20 Mar 2025 06:40:09 GMT, Fei Yang wrote: > Thanks for the cleanup! Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24119#issuecomment-2740296901 From rehn at openjdk.org Thu Mar 20 13:01:07 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 20 Mar 2025 13:01:07 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v2] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 08:36:49 GMT, Hamlin Li wrote: > Interesting suggestion and change. Thanks for asking, I was thinking the modification based on Fei's suggestion is a simple one. > > Have several questions in my mind: > > 1. in the while loop of `VM_Version::setup_cpu_available_features`, is there an order guarantee in cpu features? consider a situation, A depends on B, but B is after A in the while loop. They are in list order. I did add a comment explaining that. > 2. what if there are more than 2 cpu features in the dependency chain? I assume independent dependencies, so you would have to fix that. > 3. 
For consistency, should we disable the cpu feature in RVFeatureValue? e.g. in vm_version_riscv.cpp when `(UseZvbb && !UseRVV)` we should also call `RVFeatureValue::disable_feature`. If the feature flag is identical to the use flag we may as well remove the feature flag. The point was that they can be different. I think this piece of code here is wrong: /* Sync CPU features with flags */ \ if (!flag) { disable_feature(); } It is done before the flag is final, which means the flag may later become true. We build the feature string before flags are final. We build the feature string from RVFeatureValue enabled/disabled instead of the UseFlags values. So we should probably build the feature string last in VM_Version::initialize() when all flags are stable. And change the function: `bool feature_string() { return _feature_string; }` to "return UseXXX" if it has a flag mapping. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24094#issuecomment-2740373896 From mli at openjdk.org Thu Mar 20 13:47:10 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 20 Mar 2025 13:47:10 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v2] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 06:50:25 GMT, Robbin Ehn wrote: >> Hi please consider. >> >> Added case to turn off UseZvfh when no RVV. >> Which is the cause of the test issues, zvfh on but no rvv. >> >> Also made all case identical and added no warning when default. >> Move them to the common init, as the "UseExtension" is not C2 specific. >> >> Manual tested and some random compiler tests. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: > > - hwprobe deps > - Merge branch 'master' into maxvector_0 > - Moved to common > - Disable UseZvfh when no RVV I'm fine with either way, maybe we should put the change of Fei's suggestion into another PR? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24094#issuecomment-2740507416 From rehn at openjdk.org Thu Mar 20 14:09:11 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 20 Mar 2025 14:09:11 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v2] In-Reply-To: References: Message-ID: <30dIczT-7jvuN1s_35ajYxp7W384sqYc_KsYwbcTXEE=.2e6323af-7cc3-4521-8e98-dc5110e6c761@github.com> On Thu, 20 Mar 2025 13:44:36 GMT, Hamlin Li wrote: > I'm fine with either way, maybe we should put the change of Fei's suggestion into another PR? This additional change does not change the behavior of the UseFlags. It only ensures that ext_Zv?? will not be enabled if there is no ext_V. Another option is to set this directly in the hwprobe query: `if (VM_Version::ext_V.enable_feature() && is_set(RISCV_HWPROBE_KEY_IMA_EXT_0, RISCV_HWPROBE_EXT_ZVFH)) {` So the issue you describe in 3 is already there: essentially, the cpu string will contain cpu features that are disabled (as nothing reads the RVFeatures after boot). And as I said, the issue is building the feature string from RVFeatureValue instead of the UseFlags. I think this should go in as is, with a follow-up to address the cpu string, so that the cpu string, as on the other platforms, reflects the 'active ISA'. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24094#issuecomment-2740578758 From mli at openjdk.org Thu Mar 20 14:56:13 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 20 Mar 2025 14:56:13 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v2] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 06:50:25 GMT, Robbin Ehn wrote: >> Hi please consider. >> >> Added case to turn off UseZvfh when no RVV.
>> Which is the cause of the test issues, zvfh on but no rvv. >> >> Also made all case identical and added no warning when default. >> Move them to the common init, as the "UseExtension" is not C2 specific. >> >> Manual tested and some random compiler tests. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - hwprobe deps > - Merge branch 'master' into maxvector_0 > - Moved to common > - Disable UseZvfh when no RVV src/hotspot/cpu/riscv/vm_version_riscv.hpp line 92: > 90: void update_flag() { \ > 91: assert(enabled(), "Must be."); \ > 92: if (FLAG_IS_DEFAULT(flag)) { \ As this patch introduces a way to automatically disable a flag based on its dependant flag, this could introduce potential issues if the expected order is not guaranteed when writing the code. It's better to have an assert there; this could be achieved by just using the existing `_value`, e.g. `assert(dep.value() != -1);` Or maybe it's better to introduce a new field like `initialized`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24094#discussion_r2005832663 From rehn at openjdk.org Thu Mar 20 15:39:10 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 20 Mar 2025 15:39:10 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v2] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 14:53:37 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: >> >> - hwprobe deps >> - Merge branch 'master' into maxvector_0 >> - Moved to common >> - Disable UseZvfh when no RVV > > src/hotspot/cpu/riscv/vm_version_riscv.hpp line 92: > >> 90: void update_flag() { \ >> 91: assert(enabled(), "Must be."); \ >> 92: if (FLAG_IS_DEFAULT(flag)) { \ > > As this patch introduces a way to automatically disable a flag based on its dependant flag, this could introduce potential issues if the expected order is not guaranteed when writing the code. It's better to have an assert there; this could be achieved by just using the existing `_value`, e.g. > `assert(dep.value() != -1);` > Or maybe it's better to introduce a new field like `initialized`? You can check that at compile time with: `STATIC_ASSERT(offsetof(VM_Version, ext_ZvXX) > offsetof(VM_Version, ext_V));` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24094#discussion_r2005922100 From mdoerr at openjdk.org Thu Mar 20 15:47:22 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 20 Mar 2025 15:47:22 GMT Subject: RFR: 8334247: [PPC64] Consider trap based nmethod entry barriers Message-ID: We can shrink nmethod entry barriers to 4 instructions (from 8) using conditional trap instructions. Performance still needs to be evaluated.
------------- Commit messages: - 8334247: [PPC64] Consider trap based nmethod entry barriers Changes: https://git.openjdk.org/jdk/pull/24135/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24135&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334247 Stats: 68 lines in 8 files changed: 48 ins; 2 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/24135.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24135/head:pull/24135 PR: https://git.openjdk.org/jdk/pull/24135 From mbaesken at openjdk.org Thu Mar 20 16:02:20 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 20 Mar 2025 16:02:20 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp Message-ID: There are a few divisions by zero in sharedRuntimeTrans.cpp, used to "construct" NaN and -infinity. This should probably be replaced by using functionality from std::numeric_limits . ------------- Commit messages: - JDK-8346931 Changes: https://git.openjdk.org/jdk/pull/24136/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24136&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346931 Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24136/head:pull/24136 PR: https://git.openjdk.org/jdk/pull/24136 From mli at openjdk.org Thu Mar 20 16:06:08 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 20 Mar 2025 16:06:08 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v2] In-Reply-To: References: Message-ID: <5o_6LiKmaWr9PMLlcK8qmrgH5YNMO1B2tKBU25g4AFY=.e088bf1e-4c08-4264-a03d-133753452184@github.com> On Thu, 20 Mar 2025 15:36:21 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/vm_version_riscv.hpp line 92: >> >>> 90: void update_flag() { \ >>> 91: assert(enabled(), "Must be."); \ >>> 92: if (FLAG_IS_DEFAULT(flag)) { \ >> >> As this patch introduces a way to automatically disable a flag based on its dependant flag, this could introduce potential 
issues if the expected order is not guaranteed when writing the code. It's better to have an assert there; this could be achieved by just using the existing `_value`, e.g. >> `assert(dep.value() != -1);` >> Or maybe it's better to introduce a new field like `initialized`? > > You can check that at compile time with: > `STATIC_ASSERT(offsetof(VM_Version, ext_ZvXX) > offsetof(VM_Version, ext_V));` That's also fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24094#discussion_r2005980982 From ascarpino at openjdk.org Thu Mar 20 17:37:27 2025 From: ascarpino at openjdk.org (Anthony Scarpino) Date: Thu, 20 Mar 2025 17:37:27 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4] In-Reply-To: References: Message-ID: <7RbzyVMGDjIExr2AfjOVElXXrKIlddltIo6vPH0yxQs=.7296744e-29f1-4e72-a44d-ce8875be6644@github.com> On Wed, 19 Mar 2025 23:00:55 GMT, Volodymyr Paprotski wrote: >> Randomness isn't ideal for reproducibility. If a failure occurs, is it obvious what operations were done? I don't see any stdout or stderr messages to know what operations happen to bring about a possible failure. > > I used this testcase for development (and figured I should also check it in..) so what might be 'obvious' to me, might not be for anyone else? > > Typically, when a test failed, I grabbed the SEED from the test output, reran the test with that seed fixed and I went to the exception and printed the hex values of the inputs; (then debug from there. Typically, I would write another test, so I could GDB into the intrinsic, with just those input values). > > It was pretty much always the case that once I got the inputs, I could reproduce the error i.e. not the type of bug that happens silently and is then discovered somewhere else. Luckily. All this crypto code is constant-time -no-branches-; so the 'test coverage' here is not 'all-branches-taken' but really 'did you remember to collect all the carries'.
like 53-bit limb needs to be propagated back down to 52. Thats what the test here is 'searching' for, some input that could trip up computation. Can you add a comment to the test code about how you use the seed to reproduce any failures? So that in the future, someone who doesn't know will now have an idea how to start debugging this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r2006142794 From duke at openjdk.org Thu Mar 20 18:42:48 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 20 Mar 2025 18:42:48 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v9] In-Reply-To: References: Message-ID: > By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: More beautification ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23860/files - new: https://git.openjdk.org/jdk/pull/23860/files/2438fb5c..1cfab778 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=07-08 Stats: 307 lines in 1 file changed: 49 ins; 131 del; 127 mod Patch: https://git.openjdk.org/jdk/pull/23860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23860/head:pull/23860 PR: https://git.openjdk.org/jdk/pull/23860 From kbarrett at openjdk.org Thu Mar 20 19:57:08 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 20 Mar 2025 19:57:08 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 15:56:10 GMT, Matthias Baesken wrote: > There are a few divisions by zero in sharedRuntimeTrans.cpp, used to "construct" NaN and -infinity. This should probably be replaced by using functionality from std::numeric_limits . 
There are more potential divisions by zero in the vicinity of code conditionalized on `CAN_USE_NAN_DEFINE` (mentioned in a comment in the JBS issue), not all of them under the conditional. For example, here's an unprotected one, where `z` may be zero: https://github.com/openjdk/jdk/blame/56038fb5a156568cce2e80f5db18b10ad61c06e4/src/hotspot/share/runtime/sharedRuntimeTrans.cpp#L519 Probably zero handling should be its own clause. And there's the #else code for the conditional code, using `(z-z)/(z-z)` to construct a NaN: https://github.com/openjdk/jdk/blame/56038fb5a156568cce2e80f5db18b10ad61c06e4/src/hotspot/share/runtime/sharedRuntimeTrans.cpp#L525 Also the other use of that macro: https://github.com/openjdk/jdk/blame/56038fb5a156568cce2e80f5db18b10ad61c06e4/src/hotspot/share/runtime/sharedRuntimeTrans.cpp#L541 (This one is also missing `{ ... }` around the `then` clause.) ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24136#pullrequestreview-2703946566 From kbarrett at openjdk.org Thu Mar 20 20:03:15 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 20 Mar 2025 20:03:15 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 19:54:45 GMT, Kim Barrett wrote: > There are more potential divisions by zero in the vicinity of code conditionalized on `CAN_USE_NAN_DEFINE` (mentioned in a comment in the JBS issue), not all of them under the conditional. Maybe we're missing some tests, since these apparently don't show up with ubsan testing?
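[Editorial sketch, not part of the original mail.] The replacement discussed in this thread, using std::numeric_limits rather than a division by zero to obtain NaN and -infinity, might look roughly like the following. The helper names and the log-style special-case function are hypothetical and are not taken from sharedRuntimeTrans.cpp; the sketch only assumes that `double` provides a quiet NaN and an infinity, which the static_assert checks.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Assumption: the target's double supports quiet NaN and infinity.
static_assert(std::numeric_limits<double>::has_quiet_NaN &&
              std::numeric_limits<double>::has_infinity,
              "double must provide quiet NaN and infinity");

// Hypothetical helpers: produce the special values without dividing by zero.
inline double quiet_nan() {
  return std::numeric_limits<double>::quiet_NaN();
}

inline double negative_infinity() {
  return -std::numeric_limits<double>::infinity();
}

// Hypothetical log-style front end with zero handled in its own clause.
// Returns true and sets *result when x is one of the special cases.
inline bool log_special_case(double x, double* result) {
  if (x == 0.0) {        // log(+-0) is -infinity
    *result = negative_infinity();
    return true;
  }
  if (x < 0.0) {         // log of a negative number is NaN
    *result = quiet_nan();
    return true;
  }
  return false;          // ordinary argument; the caller computes the result
}

// Small self-check of the helpers above.
inline void check_special_values() {
  assert(std::isnan(quiet_nan()));
  assert(std::isinf(negative_infinity()) && negative_infinity() < 0.0);
}
```

Handling zero in a separate clause, as suggested in the review, keeps every path free of the floating-point division that ubsan would flag.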
------------- PR Comment: https://git.openjdk.org/jdk/pull/24136#issuecomment-2741531456 From duke at openjdk.org Thu Mar 20 20:37:25 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 20 Mar 2025 20:37:25 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v10] In-Reply-To: References: Message-ID: <2N5Evij0f6qZi_pG3tqoz11aQbSnLG0YszqHR9ROfKI=.d44b16c6-d334-42c4-8de8-92eb41229248@github.com> > By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Fix windows build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23860/files - new: https://git.openjdk.org/jdk/pull/23860/files/1cfab778..e9db09e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23860/head:pull/23860 PR: https://git.openjdk.org/jdk/pull/23860 From duke at openjdk.org Thu Mar 20 21:09:12 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 20 Mar 2025 21:09:12 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v7] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 19:24:52 GMT, Volodymyr Paprotski wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Made the intrinsics test separate from the pure java test. 
> > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 58: > >> 56: >> 57: ATTRIBUTE_ALIGNED(64) static const uint32_t dilithiumAvx512Perms[] = { >> 58: // collect montmul results into the destination register > > same as `dilithiumAvx512Consts()`, 'magic offsets'; except here they are harder to count (eg. not clear visually what is the offset of `ntt inverse`). > > Could be split into three constant arrays to make the compiler count for us Well, it is 64 bytes per line (16 4-byte uint32_ts), not that hard :-) ... > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 140: > >> 138: __ vpmuldq(xmm(scratchReg1 + 1), xmm(inputReg12), xmm(inputReg2 + 1), Assembler::AVX_512bit); >> 139: __ vpmuldq(xmm(scratchReg1 + 2), xmm(inputReg13), xmm(inputReg2 + 2), Assembler::AVX_512bit); >> 140: __ vpmuldq(xmm(scratchReg1 + 3), xmm(inputReg14), xmm(inputReg2 + 3), Assembler::AVX_512bit); > > Another option for these four lines, to keep the style of rest of function > > int inputReg1[] = {inputReg11, inputReg12, inputReg13, inputReg14}; > for (int i = 0; i < parCnt; i++) { > __ vpmuldq(xmm(scratchReg1 + i), inputReg1[i], xmm(inputReg2 + i), Assembler::AVX_512bit); > } I have changed the whole structure instead. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 197: > >> 195: >> 196: // level 0 >> 197: montmulEven(20, 8, 29, 20, 16, 4); > > It would improve readability to know which parameter is a register, and which is a count.. i.e. > > `montmulEven(xmm20, xmm8, xmm29, xmm20, xmm16, 4);` > > (its not _that_ bad, once I remember that its always the last parameter.. but it does add to the 'mental load' one has to carry, and this code is already interesting enough) I have changed the structure, now it is clear(er) which parameter is what. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 980: > >> 978: // Dilithium multiply polynomials in the NTT domain. 
>> 979: // Implements >> 980: // static int implDilithiumNttMult( > > I suppose no java changes in this PR, but I notice that the inputs are all assumed to have fixed size. > > Most/all intrinsics I worked with had some sort of guard (eg `Objects.checkFromIndexSize`) right before the intrinsic java call. (It usually looks like it can be optimized away). But I notice no such guard here on the java side. These functions will not be used anywhere else and in ML_DSA.java all of the arrays passed to intrinsics are of the correct size. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1010: > >> 1008: __ vpbroadcastd(xmm31, Address(dilithiumConsts, 4), Assembler::AVX_512bit); // q >> 1009: __ vpbroadcastd(xmm29, Address(dilithiumConsts, 12), Assembler::AVX_512bit); // 2^64 mod q >> 1010: __ evmovdqul(xmm28, Address(perms, 0), Assembler::AVX_512bit); > > - use of `c_rarg3` is 'clever' so probably should have a comment (ie. 'no 3rd parameter, free register') > - Alternatively, load directly into the vector with `ExternalAddress()`; you need a scratch register (use r10) but address is close enough, it actually won't be used.
Here is the disassembly I got: > > StubRoutines::dilithiumNttMult [0x00007f414fb68280, 0x00007f414fb68548] (712 bytes) > -------------------------------------------------------------------------------- > add %al,(%rax) > 0x00007f414fb68280: push %rbp > 0x00007f414fb68281: mov %rsp,%rbp > 0x00007f414fb68284: vpbroadcastd 0x18f9fe32(%rip),%zmm30 # 0x00007f4168b080c0 > 0x00007f414fb6828e: vpbroadcastd 0x18f9fe2c(%rip),%zmm31 # 0x00007f4168b080c4 > 0x00007f414fb68298: vpbroadcastd 0x18f9fe2a(%rip),%zmm29 # 0x00007f4168b080cc > 0x00007f414fb682a2: vmovdqu32 0x18f9f8d4(%rip),%zmm28 # 0x00007f4168b07b80 > ``` > > The `ExternalAddress()` calls for above assembler > ``` > const Register scratch = r10; > const XMMRegister montRSquareModQ = xmm29; > const XMMRegister montQInvModR = xmm30; > const XMMRegister dilithium_q = xmm31; > const XMMRegister perms = xmm28; > > __ vpbroadcastd(montQInvModR, ExternalAddress(dilithiumAvx512ConstsAddr()), Assembler::AVX_512bit, scratch); // q^-1 mod 2^32 > __ vpbroadcastd(dilithium_q, ExternalAddress(dilithiumAvx512ConstsAddr() + 4), Assembler::AVX_512bit, scratch); // q > __ vpbroadcastd(montRSquareModQ, ExternalAddress(dilithiumAvx512ConstsAddr() + 12), Assembler::AVX_512bit, scratch); // 2^64 mod q > __ evmovdqul(perms, k0, ExternalAddress(dilithiumAvx512PermsAddr()), false, Assembler::AVX_512bit, scratch); > > (and `dilithiumAvx512ConstsAddr(offset)` could take an int parameter too) I added comments and changed the vpbroadcast loads to load directly from memory. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1012: > >> 1010: __ evmovdqul(xmm28, Address(perms, 0), Assembler::AVX_512bit); >> 1011: >> 1012: __ movl(len, 4); > > Compile-time constant, why not 'unroll at compile time'? i.e. wrap this loop with `for (int len=0; len<4; len++)` instead? I have found that unrolling these loops actually hurts performance (probably an I-cache effect).
> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1041: > >> 1039: for (int i = 0; i < 4; i++) { >> 1040: __ evmovdqul(Address(result, i * 64), xmm(i), Assembler::AVX_512bit); >> 1041: } > > This is nice, compact and clean. The biggest issue I have with following this code is really with all the 'raw' registers. I would much rather prefer symbolic names, but up to you to decide style. > > I ended up 'annotating' this snippet, so I could understand it and confirm everything.. as with montmulEven, hope some of it can be useful to you to copy out. > > > XMMRegister POLY1[] = {xmm0, xmm1, xmm2, xmm3}; > XMMRegister POLY2[] = {xmm4, xmm5, xmm6, xmm7}; > XMMRegister SCRATCH1[] = {xmm12, xmm13, xmm14, xmm15}; > XMMRegister SCRATCH2[] = {xmm16, xmm17, xmm18, xmm19}; > XMMRegister SCRATCH3[] = {xmm8, xmm9, xmm10, xmm11}; > for (int i = 0; i < 4; i++) { > __ evmovdqul(POLY1[i], Address(poly1, i * 64), Assembler::AVX_512bit); > __ evmovdqul(POLY2[i], Address(poly2, i * 64), Assembler::AVX_512bit); > } > > // montmulEven: inputs are in even columns and output is in odd columns > // scratch3_even = poly2_even*montRSquareModQ // poly2 to montgomery domain > montmulEven2(SCRATCH3[0], POLY2[0], montRSquareModQ, SCRATCH1[0], SCRATCH2[0], montQInvModR, dilithium_q, 4, _masm); > for (int i = 0; i < 4; i++) { > // swap even/odd; 0xB1 == 2-3-0-1 > __ vpshufd(SCRATCH3[i], SCRATCH3[i], 0xB1, Assembler::AVX_512bit); > } > > // scratch3_odd = poly1_even*scratch3_even = poly1_even*poly2_even*montRSquareModQ > montmulEven2(SCRATCH3[0], POLY1[0], SCRATCH3[0], SCRATCH1[0], SCRATCH2[0], 4, montQInvModR, dilithium_q, 4, _masm); > for (int i = 0; i < 4; i++) { > __ vpshufd(POLY1[i], POLY1[i], 0xB1, Assembler::AVX_512bit); > __ vpshufd(POLY2[i], POLY2[i], 0xB1, Assembler::AVX_512bit); > } > > // poly2_even = poly2_odd*montRSquareModQ // poly2 to montgomery domain > montmulEven2(POLY2[0], POLY2[0], montRSquareModQ, SCRATCH1[0], SCRATCH2[0], 4, montQInvModR, dilithium_q, 4, _masm); > for 
(int i = 0; i < 4; i++) { > __ vpshufd(POLY2[i], POLY2[i], 0xB1, Assembler::AVX_512bit); > } > > // poly1_odd = poly1_even*poly2_even > montmulEven2(POLY1[0], POLY1[0], POLY2[0], SCRATCH1[0], SCRATCH2[0], 4, montQInvModR, dilithium_q, 4, _masm); > for (int i = 0; i < 4; i++) { > // result is scrambled between scratch3_odd and poly1_odd; unscramble > __ evpermt2d(POLY1[i], perms, SCRATCH3[i], Assembler::AVX_512bit); > } > for (int i = 0; i < 4; i++) { > __ evmovdqul(Address(result, i *... I have rewritten it to use full montmuls (a new function) here and everywhere else. It is much easier to follow the code that way. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1090: > >> 1088: __ evpbroadcastd(xmm29, constant, Assembler::AVX_512bit); // constant multiplier >> 1089: >> 1090: __ movl(len, 2); > > Same comment here as the `generate_dilithiumNttMult_avx512` > - constants can be loaded directly into XMM > - len can be removed by unrolling at compile time > - symbolic names could be used for registers > - comments could be added Done.
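[Editorial sketch, not part of the original mail.] For readers following the Montgomery-domain comments quoted above (`q^-1 mod 2^32`, `q`, `2^64 mod q`, and "poly2 to montgomery domain"), the arithmetic that each vector lane performs can be modeled in scalar form roughly as below. This is a minimal sketch and not the code of the AVX-512 stub: all names are hypothetical, inputs are assumed to lie in [0, q), and conversion of an out-of-range unsigned value to `int32_t` is assumed to wrap (guaranteed only from C++20 on, though usual in practice).

```cpp
#include <cassert>
#include <cstdint>

// Illustrative scalar model only; every name here is hypothetical.
constexpr int32_t  kQ        = 8380417;     // ML-DSA modulus q = 2^23 - 2^13 + 1
constexpr uint32_t kQInvModR = 58728449u;   // q^-1 mod 2^32
static_assert(static_cast<uint32_t>(kQ) * kQInvModR == 1u,
              "kQInvModR must be the inverse of q modulo 2^32");

// R = 2^32. Derive R mod q and R^2 mod q (that is, 2^64 mod q) instead of
// hard-coding them.
constexpr int64_t kRModQ       = (int64_t{1} << 32) % kQ;
constexpr int32_t kRSquareModQ = static_cast<int32_t>((kRModQ * kRModQ) % kQ);

// Returns a * 2^-32 mod q in the open range (-q, q), assuming |a| < 2^31 * q.
inline int32_t montgomery_reduce(int64_t a) {
  // m = a * q^-1 mod 2^32, reinterpreted as a signed 32-bit value.
  const int32_t m = static_cast<int32_t>(static_cast<uint32_t>(a) * kQInvModR);
  // a - m*q is an exact multiple of 2^32, so this division loses nothing.
  return static_cast<int32_t>((a - static_cast<int64_t>(m) * kQ) /
                              (int64_t{1} << 32));
}

// Montgomery product: a * b * 2^-32 mod q.
inline int32_t montmul(int32_t a, int32_t b) {
  return montgomery_reduce(static_cast<int64_t>(a) * b);
}

// Plain modular product a * b mod q for inputs in [0, q): first move b into
// the Montgomery domain by multiplying with R^2 mod q, then multiply by a.
inline int32_t mulmod_q(int32_t a, int32_t b) {
  assert(a >= 0 && a < kQ && b >= 0 && b < kQ);
  const int32_t b_mont = montmul(b, kRSquareModQ);  // b * R mod q
  int32_t r = montmul(a, b_mont);                   // a * b mod q, in (-q, q)
  if (r < 0) {
    r += kQ;                                        // normalize to [0, q)
  }
  return r;
}
```

For example, `mulmod_q(3, 5)` evaluates to 15. The two-step shape mirrors the annotated snippet above, where one operand is first multiplied by `montRSquareModQ` and the result is then multiplied by the other operand.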
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2006455445 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2006455814 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2006455732 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2006454991 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2006455529 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2006455662 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2006455178 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2006455086 From duke at openjdk.org Thu Mar 20 21:09:14 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 20 Mar 2025 21:09:14 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v5] In-Reply-To: References: <3bphXKLpIpxAZP-FEOeob6AaHbv0BAoEceJka64vMW8=.3e4f74e0-9479-4926-b365-b08d8d702692@github.com> Message-ID: On Thu, 6 Mar 2025 19:27:12 GMT, Volodymyr Paprotski wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Accepted review comments. > > src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 426: > >> 424: __ subl( roundsLeft, 1); >> 425: >> 426: __ evmovdquw(xmm5, xmm0, Assembler::AVX_512bit); > > Is there a pattern here; that can be 'compacted' into a loop? Unfortunately, no. This loop body is imported from generate_sha3_implCompress() and doubled, as explained in the comment about 15 lines above. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2006455877 From never at openjdk.org Fri Mar 21 05:59:09 2025 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 21 Mar 2025 05:59:09 GMT Subject: RFR: 8350892: [JVMCI] Align ResolvedJavaType.getInstanceFields with Class.getDeclaredFields In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 23:46:54 GMT, Doug Simon wrote: > The current order of fields returned by `ResolvedJavaType.getInstanceFields` is a) not well specified and b) different than the order of fields used almost everywhere else in HotSpot. This PR aligns the order of `getInstanceFields` with `Class.getDeclaredFields()`. > > It also makes `ciInstanceKlass::_nonstatic_fields` use the same order which unifies how escape analysis and deoptimization treats fields across C2 and JVMCI. Seems like a good change. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23849#pullrequestreview-2704807131 From dholmes at openjdk.org Fri Mar 21 06:26:08 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 21 Mar 2025 06:26:08 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 07:06:25 GMT, Aleksey Shipilev wrote: >> Little regression crept in with [JDK-8351142](https://bugs.openjdk.org/browse/JDK-8351142): on the deflation path, object associated with monitor can be already dead. >> >> A new stress test fails within seconds without a fix. It also covers other monitor events, so we have extra coverage there as well. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, new stress test now passes >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Emit the event anyway src/hotspot/share/runtime/objectMonitor.cpp line 744: > 742: // Emit the event anyway, but without details. > 743: event->set_monitorClass(nullptr); > 744: event->set_address(0); Shouldn't this be the default state of the event? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24121#discussion_r2006925295 From iklam at openjdk.org Fri Mar 21 07:00:22 2025 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 21 Mar 2025 07:00:22 GMT Subject: RFR: 8352579: Refactor CDS legacy optimization for lambda proxy classes Message-ID: Since JDK 16, CDS has provided limited optimization for lambda expressions. This has been superseded by JEP 483 and is useful only when `-XX:+AOTClassLinking` is not enabled (which is the case for the default CDS archive, for compatibility reasons). The "legacy lambda optimization" may eventually be removed. For the time being, we should consolidate the code into a single source code and clearly mark its uses. This way we can avoid confusion with the JEP 483 code for supporting lambdas (and other java.lang.invoke functionalities). 
------------- Commit messages: - 8352579: Refactor CDS legacy optimization for lambda proxy classes Changes: https://git.openjdk.org/jdk/pull/24145/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24145&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352579 Stats: 1018 lines in 17 files changed: 545 ins; 425 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/24145.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24145/head:pull/24145 PR: https://git.openjdk.org/jdk/pull/24145 From shade at openjdk.org Fri Mar 21 08:13:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 21 Mar 2025 08:13:07 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2] In-Reply-To: References: Message-ID: <9RYKO6P037JZUM-Fp7LUnMWHPLdXt5IO3ujvTDliJyw=.674c2fec-ca60-4def-81bb-01d5515e3fe2@github.com> On Fri, 21 Mar 2025 06:21:14 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Emit the event anyway > > src/hotspot/share/runtime/objectMonitor.cpp line 744: > >> 742: // Emit the event anyway, but without details. >> 743: event->set_monitorClass(nullptr); >> 744: event->set_address(0); > > Shouldn't this be the default state of the event? I looked at it before doing the patch, and I don't think the event-specific fields are default-initialized. 
In fact, if you skip these, JFR code would assert: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/shade/trunks/jdk/build/linux-x86_64-server-fastdebug/hotspot/variant-server/gensrc/jfrfiles/jfrEventClasses.hpp:1181), pid=279828, tid=279849 # assert(verify_field_bit(0)) failed: Attempting to write an uninitialized event field: _monitorClass ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24121#discussion_r2007055113 From mbaesken at openjdk.org Fri Mar 21 08:22:06 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 21 Mar 2025 08:22:06 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 20:00:19 GMT, Kim Barrett wrote: > There are more potential divisions by zero in the vicinity of code conditionalized on CAN_USE_NAN_DEFINE (mentioned in a comment in the JBS Do you think we can get rid of the `CAN_USE_NAN_DEFINE ` (only usage is in sharedRuntimeTrans.cpp) and use `std::numeric_limits::quiet_NaN() ` instead ? Or is there still some compatibility concern with that ? > Maybe we're missing some tests, since these apparently don't show up with ubsan testing? Not sure about this; we used so far ubsan mostly (like 90%) on Linux x86_64 and afaik there we have intrinsics so we do not run into the coding. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24136#issuecomment-2742649962 From stuefe at openjdk.org Fri Mar 21 10:06:17 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 21 Mar 2025 10:06:17 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v3] In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: On Fri, 14 Mar 2025 09:20:40 GMT, Thomas Stuefe wrote: >> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. 
>> >> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). >> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). >> >> The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . >> >> The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. 
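The AIX fallback described above can be illustrated with a small sketch. This is not the code from the PR; the function names and the plain buffer standing in for the protection zone are hypothetical. It only shows why a page filled with 'P' (0x50) makes a stray 64-bit load come back as 0x5050505050505050.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Marker byte taken from the description above: 'P' == 0x50.
constexpr unsigned char kMarker = 0x50;

// Hypothetical helper: fill a buffer that stands in for the protection zone.
inline void fill_with_marker(unsigned char* zone, std::size_t size) {
  assert(zone != nullptr);
  std::memset(zone, kMarker, size);
}

// Hypothetical helper: read one 64-bit word from the zone, the way an
// accidental dereference of a zero nKlass would load from it.
inline std::uint64_t read_word(const unsigned char* zone, std::size_t offset) {
  std::uint64_t word;
  std::memcpy(&word, zone + offset, sizeof(word));
  return word;
}
```

Every byte of the word is 0x50, so the value is the same on little- and big-endian machines, which is what makes the pattern easy to spot in a register dump.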
>> >> Tests: >> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths >> - SAP reports all tests green (they had reported errors with the previous version) >> - Oracle Tests ongoing >> - GHAs green > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use > - skip test if we have no COH archive > - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use > - aix fix > - test and aix exclusion > - Fix windows when ArchiveRelocationMode=0 or 2 > - original friendly ping ------------- PR Comment: https://git.openjdk.org/jdk/pull/23912#issuecomment-2742891097 From mdoerr at openjdk.org Fri Mar 21 10:18:18 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 21 Mar 2025 10:18:18 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: On Wed, 5 Mar 2025 11:43:15 GMT, Joachim Kern wrote: >> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. >> >> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. 
The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). >> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). >> >> The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . >> >> The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. >> >> Tests: >> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths >> - SAP reports all tests green (they had reported errors with the previous version) >> - Oracle Tests ongoing >> - GHAs green > > Hi Thomas, > mprotect supports System V shared memory, but only if running in an environment where the MPROTECT_SHM=ON environmental variable is defined, which is not the case in the jdk. So we can fairly say System V shared memory cannot be mprotected by us. > > The documentation says: > _The mprotect subroutine can only be used on shared memory regions backed with 4 KB or 64 KB pages;_ > So we can mprotect 64K pages and mmap supports 64K pages beginning with AIX 7.3 TL1. 
> With JDK-8334371 we favor the use of mmap 64K pages over System V shared memory if running on a system with AIX 7.3 TL1 or higher. But as long as we allow lower os versions the system V shared memory is still in place, and the mprotect restriction stays valid. I haven't seen test errors with this new version. @JoKern65, @MBaesken: Are you aware of any problems? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23912#issuecomment-2742920100 From rehn at openjdk.org Fri Mar 21 11:16:34 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 21 Mar 2025 11:16:34 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v3] In-Reply-To: References: Message-ID: > Hi please consider. > > Added case to turn off UseZvfh when no RVV. > Which is the cause of the test issues, zvfh on but no rvv. > > Also made all case identical and added no warning when default. > Move them to the common init, as the "UseExtension" is not C2 specific. > > Manual tested and some random compiler tests. > > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Merge branch 'master' into maxvector_0 - Merge branch 'master' into maxvector_0 - hwprobe deps - Merge branch 'master' into maxvector_0 - Moved to common - Disable UseZvfh when no RVV ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24094/files - new: https://git.openjdk.org/jdk/pull/24094/files/2357c157..92908e1c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24094&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24094&range=01-02 Stats: 6524 lines in 176 files changed: 2388 ins; 2313 del; 1823 mod Patch: https://git.openjdk.org/jdk/pull/24094.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24094/head:pull/24094 PR: https://git.openjdk.org/jdk/pull/24094 From luhenry at openjdk.org Fri Mar 21 11:23:21 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Mar 2025 11:23:21 GMT Subject: RFR: 8352423: RISC-V: simplify DivI/L ModI/L [v2] In-Reply-To: References: <3oq1Fgqt-St5FpdVKDixDPfiI4dHLdMosA9dyhupwpA=.c782c45f-dab1-4bb6-a3d9-8d7ae6c56a8c@github.com> Message-ID: <1qPH9TBDeiNpE_k-65RglinfjS44xO_Jjv_M6OjSSAI=.566a1910-37c4-4ac1-9745-31378e0333a9@github.com> On Wed, 19 Mar 2025 16:13:32 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Currently, implementation of DivI/L and ModI/L are overcomplicated, could and should be simplified. >> And, also enable some DivI/L and ModI/L related tests. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > copyright Marked as reviewed by luhenry (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24119#pullrequestreview-2705580752 From luhenry at openjdk.org Fri Mar 21 11:25:16 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Mar 2025 11:25:16 GMT Subject: RFR: 8352159: RISC-V: add more zfa support [v4] In-Reply-To: References: Message-ID: <_0U03lXbC9t9SBOMmF-XjD9_O7N6FyOX4aGYwTyQ-Po=.e460f371-831f-48af-8082-8f635a0143a9@github.com> On Tue, 18 Mar 2025 11:25:19 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Previously in https://github.com/openjdk/jdk/pull/23844, `loadConH` is implemented only with `flh`, but `fli_h` should be more efficient if Zfa is supported; min/max for HF could use fmin/maxm.h instead too. >> >> Thanks! > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - merge master > - fix fli_h > - fix rFlagsReg > - refactor min/max > - min/max > - initial commit Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24081#pullrequestreview-2705585693 From rehn at openjdk.org Fri Mar 21 11:49:23 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 21 Mar 2025 11:49:23 GMT Subject: RFR: 8352159: RISC-V: add more zfa support [v4] In-Reply-To: References: Message-ID: <6PXAPmYoI1wt8T0MKwgcMTMpaxw5-73FYtQJlp2YY8U=.ebaa1cb3-2496-4160-a517-80fa0038c16c@github.com> On Tue, 18 Mar 2025 11:25:19 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Previously in https://github.com/openjdk/jdk/pull/23844, `loadConH` is implemented only with `flh`, but `fli_h` should be more efficient if Zfa is supported; min/max for HF could use fmin/maxm.h instead too. >> >> Thanks! > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: > > - merge master > - fix fli_h > - fix rFlagsReg > - refactor min/max > - min/max > - initial commit Yes, good, thanks! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24081#pullrequestreview-2705656103 From rehn at openjdk.org Fri Mar 21 11:54:16 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 21 Mar 2025 11:54:16 GMT Subject: RFR: 8352423: RISC-V: simplify DivI/L ModI/L [v2] In-Reply-To: References: <3oq1Fgqt-St5FpdVKDixDPfiI4dHLdMosA9dyhupwpA=.c782c45f-dab1-4bb6-a3d9-8d7ae6c56a8c@github.com> Message-ID: On Wed, 19 Mar 2025 16:13:32 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Currently, implementation of DivI/L and ModI/L are overcomplicated, could and should be simplified. >> And, also enable some DivI/L and ModI/L related tests. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > copyright Nice removing 100 LOC, thank you! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24119#pullrequestreview-2705679812 From jkern at openjdk.org Fri Mar 21 12:07:14 2025 From: jkern at openjdk.org (Joachim Kern) Date: Fri, 21 Mar 2025 12:07:14 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: On Wed, 5 Mar 2025 11:43:15 GMT, Joachim Kern wrote: >> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. >> >> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). 
In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). >> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). >> >> The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . >> >> The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. >> >> Tests: >> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths >> - SAP reports all tests green (they had reported errors with the previous version) >> - Oracle Tests ongoing >> - GHAs green > > Hi Thomas, > mprotect supports System V shared memory, but only if running in an environment where the MPROTECT_SHM=ON environmental variable is defined, which is not the case in the jdk. So we can fairly say System V shared memory cannot be mprotected by us. 
> > The documentation says: > _The mprotect subroutine can only be used on shared memory regions backed with 4 KB or 64 KB pages;_ > So we can mprotect 64K pages and mmap supports 64K pages beginning with AIX 7.3 TL1. > With JDK-8334371 we favor the use of mmap 64K pages over System V shared memory if running on a system with AIX 7.3 TL1 or higher. But as long as we allow lower os versions the system V shared memory is still in place, and the mprotect restriction stays valid. > I haven't seen test errors with this new version. @JoKern65, @MBaesken: Are you aware of any problems? No, I'm not aware of any problems. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23912#issuecomment-2743175041 From dholmes at openjdk.org Fri Mar 21 12:08:11 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 21 Mar 2025 12:08:11 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2] In-Reply-To: <9RYKO6P037JZUM-Fp7LUnMWHPLdXt5IO3ujvTDliJyw=.674c2fec-ca60-4def-81bb-01d5515e3fe2@github.com> References: <9RYKO6P037JZUM-Fp7LUnMWHPLdXt5IO3ujvTDliJyw=.674c2fec-ca60-4def-81bb-01d5515e3fe2@github.com> Message-ID: <-fW_jCwEjRYZAWRm2COUK47yyOoLGU6GtoMPyETsVUA=.40df942c-9baf-4c45-93d8-973719fa5800@github.com> On Fri, 21 Mar 2025 08:10:27 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 744: >> >>> 742: // Emit the event anyway, but without details. >>> 743: event->set_monitorClass(nullptr); >>> 744: event->set_address(0); >> >> Shouldn't this be the default state of the event? > > I looked at it before doing the patch, and I don't think the event-specific fields are default-initialized. 
In fact, if you skip these, JFR code would assert: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/shade/trunks/jdk/build/linux-x86_64-server-fastdebug/hotspot/variant-server/gensrc/jfrfiles/jfrEventClasses.hpp:1181), pid=279828, tid=279849 > # assert(verify_field_bit(0)) failed: Attempting to write an uninitialized event field: _monitorClass That seems like a bug in the event constructor to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24121#discussion_r2007453034 From mli at openjdk.org Fri Mar 21 12:12:16 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 21 Mar 2025 12:12:16 GMT Subject: RFR: 8352423: RISC-V: simplify DivI/L ModI/L [v2] In-Reply-To: <1qPH9TBDeiNpE_k-65RglinfjS44xO_Jjv_M6OjSSAI=.566a1910-37c4-4ac1-9745-31378e0333a9@github.com> References: <3oq1Fgqt-St5FpdVKDixDPfiI4dHLdMosA9dyhupwpA=.c782c45f-dab1-4bb6-a3d9-8d7ae6c56a8c@github.com> <1qPH9TBDeiNpE_k-65RglinfjS44xO_Jjv_M6OjSSAI=.566a1910-37c4-4ac1-9745-31378e0333a9@github.com> Message-ID: On Fri, 21 Mar 2025 11:20:54 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> copyright > > Marked as reviewed by luhenry (Committer). Thank you @luhenry @robehn ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24119#issuecomment-2743182568 From mli at openjdk.org Fri Mar 21 12:12:17 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 21 Mar 2025 12:12:17 GMT Subject: Integrated: 8352423: RISC-V: simplify DivI/L ModI/L In-Reply-To: <3oq1Fgqt-St5FpdVKDixDPfiI4dHLdMosA9dyhupwpA=.c782c45f-dab1-4bb6-a3d9-8d7ae6c56a8c@github.com> References: <3oq1Fgqt-St5FpdVKDixDPfiI4dHLdMosA9dyhupwpA=.c782c45f-dab1-4bb6-a3d9-8d7ae6c56a8c@github.com> Message-ID: On Wed, 19 Mar 2025 15:44:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? 
> > Currently, implementation of DivI/L and ModI/L are overcomplicated, could and should be simplified. > And, also enable some DivI/L and ModI/L related tests. > > Thanks! This pull request has now been integrated. Changeset: ac760dd1 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/ac760dd106d88129f3c13520754f594b1d317a11 Stats: 173 lines in 6 files changed: 20 ins; 133 del; 20 mod 8352423: RISC-V: simplify DivI/L ModI/L Reviewed-by: fyang, luhenry, rehn ------------- PR: https://git.openjdk.org/jdk/pull/24119 From mli at openjdk.org Fri Mar 21 12:13:12 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 21 Mar 2025 12:13:12 GMT Subject: RFR: 8352159: RISC-V: add more zfa support [v4] In-Reply-To: <_0U03lXbC9t9SBOMmF-XjD9_O7N6FyOX4aGYwTyQ-Po=.e460f371-831f-48af-8082-8f635a0143a9@github.com> References: <_0U03lXbC9t9SBOMmF-XjD9_O7N6FyOX4aGYwTyQ-Po=.e460f371-831f-48af-8082-8f635a0143a9@github.com> Message-ID: On Fri, 21 Mar 2025 11:22:59 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - merge master >> - fix fli_h >> - fix rFlagsReg >> - refactor min/max >> - min/max >> - initial commit > > Marked as reviewed by luhenry (Committer). Thank you @luhenry @robehn ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24081#issuecomment-2743184915 From mli at openjdk.org Fri Mar 21 12:13:12 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 21 Mar 2025 12:13:12 GMT Subject: Integrated: 8352159: RISC-V: add more zfa support In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 14:42:38 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > Previously in https://github.com/openjdk/jdk/pull/23844, `loadConH` is implemented only with `flh`, but `fli_h` should be more efficient if Zfa is supported; min/max for HF could use fmin/maxm.h instead too. > > Thanks! This pull request has now been integrated. 
Changeset: 04eac0c3 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/04eac0c3e2ce1a37d0661de10907228e0ca48aab Stats: 123 lines in 4 files changed: 93 ins; 9 del; 21 mod 8352159: RISC-V: add more zfa support Reviewed-by: fyang, luhenry, rehn ------------- PR: https://git.openjdk.org/jdk/pull/24081 From mbaesken at openjdk.org Fri Mar 21 12:28:07 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 21 Mar 2025 12:28:07 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: On Fri, 21 Mar 2025 12:03:48 GMT, Joachim Kern wrote: > I haven't seen test errors with this new version. @JoKern65, @MBaesken: Are you aware of any problems? I am not aware of issues related to this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23912#issuecomment-2743220350 From shade at openjdk.org Fri Mar 21 12:46:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 21 Mar 2025 12:46:07 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2] In-Reply-To: <-fW_jCwEjRYZAWRm2COUK47yyOoLGU6GtoMPyETsVUA=.40df942c-9baf-4c45-93d8-973719fa5800@github.com> References: <9RYKO6P037JZUM-Fp7LUnMWHPLdXt5IO3ujvTDliJyw=.674c2fec-ca60-4def-81bb-01d5515e3fe2@github.com> <-fW_jCwEjRYZAWRm2COUK47yyOoLGU6GtoMPyETsVUA=.40df942c-9baf-4c45-93d8-973719fa5800@github.com> Message-ID: On Fri, 21 Mar 2025 12:05:46 GMT, David Holmes wrote: >> I looked at it before doing the patch, and I don't think the event-specific fields are default-initialized. 
In fact, if you skip these, JFR code would assert: >> >> >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/home/shade/trunks/jdk/build/linux-x86_64-server-fastdebug/hotspot/variant-server/gensrc/jfrfiles/jfrEventClasses.hpp:1181), pid=279828, tid=279849 >> # assert(verify_field_bit(0)) failed: Attempting to write an uninitialized event field: _monitorClass > > That seems like a bug in the event constructor to me. Well, the asserts make it look this is not accidental: do the writes once with the actual values, maybe? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24121#discussion_r2007503794 From thartmann at openjdk.org Fri Mar 21 12:59:09 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 21 Mar 2025 12:59:09 GMT Subject: RFR: 8350892: [JVMCI] Align ResolvedJavaType.getInstanceFields with Class.getDeclaredFields In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 23:46:54 GMT, Doug Simon wrote: > The current order of fields returned by `ResolvedJavaType.getInstanceFields` is a) not well specified and b) different than the order of fields used almost everywhere else in HotSpot. This PR aligns the order of `getInstanceFields` with `Class.getDeclaredFields()`. > > It also makes `ciInstanceKlass::_nonstatic_fields` use the same order which unifies how escape analysis and deoptimization treats fields across C2 and JVMCI. Nice cleanup, CI changes look good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23849#pullrequestreview-2705854715 From dnsimon at openjdk.org Fri Mar 21 13:03:17 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 21 Mar 2025 13:03:17 GMT Subject: RFR: 8350892: [JVMCI] Align ResolvedJavaType.getInstanceFields with Class.getDeclaredFields In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 23:46:54 GMT, Doug Simon wrote: > The current order of fields returned by `ResolvedJavaType.getInstanceFields` is a) not well specified and b) different than the order of fields used almost everywhere else in HotSpot. This PR aligns the order of `getInstanceFields` with `Class.getDeclaredFields()`. > > It also makes `ciInstanceKlass::_nonstatic_fields` use the same order which unifies how escape analysis and deoptimization treats fields across C2 and JVMCI. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23849#issuecomment-2743295318 From dnsimon at openjdk.org Fri Mar 21 13:03:17 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 21 Mar 2025 13:03:17 GMT Subject: Integrated: 8350892: [JVMCI] Align ResolvedJavaType.getInstanceFields with Class.getDeclaredFields In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 23:46:54 GMT, Doug Simon wrote: > The current order of fields returned by `ResolvedJavaType.getInstanceFields` is a) not well specified and b) different than the order of fields used almost everywhere else in HotSpot. This PR aligns the order of `getInstanceFields` with `Class.getDeclaredFields()`. > > It also makes `ciInstanceKlass::_nonstatic_fields` use the same order which unifies how escape analysis and deoptimization treats fields across C2 and JVMCI. This pull request has now been integrated. 
Changeset: 0cb110eb Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/0cb110ebb7f8d184dd855f64c5dd7924c8202b3d Stats: 89 lines in 6 files changed: 18 ins; 32 del; 39 mod 8350892: [JVMCI] Align ResolvedJavaType.getInstanceFields with Class.getDeclaredFields Reviewed-by: yzheng, never, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/23849 From jsjolen at openjdk.org Fri Mar 21 13:22:10 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 21 Mar 2025 13:22:10 GMT Subject: RFR: 8352393: AIX: Problem list serviceability/attach/AttachAPIv2/StreamingOutputTest.java In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 14:56:48 GMT, Varada M wrote: > Excluding the test serviceability/attach/AttachAPIv2/StreamingOutputTest.java > > JBS Issue : [JDK-8352393](https://bugs.openjdk.org/browse/JDK-8352393) Marked as reviewed by jsjolen (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24116#pullrequestreview-2705924172 From jsjolen at openjdk.org Fri Mar 21 13:33:23 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 21 Mar 2025 13:33:23 GMT Subject: RFR: 8352415: x86: Tighten up template interpreter method entry code In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 13:44:40 GMT, Aleksey Shipilev wrote: > Interpreter performance is the still important for faster startup, since it would carry application until compilers kick in. After looking at Leyden scenarios in Xint mode, I believe incremental improvements are possible in template interpreter to make it faster. > > One of those improvements is tightening up method entry code. Profiling shows the hottest path in the whole ordeal for non-native methods is resolving the Java mirror to store the GC root for currently executing Method*. It involves 4-5 chained memory accesses, which incurs significant latency. 
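To make the "chained memory accesses" concrete, here is a minimal sketch. The structs are simplified, hypothetical stand-ins rather than the real HotSpot classes, and the exact path (method, constant pool, holder class, mirror handle) is an assumption that the message itself does not spell out.

```cpp
#include <cassert>

// Simplified, hypothetical stand-ins for the metadata involved.
struct Oop { int id; };
struct OopHandle { Oop** slot; };              // handle: pointer to a slot holding the oop
struct Klass { OopHandle java_mirror; };
struct ConstantPool { Klass* pool_holder; };
struct ConstMethod { ConstantPool* constants; };
struct Method { ConstMethod* const_method; };

// Each load needs the result of the previous one, so the hardware cannot
// overlap them: five dependent loads in a row.
inline Oop* resolve_mirror(const Method* m) {
  assert(m != nullptr);
  ConstMethod* cm = m->const_method;       // load 1
  ConstantPool* cp = cm->constants;        // load 2
  Klass* holder = cp->pool_holder;         // load 3
  Oop** slot = holder->java_mirror.slot;   // load 4
  return *slot;                            // load 5
}
```

Reusing an intermediate pointer that is needed elsewhere anyway, or starting the chain earlier and doing unrelated work in between, are the kinds of changes that can hide some of this latency.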
> > We can massage the code to reuse some memory accesses and also spread them out to allow more latency-hiding hardware mechanisms to kick in. > > Additional testing: > - [x] Ad-hoc `-Xint` benchmarks > - [x] Linux x86_64 server fastdebug, `all` LGTM ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24114#pullrequestreview-2705955380 From mbaesken at openjdk.org Fri Mar 21 13:39:55 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 21 Mar 2025 13:39:55 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp [v2] In-Reply-To: References: Message-ID: > There are a few divisions by zero in sharedRuntimeTrans.cpp, used to "construct" NaN and -infinity. This should probably be replaced by using functionality from std::numeric_limits . Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: replace more divisons by 0 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24136/files - new: https://git.openjdk.org/jdk/pull/24136/files/3e27521c..1aa41c47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24136&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24136&range=00-01 Stats: 18 lines in 1 file changed: 6 ins; 7 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24136/head:pull/24136 PR: https://git.openjdk.org/jdk/pull/24136 From mbaesken at openjdk.org Fri Mar 21 13:39:55 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 21 Mar 2025 13:39:55 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp In-Reply-To: References: Message-ID: <0NkshU4WEYPClR103vNao5kp6vWIHy1vb3CmY4pqzLI=.2aa384e3-976d-4898-bd05-21807fbe1e1a@github.com> On Thu, 20 Mar 2025 15:56:10 GMT, Matthias Baesken wrote: > There are a few divisions by zero in sharedRuntimeTrans.cpp, used to "construct" NaN and -infinity. 
This should probably be replaced by using functionality from std::numeric_limits.

Btw there are also some old comments on SPARC in the file, should I remove those too?

Seems the define `share/utilities/globalDefinitions_gcc.hpp:84:#define CAN_USE_NAN_DEFINE 1` is unused now, should I remove it?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24136#issuecomment-2743380705
PR Comment: https://git.openjdk.org/jdk/pull/24136#issuecomment-2743389744

From mbaesken at openjdk.org Fri Mar 21 13:39:55 2025
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Fri, 21 Mar 2025 13:39:55 GMT
Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 20 Mar 2025 19:54:45 GMT, Kim Barrett wrote:

> For example, here's an unprotected one, where z may be zero: https://github.com/openjdk/jdk/blame/56038fb5a156568cce2e80f5db18b10ad61c06e4/src/hotspot/share/runtime/sharedRuntimeTrans.cpp#L519
> Probably zero handling should be it's own clause.

Looks like this is shown by jtreg test java/lang/Math/PowTests on macOS aarch64:

/priv/jenkins/client-home/workspace/openjdk-jdk-weekly-macos_aarch64-opt/jdk/src/hotspot/share/runtime/sharedRuntimeTrans.cpp:517:23: runtime error: division by zero
    #0 0x105599454 in SharedRuntime::dpow(double, double) sharedRuntimeTrans.cpp:668

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24136#issuecomment-2743377830

From adinn at openjdk.org Fri Mar 21 14:02:17 2025
From: adinn at openjdk.org (Andrew Dinn)
Date: Fri, 21 Mar 2025 14:02:17 GMT
Subject: RFR: 8349721: Add aarch64 intrinsics for ML-KEM [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 4 Mar 2025 22:04:26 GMT, Ferenc Rakoczi wrote:

>> By using the aarch64 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains six commits:
>
> - Fixed mismerge.
> - Merged master.
> - A little cleanup
> - Merged master
> - removing trailing spaces
> - kyber aarch64 intrinsics

src/hotspot/share/opto/library_call.cpp line 7800:

> 7798:   const char *stubName;
> 7799:   assert(UseKyberIntrinsics, "need Kyber intrinsics support");
> 7800:   assert(callee()->signature()->size() == 3, "kyber12To16 has 3 parameters");

Just as an aside this causes testing of a debug build to fail. The intrinsic has 4 parameters.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2007638886

From adinn at openjdk.org Fri Mar 21 14:02:18 2025
From: adinn at openjdk.org (Andrew Dinn)
Date: Fri, 21 Mar 2025 14:02:18 GMT
Subject: RFR: 8349721: Add aarch64 intrinsics for ML-KEM [v4]
In-Reply-To:
References:
Message-ID: <54ED2n9rhYXWQuwge7bPuvPXtAmL2WpfRJFfXH__r2I=.dead1c37-4283-48a6-ad01-26fc92be30fa@github.com>

On Fri, 21 Mar 2025 13:59:10 GMT, Andrew Dinn wrote:

>> Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>>
>> - Fixed mismerge.
>> - Merged master.
>> - A little cleanup
>> - Merged master
>> - removing trailing spaces
>> - kyber aarch64 intrinsics
>
> src/hotspot/share/opto/library_call.cpp line 7800:
>
>> 7798:   const char *stubName;
>> 7799:   assert(UseKyberIntrinsics, "need Kyber intrinsics support");
>> 7800:   assert(callee()->signature()->size() == 3, "kyber12To16 has 3 parameters");
>
> Just as an aside this causes testing of a debug build to fail. The intrinsic has 4 parameters.

With this value reset to 4 the ML_DSA test passes for ML_KEM on a debug build.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2007642721 From tschatzl at openjdk.org Fri Mar 21 14:20:34 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 21 Mar 2025 14:20:34 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v27] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
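As a rough illustration of the bookkeeping in the list above, here is a toy, single-threaded Java model of dirty-card tracking with a thread-local queue that is handed to a global queue set when full. It is only a sketch, not HotSpot code: the class and method names, the 512-byte card size, and the tiny buffer capacity are all assumptions chosen for the example, and the only filter modelled is the "card already dirty" check. The mutator/refinement synchronization is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy, single-threaded model of dirty-card tracking; not HotSpot code. */
final class DirtyCardQueueToy {
    static final int CARD_SHIFT = 9;       // assumption: one card covers 512 bytes
    static final byte DIRTY = 1;           // any other value means "clean"
    static final int BUFFER_CAPACITY = 4;  // deliberately tiny so buffers fill quickly

    final byte[] cardTable;
    final List<int[]> queueSet = new ArrayList<>();      // stands in for the dcqs
    private int[] localQueue = new int[BUFFER_CAPACITY]; // stands in for a thread-local dcq
    private int localCount = 0;

    DirtyCardQueueToy(int heapBytes) {
        cardTable = new byte[heapBytes >>> CARD_SHIFT];
    }

    /** Models the work done after a reference store to fieldAddress. */
    void postWrite(int fieldAddress) {
        int card = fieldAddress >>> CARD_SHIFT;
        if (cardTable[card] == DIRTY) {
            return;                        // filter: card is already dirty
        }
        cardTable[card] = DIRTY;
        localQueue[localCount++] = card;   // enqueue the card location
        if (localCount == BUFFER_CAPACITY) {
            queueSet.add(localQueue);      // hand the full buffer to the global set
            localQueue = new int[BUFFER_CAPACITY];
            localCount = 0;
        }
    }

    int pendingLocal() {
        return localCount;
    }
}
```

Two writes that hit the same card enqueue only one entry; the fourth distinct card fills the buffer and moves it into `queueSet`.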
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 35 commits: - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq - * make young gen length revising independent of refinement thread * use a service task * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update - * fix IR code generation tests that change due to barrier cost changes - * factor out card table and refinement table merging into a single method - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 - * obsolete G1UpdateBufferSize G1UpdateBufferSize has previously been used to size the refinement buffers and impose a minimum limit on the number of cards per thread that need to be pending before refinement starts. The former function is now obsolete with the removal of the dirty card queues, the latter functionality has been taken over by the new diagnostic option `G1PerThreadPendingCardThreshold`. I prefer to make this a diagnostic option is better than a product option because it is something that is only necessary for some test cases to produce some otherwise unwanted behavior (continuous refinement). CSR is pending. - * more documentation on why we need to rendezvous the gc threads - Merge branch 'master' into 8342381-card-table-instead-of-dcq - * ayang review * re-add STS leaver for java thread handshake - * when aborting refinement during full collection, the global card table and the per-thread card table might not be in sync. Roll forward during abort of the refinement in these situations. * additional verification * added some missing ResourceMarks in asserts * added variant of ArrayJuggle2 that crashes fairly quickly without these changes - ... 
and 25 more: https://git.openjdk.org/jdk/compare/0cb110eb...d9311047

-------------

Changes: https://git.openjdk.org/jdk/pull/23739/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=26
Stats: 7089 lines in 110 files changed: 2610 ins; 3555 del; 924 mod
Patch: https://git.openjdk.org/jdk/pull/23739.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739

PR: https://git.openjdk.org/jdk/pull/23739

From shade at openjdk.org Fri Mar 21 14:41:08 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 21 Mar 2025 14:41:08 GMT
Subject: RFR: 8352415: x86: Tighten up template interpreter method entry code
In-Reply-To:
References:
Message-ID:

On Wed, 19 Mar 2025 13:44:40 GMT, Aleksey Shipilev wrote:

> Interpreter performance is still important for faster startup, since it carries the application until the compilers kick in. After looking at Leyden scenarios in Xint mode, I believe incremental improvements are possible in the template interpreter to make it faster.
>
> One of those improvements is tightening up method entry code. Profiling shows the hottest path in the whole ordeal for non-native methods is resolving the Java mirror to store the GC root for the currently executing Method*. It involves 4-5 chained memory accesses, which incur significant latency.
>
> We can massage the code to reuse some memory accesses and also spread them out to allow more latency-hiding hardware mechanisms to kick in.
>
> Additional testing:
> - [x] Ad-hoc `-Xint` benchmarks
> - [x] Linux x86_64 server fastdebug, `all`

Thanks for reviews!
@coleenp, you are likely interested in this area :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24114#issuecomment-2743562402 From cnorrbin at openjdk.org Fri Mar 21 15:45:45 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Fri, 21 Mar 2025 15:45:45 GMT Subject: RFR: 8294954: Remove superfluous ResourceMarks when using LogStream Message-ID: Hi everyone, This PR removes redundant `ResourceMark` instances where `LogStream` is used. Previously, `LogStream` inherited from `ResourceObj`, which required a `ResourceMark`, but this is no longer the case, making these instances unnecessary. Process: 1. I added assertions to check for resource unwinding in places where `ResourceMark`s were used with `LogStream`s. 2. Ran tests up to tier7 to confirm no unwinding was happening. This helped filter out cases where `ResourceMark`s were still required for other reasons. 3. Manually verified the remaining cases by tracing function calls to ensure the `ResourceMark`s were truly unnecessary. 4. Removed the redundant `ResourceMark` instances. ------------- Commit messages: - remove resourcemark extra assert - removed logstream resourcemarks Changes: https://git.openjdk.org/jdk/pull/24162/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24162&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8294954 Stats: 40 lines in 26 files changed: 1 ins; 39 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24162.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24162/head:pull/24162 PR: https://git.openjdk.org/jdk/pull/24162 From mcimadamore at openjdk.org Fri Mar 21 15:53:31 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 21 Mar 2025 15:53:31 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 01:25:24 GMT, John R Rose wrote: > Hi again Per! > > Here are some brief notes from our face-to-face chat at JavaOne. 
>
> Debuggers want/need a "hook" for tentative evaluation of stables. It is an error for a debugger to trigger stable value decisions. This applies mainly to stable lists because of `toString`.
>
> Just how "mutable" is a stable list? How "eager to decide"? Which methods (if any) are tentative: `toString` / `equals` / `hashCode`? Currently in the PR, all are decisive. This might be a case of the "wrong default".

IMHO if we claim that what the API constructs is a `List`, it would be weird for these methods to behave any differently.

>
> Maybe refactor composites to expose systematically "tentative" access API:
>
> * Less universal: SV.list(My::compute) => List<T>
>
> * More universal: SV.stableList(My::compute) => List<StableValue<T>>
>
>
> BTW, it's easy to understand a stable-list as a list of stables. But let's be sure to leave room for a more compact data structure. A compact stable-list is a list of stable views into a backing array. The backing array looks like `@Stable private T[] resolvedValues`. Not `private final List<StableValue<T>> stableValues`.

I don't disagree -- there are two list factories, one that returns a plain `List<T>` and one that returns a `List<StableValue<T>>`. The important thing is we leave room for both (which means naming is important). But I think we're ok for this round even if we don't provide the second factory given that it can be "emulated" (although in a not as compact fashion -- but is that a problem?)

In other words, I'd like to base some of these decisions more on concrete use cases. We had plenty of use cases for `List<T>` -- very few for `List<StableValue<T>>`. Maybe some real world use will show that, indeed, a `List<StableValue<T>>` factory belongs here -- in which case, sure let's add one.

tl;dr; let's get the naming right now -- but add the API later.

>
> For the record: I think this is sufficient for correctness: Use `getAcquire` (resp. `releaseSet`) for all stable reads (resp. writes). Do the `releaseSet` inside a mutex that serializes computation. Add a re-entrancy check in the mutex and throw on vicious cycles.
> > I do NOT think `volatile` is necessary; it has too many fences. It is a safe default for a naked variable. But the stable variables are encapsulated, and do not need aggressive fences. As I said, `volatile` might not be necessary but it does make the implementation easier to validate (I think). We use `@Stable` + `volatile` in a number of places: * https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Module.java#L273 * https://github.com/openjdk/jdk/blob/master/src/jdk.unsupported/share/classes/sun/misc/Unsafe.java#L1762 In all these cases there is a fast path: e.g. when we know we have already warned for enable native access, or for Unsafe. In the SV API, the fast path is when we know that the SV is set already. In my experience, the volatile access in this fast path costs nothing: whenever I looked at the generated C2 code for hot paths of FFM code using enable-native-access, it seems that, once the stable field is set, the fact that it is `volatile` no longer matters. There's no barrier generated by C2 -- access is as fast as plain access. So, avoiding `volatile` can buy something in the slow paths -- imperative set, maybe for predicates. But how important is that (given stable values are only set once) ? For this reason, at least in my mind, I'd rather opt for an implementation that is easier to follow (even 10 months from now) -- of course, assuming it's fast enough in the fast path (which seems to be the case here) -- than having an uber-optimized implementation whose quirks we'll have to re-learn every time we touch the code. 
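The acquire/release alternative described in the quoted note can be sketched in plain Java as follows. This is a hedged illustration, not the PR's implementation: the class name, the `getOrCompute` method and the use of `null` as the "unset" marker are assumptions made for the example, and it assumes the quoted `releaseSet` refers to `VarHandle.setRelease`. It shows an acquire read on the fast path, a release write inside a mutex that serializes computation, and a re-entrancy check that throws on cyclic initialization.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.Objects;
import java.util.function.Supplier;

/** Sketch of a set-once holder using acquire/release instead of volatile. */
final class AcquireReleaseStableSketch<T> {
    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(AcquireReleaseStableSketch.class, "value", Object.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private Object value;                  // plain field, only accessed through VALUE; null = unset
    private final Object lock = new Object();
    private boolean computing;             // guarded by lock

    @SuppressWarnings("unchecked")
    T getOrCompute(Supplier<? extends T> supplier) {
        Object v = VALUE.getAcquire(this); // fast path: acquire read, no lock
        if (v != null) {
            return (T) v;
        }
        synchronized (lock) {              // slow path: serialize computation
            v = VALUE.getAcquire(this);
            if (v != null) {
                return (T) v;
            }
            if (computing) {               // same thread re-entered the monitor: a cycle
                throw new IllegalStateException("recursive initialization");
            }
            computing = true;
            try {
                T computed = Objects.requireNonNull(supplier.get());
                VALUE.setRelease(this, (Object) computed); // publish with release semantics
                return computed;
            } finally {
                computing = false;
            }
        }
    }
}
```

Because Java monitors are re-entrant, only the thread that is already computing can observe `computing == true`, which is what makes the flag usable as a cycle detector here. Whether this is preferable to a `volatile` field is exactly the trade-off discussed in this message.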
------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2743779395 From pminborg at openjdk.org Fri Mar 21 16:16:25 2025 From: pminborg at openjdk.org (Per Minborg) Date: Fri, 21 Mar 2025 16:16:25 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Sun, 16 Mar 2025 07:51:55 GMT, Alan Bateman wrote: > I'm surprised to see `@ForceInline` in the offset query functions in `Unsafe`. Those are not on any fast path I'm aware of. What use case does this annotation address? If none, consider deleting; it will be a future maintenance puzzle. Or at least document in a comment why a slow path function needs such an annotation. Yeah, it seems a bit odd. If we want to change this, we should do that under a separate issue. The new `ensureNotTrusted()` method carries `@ForceInline` because it is called from methods that also have the annotation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2743837331 From pminborg at openjdk.org Fri Mar 21 16:22:23 2025 From: pminborg at openjdk.org (Per Minborg) Date: Fri, 21 Mar 2025 16:22:23 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Sun, 16 Mar 2025 00:34:45 GMT, John R Rose wrote: > Comments on visual noise and side effects in `toString`. > > `renderWrapped` is clever for a single stable value, but it makes for a very noisy display string, with confusing doubly-nested `[]`, for composite stable values. I'm talking about `StableFunction` mainly, I guess. > > I suggest omitting the inner `[]` for such composites. A simple boolean on `renderWrapped` will do that trick. In that case, `renderWrapped` has the job of either presenting a fixed (recognizable) sentinel string, or else forwards, without further editorial comment, to the `toString` of the contained value. 
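The doubly-nested `[]` effect mentioned in the quoted comment can be reproduced with `java.util.Optional`, which wraps its content in `[ ]` in its `toString`. The small demo below uses only `Optional` and `List`; it is an analogy and does not call the StableValue API from the PR. The class and method names are made up for the example.

```java
import java.util.List;
import java.util.Optional;

final class WrappedToStringDemo {
    /** A single wrapped value: one pair of brackets. */
    static String single() {
        return Optional.of("value").toString();
    }

    /** A wrapped collection: the wrapper's brackets nest around the list's own. */
    static String composite() {
        return Optional.of(List.of(1, 2)).toString();
    }

    public static void main(String[] args) {
        System.out.println(single());    // Optional[value]
        System.out.println(composite()); // Optional[[1, 2]]
    }
}
```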
The `toString()` for `StableValue` is inspired by `Optional` which works in the same way by adding `[ ]` around the contents. Any more thought in the reviewer community on how we should handle this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2743849237 From qamai at openjdk.org Fri Mar 21 17:29:33 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 21 Mar 2025 17:29:33 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 15:50:05 GMT, Maurizio Cimadamore wrote: > In all these cases there is a fast path: e.g. when we know we have already warned for enable native access, or for Unsafe. In the SV API, the fast path is when we know that the SV is set already. In my experience, the volatile access in this fast path costs nothing: whenever I looked at the generated C2 code for hot paths of FFM code using enable-native-access, it seems that, once the stable field is set, the fact that it is volatile no longer matters. There's no barrier generated by C2 -- access is as fast as plain access. An acquire load is allowed to be reordered with a preceding volatile store and I believe this is the only case where it makes a difference. E.g:: x = load_acquire(p); store_volatile(p, v); y = load_acquire(p); can be transformed into: x = load_acquire(p); y = x; store_volatile(p, v); Furthermore, on Aarch64, volatile load is implemented with `ldar` while acquire load can be implemented with `ldr`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2744005070 From iklam at openjdk.org Fri Mar 21 18:03:48 2025 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 21 Mar 2025 18:03:48 GMT Subject: RFR: 8352579: Refactor CDS legacy optimization for lambda proxy classes [v2] In-Reply-To: References: Message-ID: > Since JDK 16, CDS has provided limited optimization for lambda expressions. 
This has been superseded by JEP 483 and is useful only when `-XX:+AOTClassLinking` is not enabled (which is the case for the default CDS archive, for compatibility reasons). > > The "legacy lambda optimization" may eventually be removed. For the time being, we should consolidate the code into a single source code and clearly mark its uses. This way we can avoid confusion with the JEP 483 code for supporting lambdas (and other java.lang.invoke functionalities). Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Fixed github action build failures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24145/files - new: https://git.openjdk.org/jdk/pull/24145/files/06648904..c8cfc851 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24145&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24145&range=00-01 Stats: 21 lines in 3 files changed: 11 ins; 9 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24145.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24145/head:pull/24145 PR: https://git.openjdk.org/jdk/pull/24145 From iklam at openjdk.org Fri Mar 21 18:39:31 2025 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 21 Mar 2025 18:39:31 GMT Subject: RFR: 8352579: Refactor CDS legacy optimization for lambda proxy classes [v3] In-Reply-To: References: Message-ID: > Since JDK 16, CDS has provided limited optimization for lambda expressions. This has been superseded by JEP 483 and is useful only when `-XX:+AOTClassLinking` is not enabled (which is the case for the default CDS archive, for compatibility reasons). > > The "legacy lambda optimization" may eventually be removed. For the time being, we should consolidate the code into a single source code and clearly mark its uses. This way we can avoid confusion with the JEP 483 code for supporting lambdas (and other java.lang.invoke functionalities). 
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Fixed infinite recursion compiler warning ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24145/files - new: https://git.openjdk.org/jdk/pull/24145/files/c8cfc851..87b34cee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24145&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24145&range=01-02 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24145.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24145/head:pull/24145 PR: https://git.openjdk.org/jdk/pull/24145 From adinn at openjdk.org Fri Mar 21 20:29:26 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Fri, 21 Mar 2025 20:29:26 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: <1cr1G_U4ivPScWUPGINekngI0AI9MoVN9L_w3LJaY-g=.c821e0c5-26f2-4ba9-87fb-4a316f617b86@github.com> On Fri, 21 Mar 2025 17:26:08 GMT, Quan Anh Mai wrote: >>> Hi again Per! >>> >>> Here are some brief notes from our face-to-face chat at JavaOne. >>> >>> Debuggers want/need a "hook" for tentative evaluation of stables. It is an error for a debugger to trigger stable value decisions. This applies mainly to stable lists because of `toString`. >>> >>> Just how "mutable" is a stable list? How "eager to decide"? Which methods (if any) are tentative: `toString` / `equals` / `hashCode` ? Currently in the PR, all are decisive. This might be a case of the ?wrong default?. >> >> IMHO if we claim that what the API constructs is a `List`, it would be weird for these methods to behave any different. >> >>> >>> Maybe refactor composites to expose systematically "tenative" access API: >>> >>> * Less universal: SV.list(My::compute) => List >>> >>> * More universal; SV.stableList(My::compute) => List >>> >>> >>> BTW, it?s easy to understand a stable-list as a list of stables. 
But let?s be sure to leave room for a more compact data structure. A compact stable-list is a list of stable views into a backing array. The backing array looks like `@Stable private T[] resolvedValues`. Not `private final List> stableValues`. >> >> I don't disagree -- there's two list factories, one that returns a plain List and one that returns a List>. The important thing is we leave room for both (which means naming is important). But I think we're ok for this round even if we don't provide the second factory given that it can be "emulated" (although in a not as compact fashion -- but is that a problem?) >> >> In other words, I'd like to base some of these decisions more on concrete use cases. We had plenty use cases for List -- very few for List>. Maybe some real world use will show that, indeed a List> factory belongs here -- in which case, sure let's add one. >> >> tl;dr; let's get the naming right now -- but add the API later. >> >>> >>> For the record: I think this is sufficient for correctness: Use `getAcquire` (resp. `releaseSet`) for all stable reads (resp. writes. Do the `releaseSet` inside a mutex that serializes computation. Add a re-entrancy check in the mutex and throw on vicious cycles. >>> >>> I do NOT think `volatile` is necessary; it has too many fences. It is a safe default for a naked variable. But the stable variables are encapsulated, and do not need aggressive fences. >> >> As I said, `volatile` might not be necessary but it does make the implementation easier to validate (I think)... > >> In all these cases there is a fast path: e.g. when we know we have already warned for enable native access, or for Unsafe. In the SV API, the fast path is when we know that the SV is set already. In my experience, the volatile access in this fast path costs nothing: whenever I looked at the generated C2 code for hot paths of FFM code using enable-native-access, it seems that, once the stable field is set, the fact that it is volatile no longer matters. 
There's no barrier generated by C2 -- access is as fast as plain access. > > An acquire load is allowed to be reordered with a preceding volatile store and I believe this is the only case where it makes a difference. E.g:: > > x = load_acquire(p); > store_volatile(p, v); > y = load_acquire(p); > > can be transformed into: > > x = load_acquire(p); > y = x; > store_volatile(p, v); > > Furthermore, on Aarch64, volatile load is implemented with `ldar` while acquire load can be implemented with `ldr`. @merykitty > Furthermore, on Aarch64, volatile load is implemented with ldar while acquire load can be implemented with ldr. I'm not sure exactly what you mean here but I don't think that sounds right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2744379545 From iklam at openjdk.org Fri Mar 21 23:47:46 2025 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 21 Mar 2025 23:47:46 GMT Subject: RFR: 8352579: Refactor CDS legacy optimization for lambda proxy classes [v4] In-Reply-To: References: Message-ID: > Since JDK 16, CDS has provided limited optimization for lambda expressions. This has been superseded by JEP 483 and is useful only when `-XX:+AOTClassLinking` is not enabled (which is the case for the default CDS archive, for compatibility reasons). > > The "legacy lambda optimization" may eventually be removed. For the time being, we should consolidate the code into a single source code and clearly mark its uses. This way we can avoid confusion with the JEP 483 code for supporting lambdas (and other java.lang.invoke functionalities). 
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @matias9927 offline comments - consolidated two functions with identical names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24145/files - new: https://git.openjdk.org/jdk/pull/24145/files/87b34cee..bd642f8e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24145&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24145&range=02-03 Stats: 76 lines in 5 files changed: 19 ins; 41 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/24145.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24145/head:pull/24145 PR: https://git.openjdk.org/jdk/pull/24145 From qamai at openjdk.org Sat Mar 22 02:44:21 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 22 Mar 2025 02:44:21 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: <1cr1G_U4ivPScWUPGINekngI0AI9MoVN9L_w3LJaY-g=.c821e0c5-26f2-4ba9-87fb-4a316f617b86@github.com> References: <1cr1G_U4ivPScWUPGINekngI0AI9MoVN9L_w3LJaY-g=.c821e0c5-26f2-4ba9-87fb-4a316f617b86@github.com> Message-ID: On Fri, 21 Mar 2025 20:25:51 GMT, Andrew Dinn wrote: >>> In all these cases there is a fast path: e.g. when we know we have already warned for enable native access, or for Unsafe. In the SV API, the fast path is when we know that the SV is set already. In my experience, the volatile access in this fast path costs nothing: whenever I looked at the generated C2 code for hot paths of FFM code using enable-native-access, it seems that, once the stable field is set, the fact that it is volatile no longer matters. There's no barrier generated by C2 -- access is as fast as plain access. >> >> An acquire load is allowed to be reordered with a preceding volatile store and I believe this is the only case where it makes a difference. 
E.g.:
>>
>>     x = load_acquire(p);
>>     store_volatile(p, v);
>>     y = load_acquire(p);
>>
>> can be transformed into:
>>
>>     x = load_acquire(p);
>>     y = x;
>>     store_volatile(p, v);
>>
>> Furthermore, on Aarch64, volatile load is implemented with `ldar` while acquire load can be implemented with `ldapr`.
>
> @merykitty
>> Furthermore, on Aarch64, volatile load is implemented with ldar while acquire load can be implemented with ldr.
>
> I'm not sure exactly what you mean here but I don't think that sounds right?

@adinn Oh yes silly me, what I meant was that acquire load can be implemented using **LDAPR**. Edited in the original comment.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2744913159

From duke at openjdk.org Sat Mar 22 20:02:31 2025
From: duke at openjdk.org (Ferenc Rakoczi)
Date: Sat, 22 Mar 2025 20:02:31 GMT
Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v11]
In-Reply-To:
References:
Message-ID:

> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.

Ferenc Rakoczi has updated the pull request incrementally with two additional commits since the last revision:

 - Further readability improvements.
- Added asserts for array sizes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23860/files - new: https://git.openjdk.org/jdk/pull/23860/files/e9db09e2..56656894 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=09-10 Stats: 228 lines in 2 files changed: 72 ins; 56 del; 100 mod Patch: https://git.openjdk.org/jdk/pull/23860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23860/head:pull/23860 PR: https://git.openjdk.org/jdk/pull/23860 From vpaprotski at openjdk.org Sat Mar 22 20:05:11 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Sat, 22 Mar 2025 20:05:11 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v10] In-Reply-To: <2N5Evij0f6qZi_pG3tqoz11aQbSnLG0YszqHR9ROfKI=.d44b16c6-d334-42c4-8de8-92eb41229248@github.com> References: <2N5Evij0f6qZi_pG3tqoz11aQbSnLG0YszqHR9ROfKI=.d44b16c6-d334-42c4-8de8-92eb41229248@github.com> Message-ID: <2yP2P1VNWgQu6cWvn0_a_7LdidS71C6PWKcqGKTOHnc=.49f8ac0f-df23-4f1e-adb9-e03a3f2295b2@github.com> On Thu, 20 Mar 2025 20:37:25 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Fix windows build was going to finish the rest of the functions.. but I see you pushed an update so I better rebase! here are the pending comments I had that perhaps are no longer applicable.. (working through the ntt math..) src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 121: > 119: static void montmulEven(int outputReg, int inputReg1, int inputReg2, > 120: int scratchReg1, int scratchReg2, > 121: int parCnt, MacroAssembler *_masm) { nitpick.. this could be made to look more like `montMul64()` by also taking in an array of registers. 
src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 160: > 158: for (int i = 0; i < 4; i++) { > 159: __ vpmuldq(xmm(scratchRegs[i]), xmm(inputRegs1[i]), xmm(inputRegs2[i]), > 160: Assembler::AVX_512bit); using an array of registers, instead of array of ints would read somewhat more compact and fewer 'indirections' . i.e. static void montMul64(XMMRegister outputRegs*, XMMRegister inputRegs1*, XMMRegister inputRegs2*, ... __ vpmuldq(scratchRegs[i], inputRegs1[i], inputRegs2[i], Assembler::AVX_512bit); src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 216: > 214: // Zmm8-Zmm23 used as scratch registers > 215: // result goes to Zmm0-Zmm7 > 216: static void montMulByConst128(MacroAssembler *_masm) { wish the inputs and output register arrays were explicit.. easier to follow that way src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 230: > 228: } > 229: > 230: static void sub_add(int subResult[], int addResult[], Big fan of all these helper functions! Makes reading the top level functions way easier, thanks for refactoring! src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 279: > 277: static int xmm4_20_24[] = {4, 5, 6, 7, 20, 21, 22, 23, 24, 25, 26, 27}; > 278: static int xmm16_27[] = {16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27}; > 279: static int xmm29_29[] = {29, 29, 29, 29}; I very much like the new refactor, waaaay clearer now. Some 'Could Do' comments.. - I probably would have preferred 'even more symbolic' variable names (i.e. its ideal when you can match the java variable names!). Conversely, if 'forced to defend this style', these names are MUCH much easier to debug from GDB, its clear what the matching instruction is. - Not sure about it being global. It works currently, but less 'future proof'. 
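For readers following the `montmulEven` / `montMul64` comments, here is a scalar Java sketch of the signed Montgomery multiplication that those helpers appear to vectorize. It follows the `// Java:` comments quoted in this review. The constant values (q = 8380417, R = 2^32, q^-1 mod R = 58728449) are my assumption of the usual ML-DSA parameters and are not taken from this thread; the class and method names are hypothetical.

```java
/** Scalar sketch of Montgomery multiplication modulo the ML-DSA prime. */
final class MontMulSketch {
    static final int MONT_Q = 8380417;            // assumed modulus q = 2^23 - 2^13 + 1
    static final int MONT_R_BITS = 32;            // R = 2^32
    static final int MONT_Q_INV_MOD_R = 58728449; // assumed q^-1 mod 2^32

    /** Returns r with r * 2^32 congruent to b * c modulo q. */
    static int montMul(int b, int c) {
        long a = (long) b * (long) c;             // full 64-bit product
        int aHigh = (int) (a >> MONT_R_BITS);
        int aLow = (int) a;
        int m = MONT_Q_INV_MOD_R * aLow;          // signed low product, wraps mod 2^32
        // m * q has the same low 32 bits as a, so the subtraction is exact
        return aHigh - (int) (((long) m * MONT_Q) >> MONT_R_BITS);
    }
}
```

The result is not fully reduced; it is only guaranteed to be congruent to `b * c * 2^-32` modulo q, which is how such a routine is normally used inside an NTT.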
src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 645:

> 643: // poly1 (int[256]) = c_rarg1
> 644: // poly2 (int[256]) = c_rarg2
> 645: static address generate_dilithiumNttMult_avx512(StubGenerator *stubgen,

This would be 'nice to have', something 'lost' with the refactor..

As I was reviewing this (original) function, I was thinking, "there is nothing here _that_ specific to AVX512, mostly columnar & independent operations... This function could be made 'vector-length-independent'..."

- double the loop length:

      int iter = vector_len==Assembler::AVX_512bit?4:8;
      __ movl(len, 4); -> __ movl(len, iter);

- halve the register arrays.. (or keep them the same but shuffle them to make SURE the first half are in xmm0-xmm15 range)

      XMMRegister POLY1[] = {xmm0, xmm1, xmm12, xmm13};
      XMMRegister POLY2[] = {xmm4, xmm5, xmm16, xmm17};
      XMMRegister SCRATCH1[] = {xmm2, xmm3, xmm14, xmm15}; <<< here
      XMMRegister SCRATCH2[] = {xmm6, xmm7, xmm18, xmm19}; <<< and here
      XMMRegister SCRATCH3[] = {xmm8, xmm9, xmm10, xmm11};

- couple of other int constants (like the memory 'step' and such)
- for assembler calls, like `evmovdqul` and `evpsubd`, need a few small new MacroAssembler helpers to instead generate VEX encoded versions (plenty of instructions already do that).
- I think only the perm instruction was unique to EVEX (didn't really think of an alternative for AVX2.. but it can be abstracted away with another helper)

Anyway; not suggesting it's something you do here.. but it would be convenient to leave breadcrumbs/hooks for a future update so one of us can revisit this code and add AVX2 support. e.g. the `parCnt` variable was very convenient before for exactly this, now it's gone... it probably could be derived in each function from vector_len but..; It's now cleaner, but also harder to 'upgrade'?

Why AVX2?
many of the newer (Atom/Ecore-based/EnableX86ECoreOpts) processors do not have AVX512 support, so it's something I've been prioritizing recently.

The alternative would be to write a completely separate AVX2 implementation, but that would be a shame, not to 'just' reuse this code.

"For fun", I had even gone and parametrized the mult function with the `vector_len` to see how it would look (almost identical... to the original version):

    static void montmulEven2(XMMRegister* outputReg, XMMRegister* inputReg1,
                             XMMRegister* inputReg2, XMMRegister* scratchReg1,
                             XMMRegister* scratchReg2, XMMRegister montQInvModR,
                             XMMRegister dilithium_q, int parCnt, int vector_len,
                             MacroAssembler* _masm) {
      for (int i = 0; i < parCnt; i++) {
        // scratch1 = (int64)input1_even*input2_even
        // Java: long a = (long) b * (long) c;
        __ vpmuldq(scratchReg1[i], inputReg1[i], inputReg2[i], vector_len);
      }
      for (int i = 0; i < parCnt; i++) {
        // scratch2 = int32(montQInvModR*(int32)scratch1)
        // Java: int aLow = (int) a;
        // Java: int m = MONT_Q_INV_MOD_R * aLow; // signed low product
        __ vpmulld(scratchReg2[i], scratchReg1[i], montQInvModR, vector_len);
      }
      for (int i = 0; i < parCnt; i++) {
        // scratch2 = (int64)scratch2_even*dilithium_q_even
        // Java: ((long)m * MONT_Q)
        __ vpmuldq(scratchReg2[i], scratchReg2[i], dilithium_q, vector_len);
      }
      for (int i = 0; i < parCnt; i++) {
        // output_odd = scratch1_odd - scratch2_odd
        // Java: (aHigh - (int) (("scratch2") >> MONT_R_BITS))
        __ vpsubd(outputReg[i], scratchReg1[i], scratchReg2[i], vector_len);
      }
    }

-------------

PR Review: https://git.openjdk.org/jdk/pull/23860#pullrequestreview-2708079853
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008809855
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008811046
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008811541
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008811704
PR Review Comment:
https://git.openjdk.org/jdk/pull/23860#discussion_r2008808110 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008824304 From vpaprotski at openjdk.org Sat Mar 22 20:05:12 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Sat, 22 Mar 2025 20:05:12 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v7] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 21:06:30 GMT, Ferenc Rakoczi wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 58: >> >>> 56: >>> 57: ATTRIBUTE_ALIGNED(64) static const uint32_t dilithiumAvx512Perms[] = { >>> 58: // collect montmul results into the destination register >> >> same as `dilithiumAvx512Consts()`, 'magic offsets'; except here they are harder to count (eg. not clear visually what is the offset of `ntt inverse`). >> >> Could be split into three constant arrays to make the compiler count for us > > Well, it is 64 bytes per line (16 4-byte uint32_ts), not that hard :-) ... Ha! I didn't realize it was 16 per line.. ran out of fingers while counting!!! :) 'works for me, as long as its a "premeditated" decision' >> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 980: >> >>> 978: // Dilithium multiply polynomials in the NTT domain. >>> 979: // Implements >>> 980: // static int implDilithiumNttMult( >> >> I suppose no java changes in this PR, but I notice that the inputs are all assumed to have fixed size. >> >> Most/all intrinsics I worked with had some sort of guard (eg `Objects.checkFromIndexSize`) right before the intrinsic java call. (It usually looks like it can be optimized away). But I notice no such guard here on the java side. > > These functions will not be used anywhere else and in ML_DSA.java all of the arrays passed to inrinsics are of the correct size. Works for me; just thought I would point it out, so its a 'premeditated' decision. 
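The "make the compiler count for us" idea from the 'magic offsets' exchange above can be illustrated with a small C++ sketch. This is only an illustration under stated assumptions: the names `kSectionRows` and `section_offset` are hypothetical and not part of the PR, and the row counts are made up for the example (they are chosen so that the resulting offsets agree with the first few byte offsets quoted later in the thread). The one fact taken from the discussion is that one table row holds 16 `uint32_t` values, i.e. 64 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: names and row counts are illustrative, not from the PR.
constexpr std::size_t kLanesPerRow = 16;  // 16 uint32_t values per 512-bit row
constexpr std::size_t kRowBytes = kLanesPerRow * sizeof(std::uint32_t);

// Number of 64-byte rows in each consecutive section of the table.
constexpr std::size_t kSectionRows[] = {1, 2, 2, 2};

// Byte offset of section `index`: total size of all sections before it.
constexpr std::size_t section_offset(std::size_t index) {
  std::size_t rows = 0;
  for (std::size_t i = 0; i < index; i++) {
    rows += kSectionRows[i];
  }
  return rows * kRowBytes;
}

// The compiler now does the counting and rejects a stale hand-written offset.
static_assert(kRowBytes == 64, "one row is 64 bytes");
static_assert(section_offset(0) == 0, "first section starts the table");
static_assert(section_offset(1) == 64, "second section follows one row");
static_assert(section_offset(2) == 192, "third section follows three rows");
```

The trade-off is the one already raised in the review: hand-counted offsets are compact and readable when every row is 64 bytes, while derived offsets cost a little more code but cannot silently drift when a section grows.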
>> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1012:
>>
>>> 1010: __ evmovdqul(xmm28, Address(perms, 0), Assembler::AVX_512bit);
>>> 1011:
>>> 1012: __ movl(len, 4);
>>
>> Compile-time constant, why not 'unroll at compile time'? i.e. wrap this loop with `for (int len=0; len<4; len++)` instead?
>
> I have found that unrolling these loops actually hurts performance (probably an I-cache effect).

Interesting; I keep on having to re-train my intuition, thanks for the data.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008806159
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008805574
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008805113

From duke at openjdk.org Sat Mar 22 20:23:25 2025
From: duke at openjdk.org (Ferenc Rakoczi)
Date: Sat, 22 Mar 2025 20:23:25 GMT
Subject: RFR: 8349721: Add aarch64 intrinsics for ML-KEM [v5]
In-Reply-To:
References:
Message-ID:

> By using the aarch64 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled.

Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision:

  Fixed bad assertion.
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23663/files
  - new: https://git.openjdk.org/jdk/pull/23663/files/7e9b3d84..9ec9a6cd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23663&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23663&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/23663.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23663/head:pull/23663

PR: https://git.openjdk.org/jdk/pull/23663

From vpaprotski at openjdk.org Sat Mar 22 20:42:09 2025
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Sat, 22 Mar 2025 20:42:09 GMT
Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v11]
In-Reply-To:
References:
Message-ID:

On Sat, 22 Mar 2025 20:02:31 GMT, Ferenc Rakoczi wrote:

>> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Further readability improvements.
>  - Added asserts for array sizes

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 119:

> 117: static address dilithiumAvx512PermsAddr() {
> 118: return (address) dilithiumAvx512Perms;
> 119: }

Hear me out.. ... enums!!

    enum nttPermOffset {
      montMulPermsIdx = 0,
      nttL4PermsIdx = 64,
      nttL5PermsIdx = 192,
      nttL6PermsIdx = 320,
      nttL7PermsIdx = 448,
      nttInvL0PermsIdx = 704,
      nttInvL1PermsIdx = 832,
      nttInvL2PermsIdx = 960,
      nttInvL3PermsIdx = 1088,
      nttInvL4PermsIdx = 1216,
    };

    static address dilithiumAvx512PermsAddr(nttPermOffset offset) {
      return (address) dilithiumAvx512Perms + offset;
    }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008900858

From jbhateja at openjdk.org Mon Mar 24 02:41:14 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 24 Mar 2025 02:41:14 GMT
Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v11]
In-Reply-To:
References:
Message-ID:

On Sat, 22 Mar 2025 20:02:31 GMT, Ferenc Rakoczi wrote:

>> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Further readability improvements.
>  - Added asserts for array sizes

src/hotspot/cpu/x86/vm_version_x86.cpp line 1252:

> 1250: // Currently we only have them for AVX512
> 1251: #ifdef _LP64
> 1252: if (supports_evex() && supports_avx512bw()) {

supports_evex check looks redundant.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2009379308

From kbarrett at openjdk.org Mon Mar 24 05:41:10 2025
From: kbarrett at openjdk.org (Kim Barrett)
Date: Mon, 24 Mar 2025 05:41:10 GMT
Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp [v2]
In-Reply-To:
References:
Message-ID: <3icGBXOIOkienB-88jrracWAUoFiZP79AZt2RYjTvy4=.e62d6c71-b0a8-4be5-9941-afc3f24f21fb@github.com>

On Fri, 21 Mar 2025 13:39:55 GMT, Matthias Baesken wrote:

>> There are a few divisions by zero in sharedRuntimeTrans.cpp, used to "construct" NaN and -infinity.
This should probably be replaced by using functionality from std::numeric_limits . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > replace more divisons by 0 src/hotspot/share/runtime/sharedRuntimeTrans.cpp line 518: > 516: z = ax; /*x is +-0,+-inf,+-1*/ > 517: if(hy<0) { > 518: if (z == 0.0) { Maybe s/z == 0.0/ix == 0/ ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24136#discussion_r2009490209 From kbarrett at openjdk.org Mon Mar 24 05:52:07 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 24 Mar 2025 05:52:07 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp [v2] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 15:58:08 GMT, Joe Darcy wrote: > FYI, the code in question looks to be HotSpot-internal copies of FDLIBM code or similar. A coding pattern used in FDLIBM and other math libraries is to have an expression evaluate to a non-finite value (NaN or an infinity) rather than directly returning a NaN or infinity so that the IEEE 754 sticky flag state can be set, such as divide by zero or invalid. > > The Java floating-point model, both at the JVM and language level, excludes sticky flags so preserving the sticky flag side-effects is not necessary. The (conditional) use of `NAN` seems like it already broke that pattern a long time ago. Fortunately, we don't care. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24136#issuecomment-2746962873 From dholmes at openjdk.org Mon Mar 24 06:31:06 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 24 Mar 2025 06:31:06 GMT Subject: RFR: 8294954: Remove superfluous ResourceMarks when using LogStream In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 15:38:21 GMT, Casper Norrbin wrote: > Hi everyone, > > This PR removes redundant `ResourceMark` instances where `LogStream` is used. 
Previously, `LogStream` inherited from `ResourceObj`, which required a `ResourceMark`, but this is no longer the case, making these instances unnecessary. > > Process: > 1. I added assertions to check for resource unwinding in places where `ResourceMark`s were used with `LogStream`s. > 2. Ran tests up to tier7 to confirm no unwinding was happening. This helped filter out cases where `ResourceMark`s were still required for other reasons. > 3. Manually verified the remaining cases by tracing function calls to ensure the `ResourceMark`s were truly unnecessary. > 4. Removed the redundant `ResourceMark` instances. Did you determine that the deleted RM's were put in to be used by the LogStream rather than the things being printed to the LogStream? It is quite difficult to be sure you have exercised all of the logging code that was modified. It is quite likely many of these log outputs are not actually being tested anywhere (and very difficult to verify one way or another). Have you tested by enabling all logging in some simple tests on all platforms? (Of course that is nowhere near sufficient in terms of coverage as you would need to test with numerous permutations of VM features.) Thanks ------------- PR Review: https://git.openjdk.org/jdk/pull/24162#pullrequestreview-2709241848 From dholmes at openjdk.org Mon Mar 24 07:09:22 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 24 Mar 2025 07:09:22 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2] In-Reply-To: References: <9RYKO6P037JZUM-Fp7LUnMWHPLdXt5IO3ujvTDliJyw=.674c2fec-ca60-4def-81bb-01d5515e3fe2@github.com> <-fW_jCwEjRYZAWRm2COUK47yyOoLGU6GtoMPyETsVUA=.40df942c-9baf-4c45-93d8-973719fa5800@github.com> Message-ID: On Fri, 21 Mar 2025 12:42:43 GMT, Aleksey Shipilev wrote: >> That seems like a bug in the event constructor to me. > > Well, I don't think this is accidental. I think the intent is to do the writes once with the actual values? 
Anyhow, this is what JFR generates from event metadata, and we should be following suit. Okay ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24121#discussion_r2009568810 From dholmes at openjdk.org Mon Mar 24 07:18:07 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 24 Mar 2025 07:18:07 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 07:06:25 GMT, Aleksey Shipilev wrote: >> Little regression crept in with [JDK-8351142](https://bugs.openjdk.org/browse/JDK-8351142): on the deflation path, object associated with monitor can be already dead. >> >> A new stress test fails within seconds without a fix. It also covers other monitor events, so we have extra coverage there as well. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, new stress test now passes >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Emit the event anyway Looks fine to me. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24121#pullrequestreview-2709323402 From rehn at openjdk.org Mon Mar 24 08:36:57 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 24 Mar 2025 08:36:57 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v4] In-Reply-To: References: Message-ID: <3offilavnorMFRTRJK8oCgc4VkWQ-tbqka-HbqvcLjs=.9e76929a-4b7f-4c63-b662-f548fa3f9ec0@github.com> > Hi please consider. > > Added case to turn off UseZvfh when no RVV. > Which is the cause of the test issues, zvfh on but no rvv. > > Also made all case identical and added no warning when default. > Move them to the common init, as the "UseExtension" is not C2 specific. > > Manual tested and some random compiler tests. 
> > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - dep check - Merge branch 'master' into maxvector_0 - Merge branch 'master' into maxvector_0 - Merge branch 'master' into maxvector_0 - hwprobe deps - Merge branch 'master' into maxvector_0 - Moved to common - Disable UseZvfh when no RVV ------------- Changes: https://git.openjdk.org/jdk/pull/24094/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24094&range=03 Stats: 106 lines in 2 files changed: 45 ins; 19 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/24094.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24094/head:pull/24094 PR: https://git.openjdk.org/jdk/pull/24094 From rehn at openjdk.org Mon Mar 24 08:36:57 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 24 Mar 2025 08:36:57 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v2] In-Reply-To: <5o_6LiKmaWr9PMLlcK8qmrgH5YNMO1B2tKBU25g4AFY=.e088bf1e-4c08-4264-a03d-133753452184@github.com> References: <5o_6LiKmaWr9PMLlcK8qmrgH5YNMO1B2tKBU25g4AFY=.e088bf1e-4c08-4264-a03d-133753452184@github.com> Message-ID: On Thu, 20 Mar 2025 16:03:34 GMT, Hamlin Li wrote: >> You can check that in compile time with: >> `STATIC_ASSERT(offsetof(VM_Version, ext_ZvXX) > offsetof(VM_Version, ext_V));` > > That's also fine. Added normal assert, as it was much easier. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24094#discussion_r2009702923 From alanb at openjdk.org Mon Mar 24 08:50:21 2025 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 24 Mar 2025 08:50:21 GMT Subject: RFR: 8352437: Support --add-exports with -XX:+AOTClassLinking In-Reply-To: References: Message-ID: <9c6Oa_uABRWWm_JLcMWcJnVgBBi1sFaB54oHvQIR0po=.a4fe2243-db7a-40cc-9195-2b2cc0e49ed1@github.com> On Thu, 20 Mar 2025 04:46:21 GMT, Ioi Lam wrote: > `-XX:+AOTClassLinking` requires the CDS archived full module graph (FMG). 
> > - Before this PR, when `--add-export` is specified, FMG is disabled, so AOT caches created with `-XX:+AOTClassLinking` cannot be loaded. > - After this PR, if the exact same `--add-export` flags as specified across the training/assembly/production phases, the FMG can be used, so we can use so AOT caches created with `-XX:+AOTClassLinking`. > > The change itself is straight-forward: just remember the `--add-export` flags specified during AOT cache creation, and check the exact same ones are used during the production run. > > I did a fair amount of refactoring to change the "exact options specified" checks in modules.cpp, so more such options can be easily added in the future (we need to handle `--add-reads` and `--add-opens` in future RFEs). > > (Note: this PR depends on #24122 ) Is the motivation tests or code that is making use of JDK internals? No objection to the change of course, I'm curious why we are doing this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24124#issuecomment-2747329517 From shade at openjdk.org Mon Mar 24 09:28:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 24 Mar 2025 09:28:16 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 07:06:25 GMT, Aleksey Shipilev wrote: >> Little regression crept in with [JDK-8351142](https://bugs.openjdk.org/browse/JDK-8351142): on the deflation path, object associated with monitor can be already dead. >> >> A new stress test fails within seconds without a fix. It also covers other monitor events, so we have extra coverage there as well. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, new stress test now passes >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Emit the event anyway @egahlin, @mgronlun -- you folks are good with this version? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24121#issuecomment-2747436126 From azafari at openjdk.org Mon Mar 24 09:57:47 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Mon, 24 Mar 2025 09:57:47 GMT Subject: RFR: 8352140: UBSAN: fix the left shift of negative value in klass.hpp, array_layout_helper() Message-ID: The `array_layout_helper()` with `jint tag` as its first arg, is called with a `tag` whose sign-bit is always set and considered as negative. This negative value is UB in left-shift operation. Changing the type to `juint` fixes this. Tests: linux-x64-debug tier1 with UBSAN enabled. ------------- Commit messages: - 8352140: UBSAN: fix the left shift of negative value in klass.hpp, array_layout_helper() Changes: https://git.openjdk.org/jdk/pull/24184/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24184&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352140 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24184/head:pull/24184 PR: https://git.openjdk.org/jdk/pull/24184 From mgronlun at openjdk.org Mon Mar 24 11:20:12 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 24 Mar 2025 11:20:12 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 07:06:25 GMT, Aleksey Shipilev wrote: >> Little regression crept in with [JDK-8351142](https://bugs.openjdk.org/browse/JDK-8351142): on the deflation path, object associated with monitor can be already 
dead.
>>
>> A new stress test fails within seconds without a fix. It also covers other monitor events, so we have extra coverage there as well.
>>
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, new stress test now passes
>>  - [x] Linux x86_64 server fastdebug, `jdk_jfr`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Emit the event anyway

An event about a deflated monitor, without any information about which monitor was deflated, seems just like noise to me.

What problem are we trying to solve here again? Are we interested in how long a monitor was inflated? Pairing up inflates with deflates?

Implementation-wise, we should use monotonic, internally assigned ids as keys for monitor identity, instead of relying on the oops (but that is outside this PR).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24121#issuecomment-2747769648

From mdoerr at openjdk.org Mon Mar 24 11:38:08 2025
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 24 Mar 2025 11:38:08 GMT
Subject: RFR: 8352393: AIX: Problem list serviceability/attach/AttachAPIv2/StreamingOutputTest.java
In-Reply-To:
References:
Message-ID:

On Wed, 19 Mar 2025 14:56:48 GMT, Varada M wrote:

> Excluding the test serviceability/attach/AttachAPIv2/StreamingOutputTest.java
>
> JBS Issue : [JDK-8352393](https://bugs.openjdk.org/browse/JDK-8352393)

LGTM. Thanks!

-------------

Marked as reviewed by mdoerr (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24116#pullrequestreview-2710063524

From shade at openjdk.org Mon Mar 24 11:42:10 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 24 Mar 2025 11:42:10 GMT
Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2]
In-Reply-To:
References:
Message-ID: <7KqJIB191MZrfT07GwZOhoac0OsszqcyStwZ7tnQMMI=.174417ec-182a-4656-8037-94af03807dba@github.com>

On Mon, 24 Mar 2025 11:15:28 GMT, Markus Grönlund wrote:

> What problem are we trying to solve here again? Are we interested in how long a monitor was inflated? Pairing up inflates with deflates?

Yes, we want to pair inflates with deflates. When a deflate happens on a dead object, we don't have a clear signal of which object was deflated. But the event counts (and their timestamps) would still be matchable, and `inflations` - `deflations` would be roughly equal to the monitor count from the stats event. I expect the events with dead objects would be fairly rare anyway.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24121#issuecomment-2747830193

From mgronlun at openjdk.org Mon Mar 24 11:54:15 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Mon, 24 Mar 2025 11:54:15 GMT
Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 20 Mar 2025 07:06:25 GMT, Aleksey Shipilev wrote:

>> Little regression crept in with [JDK-8351142](https://bugs.openjdk.org/browse/JDK-8351142): on the deflation path, object associated with monitor can be already dead.
>>
>> A new stress test fails within seconds without a fix. It also covers other monitor events, so we have extra coverage there as well.
>>
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, new stress test now passes
>>  - [x] Linux x86_64 server fastdebug, `jdk_jfr`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Emit the event anyway

How do you match up the timestamps? Because without the oop, the connection between the two events is lost.

Are we measuring how long it takes to deflate a monitor? How useful is that information?

Deflations can also happen asynchronously. If so, the thread information cannot be used to pair them up either.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24121#issuecomment-2747859838

From shade at openjdk.org Mon Mar 24 12:25:11 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 24 Mar 2025 12:25:11 GMT
Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 24 Mar 2025 11:51:14 GMT, Markus Grönlund wrote:

> How do you match up the timestamps? Because without the oop, the connection between the two events is lost.

True, but even the timestamp+event is a useful bit of info. If there are X monitors recorded by the stats event at 13:13, then 100+ inflations happened at 13:14, and then 50+ deflations happened at 13:15, then I can plausibly guess the momentary monitor population is X+50, even if the deflation events give me no precise mapping of which dead objects were deflated.

> Are we measuring how long it takes to deflate a monitor? How useful is that information?

It is useful to know _when_ deflations happened, as this shows whether the deflater thread is actually performing well. We have seen "memory leaks" due to deflation policy bugs when the monitor deflater was essentially stuck / outpaced by inflations. Pretty sure there are still lingering issues when the monitor population spikes in a very short time frame, so it is useful to know inflations/deflations at the scale of individual events.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24121#issuecomment-2747947463 From mbaesken at openjdk.org Mon Mar 24 13:03:38 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 24 Mar 2025 13:03:38 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp [v3] In-Reply-To: References: Message-ID: <71lvVSO4t3f6vpo-WjT9biJPALKr_DhKmwD33Egv8cM=.c0230c8a-cae5-4aa1-8d63-3408f16ac221@github.com> > There are a few divisions by zero in sharedRuntimeTrans.cpp, used to "construct" NaN and -infinity. This should probably be replaced by using functionality from std::numeric_limits . Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: java/lang/Math/PowTests.java failed, we have to return infinity ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24136/files - new: https://git.openjdk.org/jdk/pull/24136/files/1aa41c47..263f771f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24136&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24136&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24136/head:pull/24136 PR: https://git.openjdk.org/jdk/pull/24136 From mbaesken at openjdk.org Mon Mar 24 13:08:48 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 24 Mar 2025 13:08:48 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp [v4] In-Reply-To: References: Message-ID: > There are a few divisions by zero in sharedRuntimeTrans.cpp, used to "construct" NaN and -infinity. This should probably be replaced by using functionality from std::numeric_limits . 
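As a rough illustration of the replacement described in the quoted summary above, the special values can be obtained from `std::numeric_limits` instead of being "constructed" by dividing by zero. This is a minimal sketch, not the code from the PR: the helper names `quiet_nan`, `negative_infinity` and `check_special_values` are hypothetical, and the sketch assumes an IEEE 754 `double` and a build without fast-math style options.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper names; the PR itself may spell these differently.

// NaN without evaluating 0.0 / 0.0.
inline double quiet_nan() {
  return std::numeric_limits<double>::quiet_NaN();
}

// -infinity without evaluating -1.0 / 0.0.
inline double negative_infinity() {
  return -std::numeric_limits<double>::infinity();
}

// Sanity checks on the properties callers typically rely on.
inline void check_special_values() {
  assert(std::isnan(quiet_nan()));
  assert(std::isinf(negative_infinity()));
  assert(negative_infinity() < 0.0);
}
```

Unlike a division by zero, these expressions raise no floating-point exception flags and do not trip UBSAN's float-divide-by-zero check; as noted earlier in the thread, the Java floating-point model has no sticky flags, so nothing observable is lost.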
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: remove CAN_USE_NAN_DEFINE ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24136/files - new: https://git.openjdk.org/jdk/pull/24136/files/263f771f..2b82fe21 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24136&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24136&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24136/head:pull/24136 PR: https://git.openjdk.org/jdk/pull/24136 From mbaesken at openjdk.org Mon Mar 24 13:08:48 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 24 Mar 2025 13:08:48 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp [v2] In-Reply-To: <3icGBXOIOkienB-88jrracWAUoFiZP79AZt2RYjTvy4=.e62d6c71-b0a8-4be5-9941-afc3f24f21fb@github.com> References: <3icGBXOIOkienB-88jrracWAUoFiZP79AZt2RYjTvy4=.e62d6c71-b0a8-4be5-9941-afc3f24f21fb@github.com> Message-ID: On Mon, 24 Mar 2025 05:37:24 GMT, Kim Barrett wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> replace more divisons by 0 > > src/hotspot/share/runtime/sharedRuntimeTrans.cpp line 518: > >> 516: z = ax; /*x is +-0,+-inf,+-1*/ >> 517: if(hy<0) { >> 518: if (z == 0.0) { > > Maybe s/z == 0.0/ix == 0/ ? Hi Kim, why ix and not x ? Is it a typo? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24136#discussion_r2010143274 From duke at openjdk.org Mon Mar 24 13:30:13 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Mon, 24 Mar 2025 13:30:13 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: <18kBo65jThBxZNtrIEGRezseLL5PkayvEIqzDmHS1Do=.84aba917-2fc0-4463-b96d-12d99a653ec9@github.com> References: <18kBo65jThBxZNtrIEGRezseLL5PkayvEIqzDmHS1Do=.84aba917-2fc0-4463-b96d-12d99a653ec9@github.com> Message-ID: On Mon, 24 Mar 2025 13:25:48 GMT, Ashutosh Mehra wrote: > Sorry, this missed my radar. I can pick it up this week if it still needs another pair of eyes. Thanks @ashu-mehra, that would help. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23811#issuecomment-2748136179 From asmehra at openjdk.org Mon Mar 24 13:30:13 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 24 Mar 2025 13:30:13 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: <18kBo65jThBxZNtrIEGRezseLL5PkayvEIqzDmHS1Do=.84aba917-2fc0-4463-b96d-12d99a653ec9@github.com> On Wed, 5 Mar 2025 17:45:26 GMT, Thomas Fitzsimmons wrote: >> This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. 
>> >> I tested it with: >> >> >> java -Xlog:os+container=trace -version >> >> on: >> >> `Red Hat Enterprise Linux 8 (cgroups v1 only)`: >> _No change in behaviour_ >> >> `Fedora 41 (cgroups v2)`: >> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ >> >> --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 >> +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 >> @@ -1,7 +1,12 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> +[debug][os,container] v2 controller cpuset is enabled and relevant >> +[debug][os,container] v2 controller cpu is enabled and required >> +[debug][os,container] v2 controller io is enabled but not relevant >> +[debug][os,container] v2 controller memory is enabled and required >> +[debug][os,container] v2 controller hugetlb is enabled but not relevant >> +[debug][os,container] v2 controller pids is enabled and relevant >> +[debug][os,container] v2 controller rdma is enabled but not relevant >> +[debug][os,container] v2 controller misc is enabled but not relevant >> [debug][os,container] Detected cgroups v2 unified hierarchy >> [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope >> [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max >> >> >> `Fedora 41 (custom kernel with cgroups v1 disabled)`: >> _Fixes `cgroups v2` detection:_ >> >> --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 >> +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 >> @@ -1,7 +1,63 @@ >> 
[trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> -[debug][os,container] controller memory is not enabled >> - ] >> -[debug][os,container] One or more required controllers disabled at kernel level. >> +[... > > Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 > - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent > - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing > > Remove from cgroups v1 branch incorrect log messages about cpuset > controller being optional. Add test case for cgroups v1, cpuset > disabled. > - Improve !cgroups_v2_enabled branch comment > - Debug-log optional and disabled cgroups v2 controllers > > Do not log enabled controllers that are not relevant to the JDK. > - Move index declaration to scope in which it is used > - Remove empty string check during cgroup.controllers parsing > - Define ISSPACE_CHARS macro, use it in strsep call > - Pass fgets result to strsep > - Replace is_cgroupsV2 with cgroups_v2_enabled > > Also fix the testCgroupv1SystemdOnly and testCgroupv1NoMounts test > cases such that their /proc/cgroups and /proc/self/cgroup contents > correspond. This prevents assertion failures these tests were > producing when is_cgroupsV2 was replaced with cgroups_v2_enabled. > - ... and 3 more: https://git.openjdk.org/jdk/compare/0a601f4b...b6926e15 Sorry, this missed my radar. I can pick it up this week if it still needs another pair of eyes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23811#issuecomment-2748130355 From duke at openjdk.org Mon Mar 24 15:10:28 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Mon, 24 Mar 2025 15:10:28 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock Message-ID: ### Summary: This PR makes memory operations atomic with NMT accounting. ### The problem: In memory related functions like `os::commit_memory` and `os::reserve_memory` the OS memory operations are currently done before acquiring the the NMT mutex. And the the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races. 1.1 Thread_1 releases range_A. 1.2 Thread_1 tells NMT "range_A has been released". 2.1 Thread_2 reserves (the now free) range_A. 2.2 Thread_2 tells NMT "range_A is reserved". Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS. ### Solution: Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself. ### Other notes: I also simplified this pattern found in many places: if (MemTracker::enabled()) { MemTracker::NmtVirtualMemoryLocker nvml; result = pd_some_operation(addr, bytes); if (result != nullptr) { MemTracker::record_some_operation(addr, bytes); } } else { result = pd_unmap_memory(addr, bytes); } ``` To: MemTracker::NmtVirtualMemoryLocker nvml; result = pd_unmap_memory(addr, bytes); MemTracker::record_some_operation(addr, bytes); ``` This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. 
`MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`. I considered moving the locking and NMT accounting down into platform specific code: Ex. lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section. However, I found that the OS-specific "pd_" functions are already short and to-the-point, so doing this wasn't reducing the lock scope very much. Instead it just makes the code more messy by having to maintain the locking and NMT accounting in each platform specific implementation. In many places I've done minor refactoring by relocating calls to `MemTracker` in order to tighten the locking scope. In some OS specific code (such as `os::map_memory_to_file`), I've replaced calls to `os::release_memory` with `os::pd_release_memory`. This is to avoid `NmtVirtualMemoryLocker` reentrancy. In a few places (such as `VirtualMemoryTracker::add_reserved_region`) I have replaced `tty` with `defaultStream::output_stream()`. Otherwise `NmtVirtualMemory_lock` would be acquired out of rank order with `tty_lock`. ### Testing: One concern, due to the expanded critical section, is reentrancy. `NmtVirtualMemoryLocker` is a HotSpot mutex and is not reentrant. I've added new tests in _test_os.cpp_ and _test_virtualMemoryTracker.cpp_ that try to exercise any usages of NMT that weren't already exercised by existing tests. tier1 passes on linux and windows. I do not have an AIX machine to test on. Can someone please help run the tests on AIX? 
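For readers who want the shape of the idea without the HotSpot sources at hand, here is a minimal sketch. It is not the code in this PR: `Vm`, `FakeOs`, and `Tracker` are hypothetical stand-ins (for the wrapper functions in `os`, the `pd_` functions, and NMT respectively), and a plain `std::mutex` plays the role of `NmtVirtualMemoryLocker`. The only point is that the memory operation and its accounting sit inside one critical section.

```cpp
#include <cstddef>
#include <map>
#include <mutex>

// Hypothetical stand-in for NMT's bookkeeping: base address -> size.
struct Tracker {
  std::map<std::size_t, std::size_t> reserved;
  void record_reserve(std::size_t base, std::size_t size) { reserved[base] = size; }
  void record_release(std::size_t base) { reserved.erase(base); }
};

// Hypothetical stand-in for the OS: the ranges that are really mapped.
struct FakeOs {
  std::map<std::size_t, std::size_t> mapped;
  bool reserve(std::size_t base, std::size_t size) {
    return mapped.emplace(base, size).second;  // fails if base is already mapped
  }
  bool release(std::size_t base) { return mapped.erase(base) == 1; }
};

struct Vm {
  std::mutex lock;  // plays the role of NmtVirtualMemoryLocker (assumption)
  FakeOs os;
  Tracker nmt;

  // The OS operation and the accounting happen under the same lock,
  // so no other thread can observe one without the other.
  bool reserve_memory(std::size_t base, std::size_t size) {
    std::lock_guard<std::mutex> guard(lock);
    bool ok = os.reserve(base, size);
    if (ok) nmt.record_reserve(base, size);
    return ok;
  }

  bool release_memory(std::size_t base) {
    std::lock_guard<std::mutex> guard(lock);
    bool ok = os.release(base);
    if (ok) nmt.record_release(base);
    return ok;
  }

  // True when the tracker's view matches what the fake OS has mapped.
  bool in_sync() {
    std::lock_guard<std::mutex> guard(lock);
    return os.mapped == nmt.reserved;
  }
};

// Sequential walk through the release/reserve example from the description.
inline bool demo() {
  Vm vm;
  bool ok = vm.reserve_memory(0x1000, 4096);   // range_A is reserved
  ok = ok && vm.release_memory(0x1000);        // (1.1) + (1.2) as one step
  ok = ok && vm.reserve_memory(0x1000, 4096);  // (2.1) + (2.2) as one step
  return ok && vm.in_sync();
}
```

Because the lock guard is not reentrant, a function that already holds it must call the unlocked operation directly; the same constraint is presumably why the description switches some callers from `os::release_memory` to `os::pd_release_memory`.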
------------- Commit messages: - make memory op and NMT accounting atomic Changes: https://git.openjdk.org/jdk/pull/24084/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24084&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341491 Stats: 291 lines in 12 files changed: 186 ins; 42 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/24084.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24084/head:pull/24084 PR: https://git.openjdk.org/jdk/pull/24084 From stuefe at openjdk.org Mon Mar 24 15:14:21 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Mar 2025 15:14:21 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: Message-ID: <2QUUuzqu_F0eii8j8FqbFlBI-04SN8u6MT2jmOoPsYo=.88646953-2d89-40b8-9e24-fb870b154a8b@github.com> On Mon, 17 Mar 2025 16:20:42 GMT, Robert Toyonaga wrote: > ### Summary: > This PR makes memory operations atomic with NMT accounting. > > ### The problem: > In memory related functions like `os::commit_memory` and `os::reserve_memory` the OS memory operations are currently done before acquiring the the NMT mutex. And the the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races. > > 1.1 Thread_1 releases range_A. > 1.2 Thread_1 tells NMT "range_A has been released". > > 2.1 Thread_2 reserves (the now free) range_A. > 2.2 Thread_2 tells NMT "range_A is reserved". > > Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS. 
> > ### Solution: > Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself. > > ### Other notes: > I also simplified this pattern found in many places: > > if (MemTracker::enabled()) { > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_some_operation(addr, bytes); > if (result != nullptr) { > MemTracker::record_some_operation(addr, bytes); > } > } else { > result = pd_unmap_memory(addr, bytes); > } > ``` > To: > > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_unmap_memory(addr, bytes); > MemTracker::record_some_operation(addr, bytes); > ``` > This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`. > > I considered moving the locking and NMT accounting down into platform specific code: Ex. lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section. However, I found that the OS-specific "pd_" functions are already short and to-the-point, so doing this wasn't reducing the lock scope very much. Instead it just makes the code more messy by having to maintain the locking and NMT accounting in each platform specific implementation. > > In many places I've done minor refactoring by relocating call... 
Ping @JoKern65 for AIX ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2748477402 From vpaprotski at openjdk.org Mon Mar 24 15:19:22 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 24 Mar 2025 15:19:22 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v11] In-Reply-To: References: Message-ID: <_TOBoO4cMQpw4sgzIpNpQZ2w5wDgezKQZLe314DQ7zo=.813b81bf-ecc0-4f75-a0d6-fbb13dde594e@github.com> On Sat, 22 Mar 2025 20:02:31 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with two additional commits since the last revision: > > - Further readability improvements. > - Added asserts for array sizes I still need to have a look at the sha3 changes, but I think I am done with the most complex part of the review. This was a really interesting bit of code to review! src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 270: > 268: } > 269: > 270: static void loadPerm(int destinationRegs[], Register perms, `replXmm`? i.e. this function is replicating (any) Xmm register, not just perm?.. src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 327: > 325: // > 326: // > 327: static address generate_dilithiumAlmostNtt_avx512(StubGenerator *stubgen, Similar comments as to `generate_dilithiumAlmostInverseNtt_avx512` - similar comment about the 'pair-wise' operation, updating `[j]` and `[j+l]` at a time.. - somehow had less trouble following the flow through registers here, perhaps I am getting used to it. 
FYI, ended up renaming some as:

// xmm16_27 = Temp1
// xmm0_3 = Coeffs1
// xmm4_7 = Coeffs2
// xmm8_11 = Coeffs3
// xmm12_15 = Coeffs4 = Temp2
// xmm16_27 = Scratch

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 421:

> 419:     for (int i = 0; i < 8; i += 2) {
> 420:       __ evpermi2d(xmm(i / 2 + 12), xmm(i), xmm(i + 1), Assembler::AVX_512bit);
> 421:     }

Wish there was a more 'abstract' way to arrange this, so it's obvious from the shape of the code which registers are inputs/outputs (i.e. and use the register arrays). Even though it's just 'elementary index operations', `i/2 + 16` is still 'clever'. Couldn't think of anything myself though (same elsewhere in this function for the table permutes).

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 509:

> 507: // coeffs (int[256]) = c_rarg0
> 508: // zetas (int[256]) = c_rarg1
> 509: static address generate_dilithiumAlmostInverseNtt_avx512(StubGenerator *stubgen,

Done with this function; Perhaps the 'permute table' is a common vector-algorithm pattern, but this is really clever!

Some general comments first, rest inline.

- The array names for registers helped a lot. And so did the new helper functions!
- The java version of this code is quite intimidating to vectorize.. 3D loop, with geometric iteration variables.. and the literature is even more intimidating (discrete convolutions which I haven't touched in two decades, ffts, ntts, etc.) Here is my attempt at a comment to 'un-scare' the next reader, though feel free to reword however you like.

The core of the (Java) loop is this 'pair-wise' operation:

    int a = coeffs[j];
    int b = coeffs[j + offset];
    coeffs[j] = (a + b);
    coeffs[j + offset] = montMul(a - b, -MONT_ZETAS_FOR_NTT[m]);

There are 8 'levels' (0-7); ('levels' are equivalent to (unrolling) the outer (Java) loop)
At each level, the 'pair-wise-offset' doubles (2^l: 1, 2, 4, 8, 16, 32, 64, 128).

To vectorize this Java code, observe that at each level, REGARDLESS of the offset, half the operations are the SUM, and the other half is the montgomery MULTIPLICATION (of the pair-difference with a constant). At each level, one 'just' has to shuffle the coefficients, so that SUMs and MULTIPLICATIONs line up accordingly.

Otherwise, this pattern is 'lightly similar' to a discrete convolution (compute integral/summation of two functions at every offset)

- I still would prefer (more) symbolic register names.. I wouldn't hold my approval over it so won't object if nobody else does, but register numbers are harder to 'see' through the flow. I ended up search/replacing/'annotating' to make it easier on myself to follow the flow of data:

// xmm8_11 = Perms1
// xmm12_15 = Perms2
// xmm16_27 = Scratch
// xmm0_3 = CoeffsPlus
// xmm4_7 = CoeffsMul
// xmm24_27 = CoeffsMinus (overlaps with Scratch)

(I made a similar comment, but I think it is now hidden after the last refactor)
- would prefer to see the helper functions to get ALL the registers passed explicitly (i.e. currently `montMulPerm`, `montQInvModR`, `dilithium_q`, `xmm29`, are implicit.). As a general rule, I've tried to set up all the registers up at the 'entry' function (`generate_dilithium*` in this case) and from there on, use symbolic names. Not always reasonable, but what I've grown used to see?
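To make the 'pair-wise' operation above concrete, here is a scalar sketch of a single level in C++. It is an illustration under assumptions, not the Java or stub code under review: `inverse_level`, `mont_mul`, `mont_reduce`, and `mod_q` are names invented here, the block-wise indexing of `zetas` is a guess at the loop shape, and the Montgomery reduction is the usual signed 32-bit variant with R = 2^32 for the ML-DSA modulus q = 8380417.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int32_t kQ = 8380417;      // ML-DSA modulus q
constexpr int32_t kQInv = 58728449;  // q^-1 mod 2^32, i.e. kQ * kQInv == 1 (mod 2^32)

// Montgomery reduction: returns r such that r * 2^32 == a (mod kQ).
inline int32_t mont_reduce(int64_t a) {
  // t = (a mod 2^32) * kQInv mod 2^32, reinterpreted as a signed 32-bit value.
  int32_t t = static_cast<int32_t>(static_cast<uint32_t>(a) *
                                   static_cast<uint32_t>(kQInv));
  // a - t * kQ is an exact multiple of 2^32, so the division is exact.
  return static_cast<int32_t>((a - static_cast<int64_t>(t) * kQ) /
                              (INT64_C(1) << 32));
}

inline int32_t mont_mul(int32_t a, int32_t b) {
  return mont_reduce(static_cast<int64_t>(a) * b);
}

// Canonical representative of x modulo kQ, in [0, kQ).
inline int64_t mod_q(int64_t x) {
  int64_t r = x % kQ;
  return r < 0 ? r + kQ : r;
}

// One level of the butterfly: for each pair (j, j + offset), the low half
// receives the SUM and the high half the Montgomery MULTIPLICATION of the
// difference with a negated constant. Assumes coeffs.size() is a multiple
// of 2 * offset and that `zetas` has one entry per block (hypothetical
// layout). Coefficients are assumed small enough that a + b and a - b do
// not overflow int32_t.
inline void inverse_level(std::vector<int32_t>& coeffs, std::size_t offset,
                          const std::vector<int32_t>& zetas) {
  std::size_t m = 0;
  for (std::size_t start = 0; start < coeffs.size(); start += 2 * offset, ++m) {
    for (std::size_t j = start; j < start + offset; ++j) {
      int32_t a = coeffs[j];
      int32_t b = coeffs[j + offset];
      coeffs[j] = a + b;
      coeffs[j + offset] = mont_mul(a - b, -zetas[m]);
    }
  }
}
```

Calling `inverse_level` eight times with offsets 1, 2, 4, ..., 128 on 256 coefficients gives the level structure described above; the vectorized stub does the same arithmetic, with shuffles arranging for every lane of a register to hold either a sum operand or a multiplication operand.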
src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 554:

> 552:     for (int i = 0; i < 8; i += 2) {
> 553:       __ evpermi2d(xmm(i / 2 + 8), xmm(i), xmm(i + 1), Assembler::AVX_512bit);
> 554:       __ evpermi2d(xmm(i / 2 + 12), xmm(i), xmm(i + 1), Assembler::AVX_512bit);

Took a bit to unscramble the flow, so a comment needed? Purpose 'fairly obvious' once I got the general shape of the level/algorithm (as per my top-level comment) but something like "shuffle xmm0-7 into xmm8-15"?

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 572:

> 570:     load4Xmms(xmm4_7, zetas, 512, _masm);
> 571:     sub_add(xmm24_27, xmm0_3, xmm8_11, xmm12_15, _masm);
> 572:     montMul64(xmm4_7, xmm24_27, xmm4_7, xmm16_27, _masm);

From my annotated version, levels 1-4, fairly 'straightforward':

// level 1
replXmm(Perms1, perms, nttInvL1PermsIdx, _masm);
replXmm(Perms2, perms, nttInvL1PermsIdx + 64, _masm);

for (int i = 0; i < 4; i++) {
  __ evpermi2d(xmm(Perms1[i]), xmm(CoeffsPlus[i]), xmm(CoeffsMul[i]), Assembler::AVX_512bit);
  __ evpermi2d(xmm(Perms2[i]), xmm(CoeffsPlus[i]), xmm(CoeffsMul[i]), Assembler::AVX_512bit);
}

load4Xmms(CoeffsMul, zetas, 512, _masm);
sub_add(CoeffsMinus, CoeffsPlus, Perms1, Perms2, _masm);
montMul64(CoeffsMul, CoeffsMinus, CoeffsMul, Scratch, _masm);

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 613:

> 611:     montMul64(xmm4_7, xmm24_27, xmm4_7, xmm16_27, _masm);
> 612:
> 613:     // level 5

"// No shuffling for level 5 and 6; can just rearrange full registers"

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 656:

> 654:     for (int i = 0; i < 8; i++) {
> 655:       __ evpsubd(xmm(i), k0, xmm(i + 8), xmm(i), false, Assembler::AVX_512bit);
> 656:     }

Fairly clean as is, but could also be two sub_add calls, I think (you have to swap order of add/sub in the helper, to be able to clobber `xmm(i)`.. or swap register usage downstream, so perhaps not.. but would be cleaner)

sub_add(CoeffsPlus, Scratch, Perms1, CoeffsPlus, _masm);
sub_add(CoeffsMul, &Scratch[4], Perms2, CoeffsMul, _masm);

If nothing else, would have preferred to see the use of the register array variables

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 660:

> 658:     store4Xmms(coeffs, 0, xmm16_19, _masm);
> 659:     store4Xmms(coeffs, 4 * XMMBYTES, xmm20_23, _masm);
> 660:     montMulByConst128(_masm);

Would prefer explicit parameters here. But I think this could also be two `montMul64` calls?

montMul64(xmm0_3, xmm0_3, xmm29_29, Scratch, _masm);
montMul64(xmm4_7, xmm4_7, xmm29_29, Scratch, _masm);

(I think there is one other use of `montMulByConst128` where same applies; then you could delete both `montMulByConst128` and `montmulEven`)

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 871:

> 869:     __ evpaddd(xmm5, k0, xmm1, barrettAddend, false, Assembler::AVX_512bit);
> 870:     __ evpaddd(xmm6, k0, xmm2, barrettAddend, false, Assembler::AVX_512bit);
> 871:     __ evpaddd(xmm7, k0, xmm3, barrettAddend, false, Assembler::AVX_512bit);

Fairly 'straightforward' transcription of the java code.. no comments from me.

At first glance using `xmm0_3`, `xmm4_7`, etc. might have been a good idea, but you only save one line per 4x group. (Unless you have one big loop, but I suspect that gives you worse performance? Is that something you tried already? Might be worth it otherwise..)

src/java.base/share/classes/sun/security/provider/ML_DSA.java line 1418:

> 1416:                                           int twoGamma2, int multiplier) {
> 1417:         assert (input.length == ML_DSA_N) && (lowPart.length == ML_DSA_N)
> 1418:                 && (highPart.length == ML_DSA_N);

I wrote this test to test java-to-intrinsic correspondence. Might be good to include it (and add the other 4 intrinsics).
This is very similar to all my other *Fuzz* tests I've been adding for my own intrinsics (and you made this test FAR easier to write by breaking out the java implementation; need to 'copy' that pattern myself)

import java.util.Arrays;
import java.util.Random;

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Constructor;

public class ML_DSA_Intrinsic_Test {

    public static void main(String[] args) throws Exception {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        Class kClazz = Class.forName("sun.security.provider.ML_DSA");
        Constructor constructor = kClazz.getDeclaredConstructor(
                int.class);
        constructor.setAccessible(true);

        Method m = kClazz.getDeclaredMethod("mlDsaNttMultiply",
                int[].class, int[].class, int[].class);
        m.setAccessible(true);
        MethodHandle mult = lookup.unreflect(m);

        m = kClazz.getDeclaredMethod("implDilithiumNttMultJava",
                int[].class, int[].class, int[].class);
        m.setAccessible(true);
        MethodHandle multJava = lookup.unreflect(m);

        Random rnd = new Random();
        long seed = rnd.nextLong();
        rnd.setSeed(seed);
        //Note: it might be useful to increase this number during development of new intrinsics
        final int repeat = 1000000;
        int[] coeffs1 = new int[ML_DSA_N];
        int[] coeffs2 = new int[ML_DSA_N];
        int[] prod1 = new int[ML_DSA_N];
        int[] prod2 = new int[ML_DSA_N];
        try {
            for (int i = 0; i < repeat; i++) {
                run(prod1, prod2, coeffs1, coeffs2, mult, multJava, rnd, seed, i);
            }
            System.out.println("Fuzz Success");
        } catch (Throwable e) {
            System.out.println("Fuzz Failed: " + e);
        }
    }

    private static final int ML_DSA_N = 256;
    public static void run(int[] prod1, int[] prod2, int[] coeffs1, int[] coeffs2,
        MethodHandle mult, MethodHandle multJava, Random rnd,
        long seed, int i) throws Exception, Throwable {
        for (int j = 0; j
> 2.1 Thread_2 reserves (the now free) range_A.
> 2.2 Thread_2 tells NMT "range_A is reserved".
> > Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS. > > ### Solution: > Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself. > > ### Other notes: > I also simplified this pattern found in many places: > > if (MemTracker::enabled()) { > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_some_operation(addr, bytes); > if (result != nullptr) { > MemTracker::record_some_operation(addr, bytes); > } > } else { > result = pd_unmap_memory(addr, bytes); > } > ``` > To: > > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_unmap_memory(addr, bytes); > MemTracker::record_some_operation(addr, bytes); > ``` > This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`. > > I considered moving the locking and NMT accounting down into platform specific code: Ex. lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section. However, I found that the OS-specific "pd_" functions are already short and to-the-point, so doing this wasn't reducing the lock scope very much. Instead it just makes the code more messy by having to maintain the locking and NMT accounting in each platform specific implementation. > > In many places I've done minor refactoring by relocating call... 
I don't see why we need to extend the lock to be held over the reserve/commit/alloc part.

If we only extend the lock on the release side, then it looks like we get the required exclusion:

lock
1.1 Thread_1 releases range_A.
1.2 Thread_1 tells NMT "range_A has been released".
unlock

2.1 Thread_2 reserves (the now free) range_A.
lock
2.2 Thread_2 tells NMT "range_A is reserved".
unlock

We can get ordering (1.1) (2.1) (1.2) (2.2) but we can't get (1.1) (2.1) (2.2) (1.2).

And isn't this locking scheme exactly what the current code is using? Have you seen an issue that this proposed PR intends to solve? If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations.

-------------

Changes requested by stefank (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24084#pullrequestreview-2710963414

From matsaave at openjdk.org Mon Mar 24 16:39:13 2025
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Mon, 24 Mar 2025 16:39:13 GMT
Subject: RFR: 8352579: Refactor CDS legacy optimization for lambda proxy classes [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 21 Mar 2025 23:47:46 GMT, Ioi Lam wrote:

>> Since JDK 16, CDS has provided limited optimization for lambda expressions. This has been superseded by JEP 483 and is useful only when `-XX:+AOTClassLinking` is not enabled (which is the case for the default CDS archive, for compatibility reasons).
>>
>> The "legacy lambda optimization" may eventually be removed. For the time being, we should consolidate the code into a single source code and clearly mark its uses. This way we can avoid confusion with the JEP 483 code for supporting lambdas (and other java.lang.invoke functionalities).
>
> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
>
>   @matias9927 offline comments - consolidated two functions with identical names

Thanks for addressing our offline conversation, looks good to me!
------------- Marked as reviewed by matsaave (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24145#pullrequestreview-2711025876 From stuefe at openjdk.org Mon Mar 24 17:03:21 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Mar 2025 17:03:21 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 16:16:34 GMT, Stefan Karlsson wrote: >> ### Summary: >> This PR makes memory operations atomic with NMT accounting. >> >> ### The problem: >> In memory related functions like `os::commit_memory` and `os::reserve_memory` the OS memory operations are currently done before acquiring the the NMT mutex. And the the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races. >> >> 1.1 Thread_1 releases range_A. >> 1.2 Thread_1 tells NMT "range_A has been released". >> >> 2.1 Thread_2 reserves (the now free) range_A. >> 2.2 Thread_2 tells NMT "range_A is reserved". >> >> Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS. >> >> ### Solution: >> Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself. 
>> >> ### Other notes: >> I also simplified this pattern found in many places: >> >> if (MemTracker::enabled()) { >> MemTracker::NmtVirtualMemoryLocker nvml; >> result = pd_some_operation(addr, bytes); >> if (result != nullptr) { >> MemTracker::record_some_operation(addr, bytes); >> } >> } else { >> result = pd_unmap_memory(addr, bytes); >> } >> ``` >> To: >> >> MemTracker::NmtVirtualMemoryLocker nvml; >> result = pd_unmap_memory(addr, bytes); >> MemTracker::record_some_operation(addr, bytes); >> ``` >> This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`. >> >> I considered moving the locking and NMT accounting down into platform specific code: Ex. lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section. However, I found that the OS-specific "pd_" functions are already short and to-the-point, so doing this wasn't reducing the lock scope very much. Instead it just makes the code more messy by having to maintain the locking and NMT accounting in each platform specific i... > > I don't see why we need to extend the lock to be held over the reserve/commit/alloc part. > > If we only extend the lock on the release side, then it looks like we get the required exclusion: > > lock > 1.1 Thread_1 releases range_A. > 1.2 Thread_1 tells NMT "range_A has been released". > unlock > > 2.1 Thread_2 reserves (the now free) range_A. > lock > 2.2 Thread_2 tells NMT "range_A is reserved". > unlock > > We can get ordering (1.1) (2.1) (1.2) (2.2) but we can't get (1.1) (2.1) (2.2) (1.2). > > And isn't this locking scheme exactly what the current code is using? Have you seen an issue that this proposed PR intends to solve? 
If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations. @stefank > And isn't this locking scheme exactly what the current code is using? Have you seen an issue that this proposed PR intends to solve? If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations. What about the case where one thread reserves a range and another thread releases it? 1 Thread A reserves range 2 Thread B releases range 3 Thread B tells NMT "range released" 4 Thread A tells NMT "range reserved" This would either result in an assert in NMT at step 3 when releasing a range NMT does not know. Or in an incorrectly booked range in step 4 without asserts. Am I making a thinking error somewhere? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2748815516 From duke at openjdk.org Mon Mar 24 17:14:14 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Mon, 24 Mar 2025 17:14:14 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 16:16:34 GMT, Stefan Karlsson wrote: >> ### Summary: >> This PR makes memory operations atomic with NMT accounting. >> >> ### The problem: >> In memory related functions like `os::commit_memory` and `os::reserve_memory` the OS memory operations are currently done before acquiring the the NMT mutex. And the the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races. >> >> 1.1 Thread_1 releases range_A. >> 1.2 Thread_1 tells NMT "range_A has been released". >> >> 2.1 Thread_2 reserves (the now free) range_A. >> 2.2 Thread_2 tells NMT "range_A is reserved". >> >> Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). 
The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS. >> >> ### Solution: >> Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself. >> >> ### Other notes: >> I also simplified this pattern found in many places: >> >> if (MemTracker::enabled()) { >> MemTracker::NmtVirtualMemoryLocker nvml; >> result = pd_some_operation(addr, bytes); >> if (result != nullptr) { >> MemTracker::record_some_operation(addr, bytes); >> } >> } else { >> result = pd_unmap_memory(addr, bytes); >> } >> ``` >> To: >> >> MemTracker::NmtVirtualMemoryLocker nvml; >> result = pd_unmap_memory(addr, bytes); >> MemTracker::record_some_operation(addr, bytes); >> ``` >> This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`. >> >> I considered moving the locking and NMT accounting down into platform specific code: Ex. lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section. However, I found that the OS-specific "pd_" functions are already short and to-the-point, so doing this wasn't reducing the lock scope very much. Instead it just makes the code more messy by having to maintain the locking and NMT accounting in each platform specific i... > > I don't see why we need to extend the lock to be held over the reserve/commit/alloc part. > > If we only extend the lock on the release side, then it looks like we get the required exclusion: > > lock > 1.1 Thread_1 releases range_A. 
> 1.2 Thread_1 tells NMT "range_A has been released". > unlock > > 2.1 Thread_2 reserves (the now free) range_A. > lock > 2.2 Thread_2 tells NMT "range_A is reserved". > unlock > > We can get ordering (1.1) (2.1) (1.2) (2.2) but we can't get (1.1) (2.1) (2.2) (1.2). > > And isn't this locking scheme exactly what the current code is using? Have you seen an issue that this proposed PR intends to solve? If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations. Hi @stefank, I think you're right about (1.1) (2.1) (2.2) (1.2) being prevented by the current implementation. Is there a reason that the current implementation only does the wider locking for release/uncommit? Maybe (2.1) (1.1) (1.2) (2.2) isn't much of an issue since it's unlikely that another thread would uncommit/release the same base address shortly after it's committed/reserved? >Have you seen an issue that this proposed PR intends to solve? If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations. I haven't seen that race in the wild, I just noticed that the memory operations weren't protected and thought that it could be a problem. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2748848175 From vpaprotski at openjdk.org Mon Mar 24 17:23:51 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 24 Mar 2025 17:23:51 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v7] In-Reply-To: References: Message-ID: > Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain) > > Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain. 
>
> Before (no AVX512)
>
> Benchmark                            (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error     Units
> SignatureBench.ECDSA.sign            SHA256withECDSA  1024        256                      thrpt  40   3720.589  ± 17.879  ops/s
> SignatureBench.ECDSA.sign            SHA256withECDSA  16384       256                      thrpt  40   3605.940  ± 15.807  ops/s
> SignatureBench.ECDSA.verify          SHA256withECDSA  1024        256                      thrpt  40   1076.502  ± 4.190   ops/s
> SignatureBench.ECDSA.verify          SHA256withECDSA  16384       256                      thrpt  40   1069.624  ± 2.484   ops/s
> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score    Error    Units
> KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  40   830.448  ± 2.285  ops/s
>
> After (with AVX2)
>
> Benchmark                            (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error     Units
> SignatureBench.ECDSA.sign            SHA256withECDSA  1024        256                      thrpt  40   6000.496  ± 39.923  ops/s
> SignatureBench.ECDSA.sign            SHA256withECDSA  16384       256                      thrpt  40   5739.878  ± 34.838  ops/s
> SignatureBench.ECDSA.verify          SHA256withECDSA  1024        256                      thrpt  40   1942.437  ± 12.179  ops/s
> SignatureBench.ECDSA.verify          SHA256withECDSA  16384       256                      thrpt  40   1921.770  ± 8.992   ops/s
> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error    Units
> KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  40   1399.761  ± 6.238  ops/s
>
>
> Before (with AVX512):
>
> Benchmark                            (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error     Units
> SignatureBench.ECDSA.sign            SHA256withECDSA  1024        256                      thrpt  40   9621.950  ± 27.260  ops/s
> SignatureBench.ECDSA.sign            SHA256withECDSA  16384       256                      thrpt  40   8975.654  ± 26.707  ops/s
> SignatureBench.ECDSA.verify          SHA256withECDSA  102...
Volodymyr Paprotski has updated the pull request incrementally with two additional commits since the last revision:

- whitespace
- prettify test

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/23719/files
- new: https://git.openjdk.org/jdk/pull/23719/files/56fd168d..a7f756af

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=06
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=05-06

Stats: 38 lines in 1 file changed: 17 ins; 0 del; 21 mod
Patch: https://git.openjdk.org/jdk/pull/23719.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23719/head:pull/23719

PR: https://git.openjdk.org/jdk/pull/23719

From vpaprotski at openjdk.org Mon Mar 24 17:29:08 2025
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Mon, 24 Mar 2025 17:29:08 GMT
Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4]
In-Reply-To: <7RbzyVMGDjIExr2AfjOVElXXrKIlddltIo6vPH0yxQs=.7296744e-29f1-4e72-a44d-ce8875be6644@github.com>
References: <7RbzyVMGDjIExr2AfjOVElXXrKIlddltIo6vPH0yxQs=.7296744e-29f1-4e72-a44d-ce8875be6644@github.com>
Message-ID:

On Thu, 20 Mar 2025 17:34:53 GMT, Anthony Scarpino wrote:

>> I used this testcase for development (and figured I should also check it in..), so what might be 'obvious' to me might not be for anyone else?
>>
>> Typically, when a test failed, I grabbed the SEED from the test output, reran the test with that seed fixed, went to the exception and printed the hex values of the inputs (then debugged from there. Typically, I would write another test, so I could GDB into the intrinsic, with just those input values).
>>
>> It was pretty much always the case that once I got the inputs, I could reproduce the error, i.e. not the type of bug that happens silently and is then discovered somewhere else. Luckily.
All this crypto code is constant-time -no-branches-; so the 'test coverage' here is not 'all-branches-taken' but really 'did you remember to collect all the carries', like a 53-bit limb that needs to be propagated back down to 52. That's what the test here is 'searching' for: some input that could trip up the computation.
>
> Can you add a comment to the test code about how you use the seed to reproduce any failures? So that in the future, someone who doesn't know will now have an idea how to start debugging this.

(was having fun reviewing MLDSA, getting back to this one..) just added some comments and hopefully better test error messages. Let me know if that works @ascarpino ?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r2010632985

From mgronlun at openjdk.org Mon Mar 24 19:49:17 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Mon, 24 Mar 2025 19:49:17 GMT
Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 20 Mar 2025 07:06:25 GMT, Aleksey Shipilev wrote:

>> Little regression crept in with [JDK-8351142](https://bugs.openjdk.org/browse/JDK-8351142): on the deflation path, object associated with monitor can be already dead.
>>
>> A new stress test fails within seconds without a fix. It also covers other monitor events, so we have extra coverage there as well.
>>
>> Additional testing:
>> - [x] Linux x86_64 server fastdebug, new stress test now passes
>> - [x] Linux x86_64 server fastdebug, `jdk_jfr`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
> Emit the event anyway

Marked as reviewed by mgronlun (Reviewer).

Ok. We could devise a more stable scheme to retain the pairing in the future.
------------- PR Review: https://git.openjdk.org/jdk/pull/24121#pullrequestreview-2711520462 PR Comment: https://git.openjdk.org/jdk/pull/24121#issuecomment-2749227054 From kbarrett at openjdk.org Mon Mar 24 20:33:16 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 24 Mar 2025 20:33:16 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp [v2] In-Reply-To: References: <3icGBXOIOkienB-88jrracWAUoFiZP79AZt2RYjTvy4=.e62d6c71-b0a8-4be5-9941-afc3f24f21fb@github.com> Message-ID: On Mon, 24 Mar 2025 13:04:46 GMT, Matthias Baesken wrote: >> src/hotspot/share/runtime/sharedRuntimeTrans.cpp line 518: >> >>> 516: z = ax; /*x is +-0,+-inf,+-1*/ >>> 517: if(hy<0) { >>> 518: if (z == 0.0) { >> >> Maybe s/z == 0.0/ix == 0/ ? > > Hi Kim, why ix and not x ? Is it a typo? Not a typo, I really meant `ix`. The point is that testing the `int` typed `ix` for zero is no worse, and might be better than, testing a `double` typed value for zero. And I suggested `ix` rather than `hx` because `ix` has the sign bit stripped off. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24136#discussion_r2010883179 From fyang at openjdk.org Tue Mar 25 02:41:12 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 25 Mar 2025 02:41:12 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v4] In-Reply-To: <3offilavnorMFRTRJK8oCgc4VkWQ-tbqka-HbqvcLjs=.9e76929a-4b7f-4c63-b662-f548fa3f9ec0@github.com> References: <3offilavnorMFRTRJK8oCgc4VkWQ-tbqka-HbqvcLjs=.9e76929a-4b7f-4c63-b662-f548fa3f9ec0@github.com> Message-ID: On Mon, 24 Mar 2025 08:36:57 GMT, Robbin Ehn wrote: >> Hi please consider. >> >> Added case to turn off UseZvfh when no RVV. >> Which is the cause of the test issues, zvfh on but no rvv. >> >> Also made all case identical and added no warning when default. >> Move them to the common init, as the "UseExtension" is not C2 specific. >> >> Manual tested and some random compiler tests. 
>> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - dep check > - Merge branch 'master' into maxvector_0 > - Merge branch 'master' into maxvector_0 > - Merge branch 'master' into maxvector_0 > - hwprobe deps > - Merge branch 'master' into maxvector_0 > - Moved to common > - Disable UseZvfh when no RVV src/hotspot/cpu/riscv/vm_version_riscv.hpp line 161: > 159: #define RV_NO_FLAG_BIT (BitsPerWord+1) // nth_bit will return 0 on values larger than BitsPerWord > 160: > 161: // Note: the order matters, depender should be after thier dependee. E.g. ext_V before ext_Zvbb. Noticed a typo here: s/thier/their/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24094#discussion_r2011194786 From ccheung at openjdk.org Tue Mar 25 04:18:07 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 25 Mar 2025 04:18:07 GMT Subject: RFR: 8352437: Support --add-exports with -XX:+AOTClassLinking In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 04:46:21 GMT, Ioi Lam wrote: > `-XX:+AOTClassLinking` requires the CDS archived full module graph (FMG). > > - Before this PR, when `--add-export` is specified, FMG is disabled, so AOT caches created with `-XX:+AOTClassLinking` cannot be loaded. > - After this PR, if the exact same `--add-export` flags as specified across the training/assembly/production phases, the FMG can be used, so we can use so AOT caches created with `-XX:+AOTClassLinking`. > > The change itself is straight-forward: just remember the `--add-export` flags specified during AOT cache creation, and check the exact same ones are used during the production run. > > I did a fair amount of refactoring to change the "exact options specified" checks in modules.cpp, so more such options can be easily added in the future (we need to handle `--add-reads` and `--add-opens` in future RFEs). 
> > (Note: this PR depends on #24122 ) Code changes look clean. I just have two minor comments on the tests. test/hotspot/jtreg/runtime/cds/appcds/jigsaw/ExactOptionMatch.java line 91: > 89: > 90: // (4) Dump = specified twice, Run = specified twice (but in different order) > 91: // Should still be able to use FMG (values are sorted by CDS). How about add another test case where the values are specified in the same order? test/lib/jdk/test/lib/cds/CDSModulePackager.java line 36: > 34: import jdk.test.lib.cds.CDSJarUtils.JarOptions; > 35: > 36: This file has no change other than the above blank line deletion. ------------- PR Review: https://git.openjdk.org/jdk/pull/24124#pullrequestreview-2712297986 PR Review Comment: https://git.openjdk.org/jdk/pull/24124#discussion_r2011266430 PR Review Comment: https://git.openjdk.org/jdk/pull/24124#discussion_r2011265840 From dholmes at openjdk.org Tue Mar 25 05:14:06 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 25 Mar 2025 05:14:06 GMT Subject: RFR: 8352140: UBSAN: fix the left shift of negative value in klass.hpp, array_layout_helper() In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 09:52:43 GMT, Afshin Zafari wrote: > The `array_layout_helper()` with `jint tag` as its first arg, is called with a `tag` whose sign-bit is always set and considered as negative. This negative value is UB in left-shift operation. Changing the type to `juint` fixes this. > > Tests: > linux-x64-debug tier1 with UBSAN enabled. That will fix it, but I can't help wonder whether `tag` is mis-typed in that function. In general that code seems very confused about signed vs unsigned and j(u)int versus (u)int. ?? 
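
The signed-versus-unsigned concern can be illustrated with a small standalone sketch. It is not the code from klass.hpp: the shift amount `kTagShift` and the helper name `shift_tag` are hypothetical, chosen only to show one way of shifting a value whose sign bit is set without performing a left shift on a negative signed operand.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical shift amount for illustration; not the real klass.hpp constant.
constexpr int kTagShift = 30;

// Shift in the unsigned domain, where the operation is well defined and
// simply discards the bits that fall off the top, then copy the resulting
// bit pattern back into a signed 32-bit value.
inline std::int32_t shift_tag(std::int32_t tag) {
  std::uint32_t bits = static_cast<std::uint32_t>(tag) << kTagShift;
  std::int32_t result;
  // memcpy transfers the bits unchanged, so no unsigned-to-signed
  // value conversion takes place here.
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}
```

With a tag of `-1` (all bits set), the two low bits end up in the two top positions and the result is negative again. Whether a plain cast back to the signed type would be acceptable instead of `memcpy` depends on the language level being targeted.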
------------- PR Review: https://git.openjdk.org/jdk/pull/24184#pullrequestreview-2712394853 From dlong at openjdk.org Tue Mar 25 06:10:22 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Mar 2025 06:10:22 GMT Subject: RFR: 8352140: UBSAN: fix the left shift of negative value in klass.hpp, array_layout_helper() In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 09:52:43 GMT, Afshin Zafari wrote: > The `array_layout_helper()` with `jint tag` as its first arg, is called with a `tag` whose sign-bit is always set and considered as negative. This negative value is UB in left-shift operation. Changing the type to `juint` fixes this. > > Tests: > linux-x64-debug tier1 with UBSAN enabled. src/hotspot/share/oops/klass.hpp line 527: > 525: } > 526: static jint array_layout_helper(jint tag, int hsize, BasicType etype, int log2_esize) { > 527: return ((juint)tag << _lh_array_tag_shift) Doesn't this turn the type of the return expression to unsigned, causing an implicit conversion back to signed, which is implementation-defined? I think that's the reason for the weird-looking reinterpret_cast<> in JAVA_INTEGER_OP. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24184#discussion_r2011376367 From dholmes at openjdk.org Tue Mar 25 06:49:16 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 25 Mar 2025 06:49:16 GMT Subject: RFR: 8352140: UBSAN: fix the left shift of negative value in klass.hpp, array_layout_helper() In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 06:07:45 GMT, Dean Long wrote: >> The `array_layout_helper()` with `jint tag` as its first arg, is called with a `tag` whose sign-bit is always set and considered as negative. This negative value is UB in left-shift operation. Changing the type to `juint` fixes this. >> >> Tests: >> linux-x64-debug tier1 with UBSAN enabled. 
> > src/hotspot/share/oops/klass.hpp line 527: > >> 525: } >> 526: static jint array_layout_helper(jint tag, int hsize, BasicType etype, int log2_esize) { >> 527: return ((juint)tag << _lh_array_tag_shift) > > Doesn't this turn the type of the return expression to unsigned, causing an implicit conversion back to signed, which is implementation-defined? I think that's the reason for the weird-looking reinterpret_cast<> in JAVA_INTEGER_OP. Note we seem to do this a lot e.g. > ./share/oops/klass.cpp: `int tag = isobj ? _lh_array_tag_obj_value : _lh_array_tag_type_value;` `_lh_array_tag_type_value` is unsigned. So this call sequence goes from unsigned -> signed -> unsigned -> signed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24184#discussion_r2011428322 From dfenacci at openjdk.org Tue Mar 25 07:14:13 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 25 Mar 2025 07:14:13 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v4] In-Reply-To: References: <2jI87up85vKeQq7xy6WoI987MOuqTqA6I8G75VvC74g=.e8ef9f9c-b8b3-496d-9b48-28c83dc1fb64@github.com> Message-ID: <0Ck3LigYC74nHGVrxvZOlWJ2m5Jsxp1zaMjmES4pA_g=.313cf7fc-46ee-4b8c-94bd-519dab3e4aba@github.com> On Fri, 28 Feb 2025 20:36:23 GMT, Dean Long wrote: >> Refreshing my memory, isn't the real problem with trying to fix this with a minimum codecache size is that some of these stubs are not allocated during initial single-threaded JVM startup, but later when the first compiler threads start, and that allows other code blobs to fill up the codecache? > >> Even so, it might be a good idea to additionally increase the minimum code cache anyway. @dean-long do you think it would make sense to file an RFE for that? > > Sure, if it's still an issue. Thank you very much for your reviews @dean-long and @adinn! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23630#issuecomment-2750298159 From dfenacci at openjdk.org Tue Mar 25 07:14:14 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 25 Mar 2025 07:14:14 GMT Subject: Integrated: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) In-Reply-To: References: Message-ID: <1FYW5izBcd8fJ5zo507OispjDjN6EMRJR96PIlo9-Rs=.92b2883e-9ae5-4a5b-930e-16b6e3ff56c3@github.com> On Fri, 14 Feb 2025 11:04:20 GMT, Damon Fenacci wrote: > # Issue > The test `src/hotspot/share/opto/c2compiler.cpp` fails intermittently due to a crash that happens when trying to allocate code cache space for C1 and C2 in `RuntimeStub::new_runtime_stub` and `SingletonBlob::operator new`. > > # Causes > There are a few call paths during the initialization of C1 and C2 that can lead to the code cache allocations in `RuntimeStub::new_runtime_stub` (through `RuntimeStub::operator new`) and `SingletonBlob::operator new` triggering a fatal error if there is no more space. The paths in question are: > 1. `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_uncommon_trap_blob` -> `UncommonTrapBlob::create` -> `new UncommonTrapBlob` > 1. 
`C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_exception_blob` -> `ExceptionBlob::create` -> `new ExceptionBlob`
>
> # Solution
> Instead of crashing fatally, we can use the `alloc_fail_is_fatal` flag of `RuntimeStub::new_runtime_stub` to avoid crashing in cases 1 and 2, and add a similar flag to `SingletonBlob::operator new` for cases 3 and 4. In the latter case we need to adjust all calls accordingly.
>
> Note: In [JDK-8326615](https://bugs.openjdk.org/browse/JDK-8326615) it was argued that increasing the minimum code cache size would solve the issue, but that wasn't entirely accurate: doing so possibly decreases the chances of a failed allocation in these 4 places but doesn't totally avoid it.
>
> # Testing
> The original failing regression test in `test/hotspot/jtreg/compiler/startup/StartupOutput.java` has been modified to run multiple times with randomized values (within the original failing range) to increase the chances of hitting the fatal assertion.
>
> Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode)

This pull request has now been integrated.
Changeset: 48fac662 Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/48fac6626c605f4679544e3dd24d5ad70561494a Stats: 139 lines in 27 files changed: 55 ins; 4 del; 80 mod 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) Reviewed-by: dlong, adinn ------------- PR: https://git.openjdk.org/jdk/pull/23630 From kbarrett at openjdk.org Tue Mar 25 07:19:09 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 25 Mar 2025 07:19:09 GMT Subject: RFR: 8352140: UBSAN: fix the left shift of negative value in klass.hpp, array_layout_helper() In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 06:46:14 GMT, David Holmes wrote: >> src/hotspot/share/oops/klass.hpp line 527: >> >>> 525: } >>> 526: static jint array_layout_helper(jint tag, int hsize, BasicType etype, int log2_esize) { >>> 527: return ((juint)tag << _lh_array_tag_shift) >> >> Doesn't this turn the type of the return expression to unsigned, causing an implicit conversion back to signed, which is implementation-defined? I think that's the reason for the weird-looking reinterpret_cast<> in JAVA_INTEGER_OP. > > Note we seem to do this a lot e.g. > >> ./share/oops/klass.cpp: `int tag = isobj ? _lh_array_tag_obj_value : _lh_array_tag_type_value;` > > `_lh_array_tag_type_value` is unsigned. So this call sequence goes from unsigned -> signed -> unsigned -> signed A problem we are facing here is that C++20 makes some integral operations defined that were previously undefined. This followed what implementations were actually doing. And yet, tools like ubsan (and constexpr-processing until C++20) treat them as UB. For example, https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Integers-implementation.html "The results of some bitwise operations on signed integers (C90 6.3, C99 and C11 6.5). ... 
As an extension to the C language, GCC does not use the latitude given in C99
and C11 only to treat certain aspects of signed '<<' as undefined. However,
-fsanitize=shift (and -fsanitize=undefined) will diagnose such cases. They are
also diagnosed where constant expressions are required."

Hence, I'm at least somewhat inclined to call a ubsan complaint about left
shift of a negative value a false positive.

The implementation-defined behavior of unsigned => signed conversions is
another thing that C++20 changed to be defined.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24184#discussion_r2011464234

From kbarrett at openjdk.org Tue Mar 25 07:24:07 2025
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 25 Mar 2025 07:24:07 GMT
Subject: RFR: 8352140: UBSAN: fix the left shift of negative value in klass.hpp, array_layout_helper()
In-Reply-To:
References:
Message-ID:

On Tue, 25 Mar 2025 07:15:59 GMT, Kim Barrett wrote:

>> Note we seem to do this a lot e.g.
>>
>>> ./share/oops/klass.cpp: `int tag = isobj ? _lh_array_tag_obj_value : _lh_array_tag_type_value;`
>>
>> `_lh_array_tag_type_value` is unsigned. So this call sequence goes from unsigned -> signed -> unsigned -> signed
>
> A problem we are facing here is that C++20 makes some integral operations
> defined that were previously undefined. This followed what implementations were
> actually doing. And yet, tools like ubsan (and constexpr-processing until
> C++20) treat them as UB.
>
> For example,
> https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Integers-implementation.html
> "The results of some bitwise operations on signed integers (C90 6.3, C99 and C11 6.5).
> ...
> As an extension to the C language, GCC does not use the latitude given in C99
> and C11 only to treat certain aspects of signed '<<' as undefined. However,
> -fsanitize=shift (and -fsanitize=undefined) will diagnose such cases. They are
> also diagnosed where constant expressions are required."
>
> Hence, I'm at least somewhat inclined to call a ubsan complaint about left
> shift of a negative value a false positive.
>
> The implementation-defined behavior of unsigned => signed conversions is
> another thing that C++20 changed to be defined.
Note that the discussion that led to the "weird-looking cast" in JAVA_INTEGER_OP significantly predates the standard committee's decision to enshrine two's-complement integers in C++20. If we were to have that discussion today my opinion would be quite different from what it was at the time of that discussion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24184#discussion_r2011470356 From shade at openjdk.org Tue Mar 25 08:23:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Mar 2025 08:23:14 GMT Subject: RFR: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead [v2] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 07:06:25 GMT, Aleksey Shipilev wrote: >> Little regression crept in with [JDK-8351142](https://bugs.openjdk.org/browse/JDK-8351142): on the deflation path, object associated with monitor can be already dead. >> >> A new stress test fails within seconds without a fix. It also covers other monitor events, so we have extra coverage there as well. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, new stress test now passes >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Emit the event anyway Thanks! I gave it another spin through testing, and it still looks green. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24121#issuecomment-2750446642 From shade at openjdk.org Tue Mar 25 08:23:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Mar 2025 08:23:16 GMT Subject: Integrated: 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead In-Reply-To: References: Message-ID: <_Jredh07kLKjSsUYaGg5EQV6h29DV6N3JDnk5bOgvvQ=.fe4d0a3e-5268-43b7-88a3-90aa5ebca4ce@github.com> On Wed, 19 Mar 2025 17:47:00 GMT, Aleksey Shipilev wrote: > Little regression crept in with [JDK-8351142](https://bugs.openjdk.org/browse/JDK-8351142): on the deflation path, object associated with monitor can be already dead. > > A new stress test fails within seconds without a fix. It also covers other monitor events, so we have extra coverage there as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, new stress test now passes > - [x] Linux x86_64 server fastdebug, `jdk_jfr` This pull request has now been integrated. Changeset: 17dc30c5 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/17dc30c54e90a339783b7da6ef282a2206205653 Stats: 177 lines in 4 files changed: 169 ins; 3 del; 5 mod 8352414: JFR: JavaMonitorDeflateEvent crashes when deflated monitor object is dead Reviewed-by: dholmes, mgronlun ------------- PR: https://git.openjdk.org/jdk/pull/24121 From mbaesken at openjdk.org Tue Mar 25 08:32:03 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 25 Mar 2025 08:32:03 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp [v5] In-Reply-To: References: Message-ID: > There are a few divisions by zero in sharedRuntimeTrans.cpp, used to "construct" NaN and -infinity. This should probably be replaced by using functionality from std::numeric_limits . 
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:

adjust check

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/24136/files
- new: https://git.openjdk.org/jdk/pull/24136/files/2b82fe21..0e78b823

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=24136&range=04
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=24136&range=03-04

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/24136.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24136/head:pull/24136

PR: https://git.openjdk.org/jdk/pull/24136

From sspitsyn at openjdk.org Tue Mar 25 09:01:51 2025
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Tue, 25 Mar 2025 09:01:51 GMT
Subject: RFR: 8352812: remove useless class and function parameter in SuspendThread impl
Message-ID:

The internal class JvmtiSuspendControl, which is transitively used in the SuspendThread implementation, is not really needed and is being removed. Also, the suspend_thread function has an unused need_safepoint_p parameter, which is being removed as well.
Testing:
- TBD: Run mach5 tiers 1-3 to be safe

-------------

Commit messages:
- 8352812: remove useless class and function parameter in SuspendThread impl

Changes: https://git.openjdk.org/jdk/pull/24219/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24219&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8352812
Stats: 68 lines in 5 files changed: 0 ins; 58 del; 10 mod
Patch: https://git.openjdk.org/jdk/pull/24219.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24219/head:pull/24219

PR: https://git.openjdk.org/jdk/pull/24219

From azafari at openjdk.org Tue Mar 25 09:45:16 2025
From: azafari at openjdk.org (Afshin Zafari)
Date: Tue, 25 Mar 2025 09:45:16 GMT
Subject: RFR: 8352140: UBSAN: fix the left shift of negative value in klass.hpp, array_layout_helper()
In-Reply-To:
References:
Message-ID:

On Tue, 25 Mar 2025 07:21:14 GMT, Kim Barrett wrote:

>> A problem we are facing here is that C++20 makes some integral operations
>> defined that were previously undefined. This followed what implementations were
>> actually doing. And yet, tools like ubsan (and constexpr-processing until
>> C++20) treat them as UB.
>>
>> For example,
>> https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Integers-implementation.html
>> "The results of some bitwise operations on signed integers (C90 6.3, C99 and C11 6.5).
>> ...
>> As an extension to the C language, GCC does not use the latitude given in C99
>> and C11 only to treat certain aspects of signed '<<' as undefined. However,
>> -fsanitize=shift (and -fsanitize=undefined) will diagnose such cases. They are
>> also diagnosed where constant expressions are required."
>>
>> Hence, I'm at least somewhat inclined to call a ubsan complaint about left
>> shift of a negative value a false positive.
>>
>> The implementation-defined behavior of unsigned => signed conversions is
>> another thing that C++20 changed to be defined.
>
> Note that the discussion that led to the "weird-looking cast" in
> JAVA_INTEGER_OP significantly predates the standard committee's decision to
> enshrine two's-complement integers in C++20. If we were to have that
> discussion today my opinion would be quite different from what it was at the
> time of that discussion.

For my own learning:
When developers use left-shift for doubling a value, a negative operand may change to a positive one, since the sign bit may change. For example in

```
signed short int x = -32768;
signed short int y = x << 1;
```

the value of `y` would be `0`. So, when the left-shift is used as an arithmetic op, both the sign and size of the result/operand should be carefully considered. And, this is not dependent on C++xx.
So, left-shift of a negative value is UB, until the developer explicitly decides on the type of the operand or the result.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24184#discussion_r2011706162

From mli at openjdk.org Tue Mar 25 10:27:08 2025
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 25 Mar 2025 10:27:08 GMT
Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v4]
In-Reply-To: <3offilavnorMFRTRJK8oCgc4VkWQ-tbqka-HbqvcLjs=.9e76929a-4b7f-4c63-b662-f548fa3f9ec0@github.com>
References: <3offilavnorMFRTRJK8oCgc4VkWQ-tbqka-HbqvcLjs=.9e76929a-4b7f-4c63-b662-f548fa3f9ec0@github.com>
Message-ID:

On Mon, 24 Mar 2025 08:36:57 GMT, Robbin Ehn wrote:

>> Hi please consider.
>>
>> Added case to turn off UseZvfh when no RVV.
>> Which is the cause of the test issues, zvfh on but no rvv.
>>
>> Also made all case identical and added no warning when default.
>> Move them to the common init, as the "UseExtension" is not C2 specific.
>>
>> Manual tested and some random compiler tests.
>>
>> Thanks, Robbin
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains eight commits: > > - dep check > - Merge branch 'master' into maxvector_0 > - Merge branch 'master' into maxvector_0 > - Merge branch 'master' into maxvector_0 > - hwprobe deps > - Merge branch 'master' into maxvector_0 > - Moved to common > - Disable UseZvfh when no RVV Thanks for updating. Looks good. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24094#pullrequestreview-2713193013 From shade at openjdk.org Tue Mar 25 10:43:32 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Mar 2025 10:43:32 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port [v3] In-Reply-To: References: Message-ID: > This PR implements JEP 503: Remove the 32-bit x86 Port. > > The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. > > This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. > > The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. 
> > Additional testing: > - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) > - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone - Drop commented out block from deprecations - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone - Generic 32-bit x86 configure error supercedes Windows 32-bit x86 - 8345169: Implement JEP 503: Remove the 32-bit x86 Port ------------- Changes: https://git.openjdk.org/jdk/pull/23906/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23906&range=02 Stats: 29733 lines in 25 files changed: 4 ins; 29728 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23906.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23906/head:pull/23906 PR: https://git.openjdk.org/jdk/pull/23906 From cnorrbin at openjdk.org Tue Mar 25 11:01:46 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 25 Mar 2025 11:01:46 GMT Subject: RFR: 8294954: Remove superfluous ResourceMarks when using LogStream In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 06:28:57 GMT, David Holmes wrote: > Did you determine that the deleted RM's were put in to be used by the LogStream rather than the things being printed to the LogStream? It is quite difficult to be sure you have exercised all of the logging code that was modified. It is quite likely many of these log outputs are not actually being tested anywhere (and very difficult to verify one way or another). Have you tested by enabling all logging in some simple tests on all platforms? 
(Of course that is nowhere near sufficient in terms of coverage as you would need to test with numerous permutations of VM features.) > > Thanks These RMs were specifically for LogStream usage only. I left other cases untouched where they were needed for the printed content (`as_klass_external_name` is frequent) or other surrounding code. For testing, I've also run low-tier tests with `-Xlog:all=trace`, and kept some long-running Java programs with full logging enabled. The platform-specific code immediately goes into shared implementations, so exhaustive platform-specific testing shouldn't be necessary. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24162#issuecomment-2750876047 From shade at openjdk.org Tue Mar 25 13:29:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Mar 2025 13:29:30 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port [v3] In-Reply-To: References: Message-ID: <6aH6pqUjOOhfguuCXDjuRPNpieiu2rzJ7XxnTFQ2D4w=.fe2d0cd3-181d-4189-a3cb-6637bf85d89c@github.com> On Tue, 25 Mar 2025 10:43:32 GMT, Aleksey Shipilev wrote: >> This PR implements JEP 503: Remove the 32-bit x86 Port. >> >> The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. >> >> This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. >> >> The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. 
The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. >> >> Additional testing: >> - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) >> - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone > - Drop commented out block from deprecations > - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone > - Generic 32-bit x86 configure error supercedes Windows 32-bit x86 > - 8345169: Implement JEP 503: Remove the 32-bit x86 Port JEP is now targeted to JDK 25. I remerged from master, resolved a few easy conflicts in files that are removed by this PR anyway, and did some light testing. Everything looks green. I only miss the re-review after the merge. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2751258762 From jsikstro at openjdk.org Tue Mar 25 14:06:46 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 25 Mar 2025 14:06:46 GMT Subject: RFR: 8352762: Use EXACTFMT instead of expanded version where applicable Message-ID: [JDK-8310233](https://bugs.openjdk.org/browse/JDK-8310233) introduced the EXACTFMT macro, which is a shorthand for printing exact values using methods defined in globalDefinitions.hpp. 
There are currently 20 places in HotSpot which use the expanded version of the macro, along with the "trace_page_size_params" macro that is defined and used in os.cpp. I have replaced places that use the expanded macro(s) with EXACTFMT + EXACTFMTARGS, and also removed trace_page_size_params from os.cpp, which was essentially a redefinition of EXACTFMTARGS. Testing: GHA, tiers 1-4 ------------- Commit messages: - 8352762: Use EXACTFMT instead of expanded version where applicable Changes: https://git.openjdk.org/jdk/pull/24228/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24228&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352762 Stats: 70 lines in 8 files changed: 0 ins; 20 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/24228.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24228/head:pull/24228 PR: https://git.openjdk.org/jdk/pull/24228 From duke at openjdk.org Tue Mar 25 14:23:21 2025 From: duke at openjdk.org (Luca Kellermann) Date: Tue, 25 Mar 2025 14:23:21 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 16:19:17 GMT, Per Minborg wrote: > > Comments on visual noise and side effects in `toString`. > > `renderWrapped` is clever for a single stable value, but it makes for a very noisy display string, with confusing doubly-nested `[]`, for composite stable values. I'm talking about `StableFunction` mainly, I guess. > > I suggest omitting the inner `[]` for such composites. A simple boolean on `renderWrapped` will do that trick. In that case, `renderWrapped` has the job of either presenting a fixed (recognizable) sentinel string, or else forwards, without further editorial comment, to the `toString` of the contained value. > > The `toString()` for `StableValue` is inspired by `Optional` which works in the same way by adding `[ ]` around the contents. Any more thought in the reviewer community on how we should handle this? 
I think the comment just proposes that this code

    var f = StableValue.intFunction(3, i -> i);
    f.apply(1);
    System.out.println(f);

should print

    StableIntFunction[values=[.unset, 1, .unset], original=...]

Right now it prints this:

    StableIntFunction[values=[.unset, [1], .unset], original=...]

The `toString` implementation for `StableValueImpl` itself seems fine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2751428512 From rehn at openjdk.org Tue Mar 25 14:25:38 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 25 Mar 2025 14:25:38 GMT Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user Message-ID: Hi, for you to consider. These tests constantly fail in qemu-user. Either they require the host to be the same arch or they are very, very slow in emulation. E.g. "ptrace(PTRACE_ATTACH, ..) failed for 405157: Function not implemented'" for SA tests. This is the initial set of tests, there are many more, but I need to do some more verification for those. From bug: > qemu-user/rv64 sets uarch to "qemu" in /proc/cpuinfo (qemu-system do not do that). > We add this uarch to CPU feature string. > This means we can use jtreg 'require' with cpu string to filter out tests in qemu-user. Relevant qemu code: https://github.com/qemu/qemu/blob/170825d14d88a1ce7fae98d5a928480f2f329b22/linux-user/riscv/target_proc.h#L29 Relevant hotspot code: https://github.com/openjdk/jdk/blob/fa0b18bfde38ee2ffbab33a9eaac547fe8aa3c7c/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp#L250 Tested that the require only filters out tests in qemu+riscv64. Thanks! 
/Robbin ------------- Commit messages: - more - more - native or very long Changes: https://git.openjdk.org/jdk/pull/24229/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24229&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352730 Stats: 135 lines in 100 files changed: 135 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24229/head:pull/24229 PR: https://git.openjdk.org/jdk/pull/24229 From mdoerr at openjdk.org Tue Mar 25 15:22:22 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 25 Mar 2025 15:22:22 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port [v3] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 10:43:32 GMT, Aleksey Shipilev wrote: >> This PR implements JEP 503: Remove the 32-bit x86 Port. >> >> The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. >> >> This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. >> >> The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. 
>> >> Additional testing: >> - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) >> - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone > - Drop commented out block from deprecations > - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone > - Generic 32-bit x86 configure error supercedes Windows 32-bit x86 > - 8345169: Implement JEP 503: Remove the 32-bit x86 Port LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23906#pullrequestreview-2714187303 From ihse at openjdk.org Tue Mar 25 15:27:14 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 25 Mar 2025 15:27:14 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port [v3] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 10:43:32 GMT, Aleksey Shipilev wrote: >> This PR implements JEP 503: Remove the 32-bit x86 Port. >> >> The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. >> >> This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. >> >> The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. 
It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. >> >> Additional testing: >> - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) >> - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone > - Drop commented out block from deprecations > - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone > - Generic 32-bit x86 configure error supercedes Windows 32-bit x86 > - 8345169: Implement JEP 503: Remove the 32-bit x86 Port Marked as reviewed by ihse (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23906#pullrequestreview-2714207496 From kbarrett at openjdk.org Tue Mar 25 15:40:13 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 25 Mar 2025 15:40:13 GMT Subject: RFR: 8352140: UBSAN: fix the left shift of negative value in klass.hpp, array_layout_helper() In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 09:42:13 GMT, Afshin Zafari wrote: >> Note that the discussion that led to the "weird-looking cast" in >> JAVA_INTEGER_OP significantly predates the standard committee's decision to >> enshrine two's-complement integers in C++20. 
If we were to have that >> discussion today my opinion would be quite different from what it was at the >> time of that discussion. > > For my own learning: > When developers use left-shift for doubling a value, then a negative operand may changed to a positive since the sign-bit may change. For example in > > signed short int x = -32768; > signed short int y = x << 1; > ``` > the value of `y` would be `0`. So, when the left-shift is used as an arithmetic op, both the sign and size of the result/operand should be carefully considered. And, this is not dependent on C++xx. > So, left-shift of negative value is UB, until the developer explicitly decides on the type of the operand or the result. signed short int x = -32768; signed short int y = x << 1; That does seem like an interestingly weird case. Unless I'm missing something, there's no UB-overflow in that. The shift expression promotes `short x` to `int x`, sign extending it. The `int`-typed shift is fine (since C++20, and effectively so prior to that in non-constexpr-required contexts - see below). And the implicit conversion to `short y` is implementation-defined (before C++20, though gcc may warn (-Woverflow)) or fine (since C++20). gcc warns about x being negative in C++11 to C++17 modes (-Wshift-negative-value enabled by default), but doesn't treat it as UB. Before C++20 gcc errors (warns if -fpermissive) if it's in a required-constexpr-context, even if -Wshift-negative-value is disabled. That all seems consistent. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24184#discussion_r2012390378 From pminborg at openjdk.org Tue Mar 25 15:52:07 2025 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 25 Mar 2025 15:52:07 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v8] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. 
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Revamp toString() methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/4c0dadfb..42d4dcfa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=06-07 Stats: 173 lines in 14 files changed: 79 ins; 55 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From dnsimon at openjdk.org Tue Mar 25 15:54:23 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 25 Mar 2025 15:54:23 GMT Subject: RFR: 8352645: Add tool support to check order of includes In-Reply-To: <-LIpZBv5VsK3GyY-zNar1x2BU30LhdlCnQalEigLAsA=.67180c55-a332-4219-8056-3549bd45c200@github.com> References: <-LIpZBv5VsK3GyY-zNar1x2BU30LhdlCnQalEigLAsA=.67180c55-a332-4219-8056-3549bd45c200@github.com> Message-ID: On Tue, 25 Mar 2025 14:56:27 GMT, Magnus Ihse Bursie wrote: > Did you consider writing the tool in Java? Or rather, could you be convinced to convert it to Java? With the source code launch mechanism, it is just as simple to run as a python script. Also, there is some kind of optics about it as well, where we actually use Java for developing the JDK. Good idea. I'll give it a go. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24180#issuecomment-2751733004 From kbarrett at openjdk.org Tue Mar 25 16:21:23 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 25 Mar 2025 16:21:23 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp [v5] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 08:32:03 GMT, Matthias Baesken wrote: >> There are a few divisions by zero in sharedRuntimeTrans.cpp, used to "construct" NaN and -infinity. 
This should probably be replaced by using functionality from std::numeric_limits. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > adjust check Looks good. src/hotspot/share/runtime/sharedRuntimeTrans.cpp line 518: > 516: z = ax; /*x is +-0,+-inf,+-1*/ > 517: if(hy<0) { > 518: if (ix == 0) { [Just a comment, as the level of change involved seems out of scope for the issue at hand. And probably not worth the effort, since x==0.0 is probably not all that common. Feel free to completely ignore this.] I feel like there ought to be a way to restructure this to merge the `ix == 0` here and the one a few lines earlier. Something like (completely untested and quite possibly wrong)

    if (ix == 0) {
      z = std::numeric_limits<double>::infinity();
      if (hx < 0 && yisint == 1) {
        z = -z;
      }
      return z;
    } else if (ix == 0x7ff00000 || ix == 0x3ff00000) { // ix==0 moved earlier
      ... ;
      return z;
    }

src/hotspot/share/runtime/sharedRuntimeTrans.cpp line 519: > 517: if(hy<0) { > 518: if (ix == 0) { > 519: z = std::numeric_limits<double>::infinity(); Good thing we have tests! I was looking at the comparison, and entirely missed that this needed to be infinity rather than the quiet_nan in earlier commits. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24136#pullrequestreview-2714333632 PR Review Comment: https://git.openjdk.org/jdk/pull/24136#discussion_r2012463092 PR Review Comment: https://git.openjdk.org/jdk/pull/24136#discussion_r2012435480 From pminborg at openjdk.org Tue Mar 25 16:26:26 2025 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 25 Mar 2025 16:26:26 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v8] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 15:52:07 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. 
> > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Revamp toString() methods I have rewritten all the `toString()` methods. A `StableList::toString` now produces something much more similar to a regular `List::toString`. The only difference is that the `StableList::toString` shows ".unset" for the elements that are not yet evaluated. In other words, `StableList::toString` no longer evaluates all the elements, but rather does a "high impedance" scan over them and if evaluated, invokes `toString` on the element, otherwise just shows ".unset" for that element. The same goes for `StableMap` and all the stable functions (which now share the same code path as the stable collections). `StableValue` itself does not add extra square brackets around its content. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2751829347 From lmesnik at openjdk.org Tue Mar 25 18:06:12 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 25 Mar 2025 18:06:12 GMT Subject: RFR: 8352812: remove useless class and function parameter in SuspendThread impl In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 08:53:48 GMT, Serguei Spitsyn wrote: > The internal class JvmtiSuspendControl is transitively used in the SuspendThread implementation is not really needed and is being removed. Also, the suspend_thread function has unused need_safepoint_p parameter which is being removed as well. > > Testing: > - TBD: Run mach5 tiers 1-3 to be safe The fix looks good. ------------- Marked as reviewed by lmesnik (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24219#pullrequestreview-2714716556 From jiangli at openjdk.org Tue Mar 25 18:51:28 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 25 Mar 2025 18:51:28 GMT Subject: RFR: 8352766: Problemlist hotspot tier1 tests requiring tools that are not included in static JDK In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 03:02:32 GMT, Jiangli Zhou wrote: > Please review this change that adds test/hotspot/jtreg/ProblemList-StaticJdk.txt, which problemlists 27 hotspot tier1 tests that use `javac`, `jstack`, `jcmd` and `jhsdb` at runtime. > > Following is an example of the command that I use to run hotspot tier1 tests on static JDK with the extra `ProblemList-StaticJdk.txt`: > > > $ make test TEST="test/hotspot/jtreg:tier1" JDK_UNDER_TEST=//JDK-8352766/build/linux-x86_64-server-fastdebug/images/static-jdk JDK_FOR_COMPILE=//JDK-8352766/build/linux-x86_64-server-fastdebug/images/jdk JTREG="EXTRA_PROBLEM_LISTS=//JDK-8352766/test/hotspot/jtreg/ProblemList-StaticJdk.txt" @magicus pointed out that there's https://bugs.openjdk.org/browse/JDK-8346719 already. I closed JDK-8352764 as the duplicate of JDK-8346719 and updated test/hotspot/jtreg/ProblemList-StaticJdk.txt to use 8346719. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24214#issuecomment-2752113209 From jiangli at openjdk.org Tue Mar 25 18:51:28 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 25 Mar 2025 18:51:28 GMT Subject: RFR: 8352766: Problemlist hotspot tier1 tests requiring tools that are not included in static JDK Message-ID: Please review this change that adds test/hotspot/jtreg/ProblemList-StaticJdk.txt, which problemlists 27 hotspot tier1 tests that use `javac`, `jstack`, `jcmd` and `jhsdb` at runtime. 
Following is an example of the command that I use to run hotspot tier1 tests on static JDK with the extra `ProblemList-StaticJdk.txt`: $ make test TEST="test/hotspot/jtreg:tier1" JDK_UNDER_TEST=//JDK-8352766/build/linux-x86_64-server-fastdebug/images/static-jdk JDK_FOR_COMPILE=//JDK-8352766/build/linux-x86_64-server-fastdebug/images/jdk JTREG="EXTRA_PROBLEM_LISTS=//JDK-8352766/test/hotspot/jtreg/ProblemList-StaticJdk.txt" ------------- Commit messages: - Replace 8352764 with 8346719. - Problemlist runtime/HiddenClasses/DefineHiddenClass.java. - Add test/hotspot/jtreg/ProblemList-StaticJdk.txt. Changes: https://git.openjdk.org/jdk/pull/24214/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24214&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352766 Stats: 34 lines in 1 file changed: 34 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24214.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24214/head:pull/24214 PR: https://git.openjdk.org/jdk/pull/24214 From stefank at openjdk.org Tue Mar 25 18:58:08 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 25 Mar 2025 18:58:08 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 17:00:28 GMT, Thomas Stuefe wrote: > > And isn't this locking scheme exactly what the current code is using? Have you seen an issue that this proposed PR intends to solve? If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations. > > What about the case where one thread reserves a range and another thread releases it? > > 1 Thread A reserves range 2 Thread B releases range 3 Thread B tells NMT "range released" 4 Thread A tells NMT "range reserved" > > This would either result in an assert in NMT at step 3 when releasing a range NMT does not know. Or in an incorrectly booked range in step 4 without asserts. Am I making a thinking error somewhere? 
In a scenario like that, doesn't Thread A have to communicate somehow that Thread B can now start using and/or releasing that reservation? It sounds like you would have a race condition if you don't have synchronization to hand over the reservation from one thread to another. I would expect that such communication would be placed after the NMT booking in Thread A.

Thread A:
  reserve
  lock
  NMT booking
  unlock

Thread B:
  lock
  release
  NMT booking
  unlock

Is there any code that we know of that doesn't fit into a synchronization pattern similar to the above? I can think of some contrived example where Thread B asks the OS for memory mappings and uses that to ascertain that a pre-determined address has been reserved, and how that could lead to an incorrect booking as you described, but do we really have code like that? If we do, should we really have code like that? Are there some other patterns that I'm missing? Related to this, I talked to @xmas92 and he mused about swapping the order of the release and NMT release booking as a way to shrink the lock scope for the release operation:

Thread A:
  reserve
  lock
  NMT booking
  unlock

Thread B:
  lock
  NMT booking
  unlock
  release

As long as we hold the reservation, no other thread can re-reserve the reservation, so Thread B can take its time to first perform the NMT release booking under the lock, and then perform the release without the lock. If another thread (say Thread C) manages to re-reserve the memory, it sounds reasonable that the NMT release booking should have already completed. If we were to try this out, we would have to verify/ensure that the release/reserve pairs perform enough synchronization to make this work. Axel can probably correct me if I mischaracterized what he wrote. 
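As a rough illustration of the two lock scopes discussed above, here is a toy Python model. It is only a sketch: `FakeNmt`, `release_wide_lock`, `release_narrow_lock` and the `handed_over` event are made-up names for this example, not HotSpot code, and the "OS" is just a set of base addresses.

```python
import threading


class FakeNmt:
    """Toy stand-in for NMT bookkeeping; every name here is hypothetical."""

    def __init__(self):
        self.lock = threading.Lock()
        self.booked = set()  # ranges the bookkeeping believes are reserved
        self.mapped = set()  # ranges the toy "OS" has actually mapped

    def reserve(self, base):
        # Reserve first, then do the booking under the lock.
        self.mapped.add(base)
        with self.lock:
            self.booked.add(base)

    def release_wide_lock(self, base):
        # Release and booking share one lock scope.
        with self.lock:
            self.mapped.remove(base)
            self.booked.remove(base)

    def release_narrow_lock(self, base):
        # Swapped order: book the release under the lock, release afterwards.
        with self.lock:
            self.booked.remove(base)
        self.mapped.remove(base)


nmt = FakeNmt()
handed_over = threading.Event()


def thread_a():
    nmt.reserve(0x1000)
    handed_over.set()  # the hand-over happens after the booking


def thread_b():
    handed_over.wait()  # B may not touch the range before the hand-over
    nmt.release_narrow_lock(0x1000)


b = threading.Thread(target=thread_b)
a = threading.Thread(target=thread_a)
b.start()
a.start()
a.join()
b.join()

# Same range life cycle with the wide lock scope, on a single thread.
nmt.reserve(0x2000)
nmt.release_wide_lock(0x2000)
```

In this model the booking stays consistent only because the hand-over event is set after the booking in Thread A; without that synchronization Thread B could try to release a range the bookkeeping has not seen yet, which is the interleaving asked about earlier in the thread.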
------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2752240832 From stefank at openjdk.org Tue Mar 25 19:01:10 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 25 Mar 2025 19:01:10 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 17:11:55 GMT, Robert Toyonaga wrote: > Hi stefank, I think you're right about (1.1) (2.1) (2.2) (1.2) being prevented by the current implementation. Is there a reason that the current implementation only does the wider locking for release/uncommit? Maybe (2.1) (1.1) (1.2) (2.2) isn't much of an issue since it's unlikely that another thread would uncommit/release the same base address shortly after it's committed/reserved? I'm very curious to find out if anyone knows how this could happen without a race condition hand-over from one thread to another. (See my answer to Stüfe). > > > Have you seen an issue that this proposed PR intends to solve? If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations. > > I haven't seen that race in the wild, I just noticed that the memory operations weren't protected and thought that it could be a problem. OK. Let's see if anyone finds a hole in my arguments given above. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2752247631 From stefank at openjdk.org Tue Mar 25 19:13:12 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 25 Mar 2025 19:13:12 GMT Subject: RFR: 8352645: Add tool support to check order of includes In-Reply-To: References: <75t-g0hbKrY5JugITjnd8Vhxi86fuVR8g1_KXyLS4G8=.0acd8ca4-67f9-4723-9735-399d03a848bf@github.com> Message-ID: On Tue, 25 Mar 2025 14:52:56 GMT, Magnus Ihse Bursie wrote: >> I pushed a commit that prevents the re-ordering: https://github.com/openjdk/jdk/pull/24180/commits/c0f202d2a7e7b8788719fe8cd2a4c7a095ecd3bb > > My gut reaction is that header files should be self-sustaining, that is if they need some external header files, these should be included by the header file itself. But that is up to the hotspot folks to decide. Magnus is correct. Though, it's really hard to maintain that unless you have a tool to help out with that, so we tend to fix this whenever we find include issues. For uint64_t (and friends) we include globalDefinitions.hpp instead of including the system headers directly. (FWIW, this file also has its own style for the include guards. This should probably be updated to follow the other include guards we use in HotSpot, but not in this PR) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012780477 From cjplummer at openjdk.org Tue Mar 25 19:45:17 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 25 Mar 2025 19:45:17 GMT Subject: RFR: 8352812: remove useless class and function parameter in SuspendThread impl In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 08:53:48 GMT, Serguei Spitsyn wrote: > The internal class JvmtiSuspendControl is transitively used in the SuspendThread implementation is not really needed and is being removed. Also, the suspend_thread function has unused need_safepoint_p parameter which is being removed as well. 
> > Testing: > - TBD: Run mach5 tiers 1-3 to be safe Looks good. ------------- Marked as reviewed by cjplummer (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24219#pullrequestreview-2714992198 From psandoz at openjdk.org Tue Mar 25 20:04:22 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 25 Mar 2025 20:04:22 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v8] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 15:52:07 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Revamp toString() methods src/java.base/share/classes/java/lang/StableValue.java line 426: > 424: * regardless if invoked by several threads. Also, the provided {@code supplier} > 425: * will only be invoked once even if invoked from several threads unless the > 426: * {@code supplier} throws an exception. This seems like unnecessary detail. I would just drop it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2012850937 From duke at openjdk.org Tue Mar 25 20:23:20 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Tue, 25 Mar 2025 20:23:20 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 18:55:56 GMT, Stefan Karlsson wrote: > ... swapping the order of the release and NMT release booking as a way to shrink the lock scope for the release operation: ... As long as we hold the reservation, no other thread can re-reserve the reservation, so Thread B can take its time to first perform the NMT release booking under the lock, and then perform the release without the lock. Hi @stefank, I think that's true. But if the release/uncommit does not complete successfully we would need to readjust the accounting afterward. 
To do that we would need to retrieve the original memtag (at least for reserved regions) and potentially need to retrieve the original callsite data (if we're in detailed mode). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2752423849 From stefank at openjdk.org Tue Mar 25 20:43:26 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 25 Mar 2025 20:43:26 GMT Subject: RFR: 8352645: Add tool support to check order of includes In-Reply-To: References: Message-ID: On Sun, 23 Mar 2025 21:14:47 GMT, Doug Simon wrote: > This PR adds `bin/sort_includes.py`, a python3 script to check that blocks of include statements in C++ files are sorted alphabetically and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). > This script can also update files with unsorted includes. The second commit in this PR shows the result of running: > > python3 ./bin/sort_includes.py ./src/hotspot > > To prevent an include being reordered, put at least one non-space character after the closing `"` or `>`. See `src/hotspot/share/adlc/archDesc.cpp` for an example. > > Assuming this PR is integrated, jcheck could be updated to use it to ensure include statements remain sorted. Thanks for taking the time to write this tool! I haven't reviewed the script, but I looked over the resulting changes to the includes and have commented on places where I see a discrepancy between what the tool generated and what I would have changed the code to be. Some of my comments are things that would be great if the tool could fix, but I also think that we could be pragmatic and just fix these manually. OTOH, if we need to do a bunch of manual adjustments then it might not be ready to be included in jcheck. I think it depends on how this tool is going to be used. 
If it is going to be an authoritative tool for our includes, then it probably needs to handle some of the cases that I listed. With that said, even if there is some back-and-forth discussion about the tool, I think pushing the sort-order fixes by themselves would be worthwhile in itself. Thanks again! src/hotspot/cpu/aarch64/foreignGlobals_aarch64.cpp line 32: > 30: #include "prims/vmstorage.hpp" > 31: #include "runtime/jniHandles.hpp" > 32: #include "runtime/jniHandles.inline.hpp" I know that you didn't introduce this, but I wanted to point out that the convention we are using is to not include the .hpp file if the associated .inline.hpp is included. src/hotspot/cpu/arm/gc/g1/g1BarrierSetAssembler_arm.cpp line 32: > 30: #include "gc/g1/g1HeapRegion.hpp" > 31: #include "gc/g1/g1ThreadLocalData.hpp" > 32: #include "gc/g1/g1ThreadLocalData.hpp" Not your tool's fault, but in case you also want to handle this case, we have two includes of the same file here. src/hotspot/cpu/ppc/vm_version_ppc.cpp line 44: > 42: #if defined(_AIX) > 43: #include "os_aix.hpp" > 44: For this file I would have expected the separation of system includes to be done as follows: #include "utilities/powerOfTwo.hpp" #if defined(_AIX) #include "os_aix.hpp" #endif #include #if defined(_AIX) #include #endif src/hotspot/os/aix/os_aix.cpp line 98: > 96: #include > 97: #include > 98: #include FWIW, it would be interesting to hear if the AIX maintainers really need the define in the middle of the system includes ... src/hotspot/os/aix/porting_aix.cpp line 30: > 28: #define __XCOFF64__ > 29: #include > 30: This blank line should probably be left in place. (It would also be nice to have a blankline between line 23 and 24, but not your tool's job to fix that) src/hotspot/os/bsd/memMapPrinter_macosx.cpp line 34: > 32: #include "utilities/powerOfTwo.hpp" > 33: > 34: One too many blanklines. 
src/hotspot/os/linux/osContainer_linux.cpp line 34: > 32: #include > 33: #include > 34: #include There are two blank lines after the includes. (In case you want to enhance your tool to also clean that out) src/hotspot/os/posix/safefetch_sigjmp.cpp line 36: > 34: // For SafeFetch we need POSIX TLS and sigsetjmp/longjmp. > 35: #include > 36: #include Missing blankline after this include. (In case you want to add support for this cleanup in your tool) src/hotspot/os/windows/os_windows.cpp line 108: > 106: #include // For os::dll_address_to_function_name > 107: // for enumerating dll libraries > 108: #include The include moved, but the comment was left, so the comment now refers to the wrong include (I think). FWIW, I tend to remove these comments about why they are included because they become obsolete if you start to use more things from the headers. src/hotspot/os/windows/symbolengine.cpp line 31: > 29: #include "windbghelp.hpp" > 30: > 31: #include This left two consecutive blanklines. src/hotspot/os/windows/systemMemoryBarrier_windows.cpp line 27: > 25: #include "systemMemoryBarrier_windows.hpp" > 26: > 27: #include I'm a little curious about why this was needed, but just a little bit ... src/hotspot/share/adlc/archDesc.cpp line 27: > 25: > 26: // archDesc.cpp - Internal format for architecture definition > 27: #include // do not reorder I *guess* that this has to do with the assert define. I think Kim has another workaround for that. src/hotspot/share/cds/archiveBuilder.cpp line 51: > 49: #include "memory/allStatic.hpp" > 50: #include "memory/memRegion.hpp" > 51: #include "memory/memoryReserver.hpp" I think it can be argued that this is a bug. In previous discussions we (or, at least I) have argued that the sort order should be case-insensitive.
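The difference between the two orderings discussed in the archiveBuilder.cpp comment can be illustrated with a short Python sketch. It only reuses the three `memory/` paths quoted above and is not taken from `bin/sort_includes.py`; the variable names are made up for the illustration.

```python
# The three include paths quoted in the archiveBuilder.cpp comment above.
includes = [
    "memory/memoryReserver.hpp",
    "memory/memRegion.hpp",
    "memory/allStatic.hpp",
]

# Plain sorted() compares code points, so upper-case 'R' (82) sorts before
# lower-case 'o' (111) and memRegion.hpp ends up ahead of memoryReserver.hpp.
ascii_order = sorted(includes)

# Lower-casing the sort key gives a case-insensitive order, in which
# "memor..." sorts before "memre...".
folded_order = sorted(includes, key=str.lower)

print(ascii_order)
print(folded_order)
```

Whichever key is chosen, using the same one everywhere is what keeps the result stable across files.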
src/hotspot/share/compiler/compilationFailureInfo.cpp line 37: > 35: #ifdef COMPILER2 > 36: #include "opto/compile.hpp" > 37: #include "opto/node.hpp" Here the entire: #ifdef COMPILER2 #include "opto/compile.hpp" #include "opto/node.hpp" #endif block should be last. I don't know if your tool should try to fix this. src/hotspot/share/gc/g1/g1ConcurrentRebuildAndScrub.cpp line 40: > 38: #include "utilities/globalDefinitions.hpp" > 39: > 40: Stray extra blankline src/hotspot/share/gc/shared/gcConfiguration.cpp line 27: > 25: #include "gc/shared/gcArguments.hpp" > 26: #include "gc/shared/gcConfiguration.hpp" > 27: #include "gc/shared/gc_globals.hpp" I think the manual sorting order placed _ before letters. Your tool undoes that. I think this is OK, but I wanted to point this out so that this is done intentionally. src/hotspot/share/gc/shenandoah/heuristics/shenandoahGenerationalHeuristics.cpp line 37: > 35: #include "logging/log.hpp" > 36: > 37: Stray extra blankline src/hotspot/share/gc/shenandoah/heuristics/shenandoahGlobalHeuristics.cpp line 33: > 31: #include "utilities/quickSort.hpp" > 32: > 33: Stray extra blankline src/hotspot/share/gc/shenandoah/heuristics/shenandoahYoungHeuristics.cpp line 35: > 33: #include "utilities/quickSort.hpp" > 34: > 35: Stray extra blankline src/hotspot/share/gc/shenandoah/shenandoahController.cpp line 30: > 28: #include "gc/shenandoah/shenandoahHeap.hpp" > 29: #include "gc/shenandoah/shenandoahHeapRegion.inline.hpp" > 30: #include "shenandoahCollectorPolicy.hpp" (I think it would be nice if Shenandoah maintainers added the gc/shenandoah/ to this include) src/hotspot/share/gc/shenandoah/shenandoahController.cpp line 31: > 29: #include "gc/shenandoah/shenandoahHeapRegion.inline.hpp" > 30: #include "shenandoahCollectorPolicy.hpp" > 31: Stray extra blankline src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 43: > 41: #include "utilities/quickSort.hpp" > 42: > 43: Stray extra blankline (there are probably more, but I'll skip
this) src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 81: > 79: #include "gc/shenandoah/shenandoahYoungGeneration.hpp" > 80: > 81: This section has three blanklines, but there should be none if this is to follow the rest of the HotSpot include style. src/hotspot/share/opto/output.cpp line 39: > 37: #include "opto/block.hpp" > 38: #include "opto/c2_MacroAssembler.hpp" > 39: #include "opto/c2compiler.hpp" Hmm. #include "opto/c2_MacroAssembler.hpp" #include "opto/c2compiler.hpp" Here _ comes before lower-case letters. From other places I see that _ comes after upper-case letters. I realize that this is caused by the ASCII value, but it is a bit unfortunate (IMHO). OTOH, maybe this will be consistent (the way I would like it, at least) if the sorting was done on lower-cased strings. src/hotspot/share/prims/foreignGlobals.cpp line 25: > 23: > 24: #include "classfile/javaClasses.hpp" > 25: #include "foreignGlobals.hpp" (For maintainers of this file, this should be prefixed with prims/) src/hotspot/share/services/diagnosticCommand.cpp line 72: > 70: #ifdef LINUX > 71: #include "mallocInfoDcmd.hpp" > 72: #include "os_posix.hpp" This should probably be: #ifdef LINUX #include "mallocInfoDcmd.hpp" #include "os_posix.hpp" #include "trimCHeapDCmd.hpp" #endif #ifdef LINUX #include #endif (and missing prefix should be added by maintainers) ------------- Changes requested by stefank (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24180#pullrequestreview-2714925291 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012784742 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012808108 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012812085 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012816294 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012818838 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012819309 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012822001 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012823415 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012826026 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012826486 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012829497 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012853306 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012856105 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012861794 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012862974 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012866552 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012867561 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012867729 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012867935 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012869598 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012868391 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012871019 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012872217 PR Review Comment: 
https://git.openjdk.org/jdk/pull/24180#discussion_r2012885535 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012877987 PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012889539 From stefank at openjdk.org Tue Mar 25 20:52:21 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 25 Mar 2025 20:52:21 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 20:20:55 GMT, Robert Toyonaga wrote: > > ... swapping the order of the release and NMT release booking as a way to shrink the lock scope for the release operation: > > ... > > As long as we hold the reservation, no other thread can re-reserve the reservation, so Thread B can take its time to first perform the NMT release booking under the lock, and then perform the release without the lock. > > Hi @stefank, I think that's true. But if the release/uncommit does not complete successfully we would need to readjust the accounting afterward. To do that we would need to retrieve the original memtag (at least for reserved regions) and potentially need to retrieve the original callsite data (if we're in detailed mode). When does a release/uncommit fail? Would that be a JVM bug? What state is the memory in when such a failure happens? Do we even know if the memory is still committed if an uncommit fails? If I look at the man page for munmap, it only fails if you pass in incorrect values, which sounds like a JVM bug to me. I don't understand why we don't treat that as a fatal error *OR* make sure that all call-sites handle that error, which they don't do today.
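The ordering under discussion (book the release under the lock first, then release the memory without the lock) can be sketched in a few lines. This is a toy Python model and not HotSpot code: `ReservationTracker`, `release_region` and the `os_release` callback are hypothetical names, and raising an exception merely stands in for the "treat it as a fatal error" option.

```python
import threading


class ReservationTracker:
    """Toy stand-in for NMT-style accounting (hypothetical, not HotSpot code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._regions = {}  # base address -> size in bytes

    def record_reserve(self, base, size):
        with self._lock:
            self._regions[base] = size

    def record_release(self, base):
        # Remove the booking and hand back the size that was tracked.
        with self._lock:
            return self._regions.pop(base)

    def reserved_bytes(self):
        with self._lock:
            return sum(self._regions.values())


def release_region(tracker, base, os_release):
    # Step 1: update the accounting under the lock. The caller still owns the
    # reservation, so no other thread can have re-reserved this range yet.
    size = tracker.record_release(base)
    # Step 2: release the memory without holding the lock.
    if not os_release(base, size):
        # Assumption: a failed release is treated as fatal instead of trying
        # to restore the accounting afterwards.
        raise RuntimeError(f"release of {size} bytes at {base:#x} failed")
```

In this model the lock is held only for the dictionary update, which is the point of swapping the two steps; what to do when step 2 fails is exactly the open question raised in the message above.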
------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2752513700 From dnsimon at openjdk.org Tue Mar 25 21:22:14 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 25 Mar 2025 21:22:14 GMT Subject: RFR: 8352645: Add tool support to check order of includes In-Reply-To: References: Message-ID: On Sun, 23 Mar 2025 21:14:47 GMT, Doug Simon wrote: > This PR adds `bin/sort_includes.py`, a python3 script to check that blocks of include statements in C++ files are sorted alphabetically and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). > This script can also update files with unsorted includes. The second commit in this PR shows the result of running: > > python3 ./bin/sort_includes.py ./src/hotspot > > To prevent an include being reordered, put at least one non-space character after the closing `"` or `>`. See `src/hotspot/share/adlc/archDesc.cpp` for an example. > > Assuming this PR is integrated, jcheck could be updated to use it to ensure include statements remain sorted. Thanks for all the comments so far. The first thing is that my tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually or someone is going to have to invest significant time in enhancing/replacing the tool I wrote. Also, the tool tries to not change the number of lines in the original file. It should only add blank lines where necessary to separate user includes from sys includes. This explains some of the extra blank lines.
For example, if the original was: 1: #include "a.h" 2: 3: #include "b.h" 4: 5: #include 6: 7: #include the output is: 1: #include "a.h" 2: #include "b.h" 3: 4: #include 5: #include 6: 7: Once again, I'd prefer to keep the tool simple and focused on the main task of ordering includes. Cleaning up extraneous blank lines can be done manually after running the tool. I'm currently working on converting `sort_includes.py` to `SortIncludes.java`. Once done, I'll open a second PR and limit changes to the C++ files I'm comfortable with changing and testing (namely in JVMCI directories). I will include a jtreg test to ensure these changes do not regress. Follow up issues can then be opened for working on the remaining C++ files. The main point of this initial PR is to show that such a tool can be useful. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24180#issuecomment-2752572125 From dnsimon at openjdk.org Tue Mar 25 21:27:22 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 25 Mar 2025 21:27:22 GMT Subject: RFR: 8352645: Add tool support to check order of includes In-Reply-To: References: Message-ID: On Sun, 23 Mar 2025 21:14:47 GMT, Doug Simon wrote: > This PR adds `bin/sort_includes.py`, a python3 script to check that blocks of include statements in C++ files are sorted alphabetically and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). > This script can also update files with unsorted includes. The second commit in this PR shows the result of running: > > python3 ./bin/sort_includes.py ./src/hotspot > > To prevent an include being reordered, put at least one non-space character after the closing `"` or `>`. See `src/hotspot/share/adlc/archDesc.cpp` for an example. > > Assuming this PR is integrated, jcheck could be updated to use it to ensure include statements remain sorted. 
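The behaviour described in this thread (sorted blocks, one blank line between user and sys includes, and an unchanged line count) can be sketched as follows. This is an illustration only and not the code of `bin/sort_includes.py`: the function name `sort_include_block` is made up, the block is assumed to contain nothing but plain include lines and blank lines, and the "do not reorder" marker is not handled.

```python
def sort_include_block(lines):
    """Sort one block of #include lines, keeping the block's line count.

    Hypothetical sketch: assumes every non-blank line is a plain
    '#include "..."' or '#include <...>' directive.
    """
    user, system = [], []
    for line in lines:
        text = line.strip()
        if text.startswith('#include "'):
            user.append(text)
        elif text.startswith("#include <"):
            system.append(text)
        elif text:
            raise ValueError(f"not a plain include line: {line!r}")
    user.sort()
    system.sort()
    # One blank line separates user includes from sys includes.
    separator = [""] if user and system else []
    result = user + separator + system
    # Pad with trailing blank lines so the block is never shorter than before;
    # a line is only added when the separator was missing.
    result += [""] * (len(lines) - len(result))
    return result


# Header names below are arbitrary examples.
block = ['#include "b.h"', "", '#include "a.h"', "",
         "#include <string.h>", "", "#include <stddef.h>"]
for line in sort_include_block(block):
    print(line)
```

The trailing blank lines left at the end of the block are the kind of residue that the review comments flag as stray blank lines.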
bin/sort_includes.py line 117: > 115: for filename in filenames: > 116: file = Path(dirpath).joinpath(filename) > 117: if file.suffix in (".cpp", "hpp"): `"hpp"` -> `".hpp"` This bug explains why there are no modified `*.hpp` files in the PR. And I've discovered that blindly sorting includes in these files (especially `*.inline.hpp` files) causes building to break quickly. This is another reason for the more incremental approach I suggest at https://github.com/openjdk/jdk/pull/24180#issuecomment-2752572125 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2012961564 From ccheung at openjdk.org Tue Mar 25 22:22:19 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 25 Mar 2025 22:22:19 GMT Subject: RFR: 8352579: Refactor CDS legacy optimization for lambda proxy classes [v4] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 23:47:46 GMT, Ioi Lam wrote: >> Since JDK 16, CDS has provided limited optimization for lambda expressions. This has been superseded by JEP 483 and is useful only when `-XX:+AOTClassLinking` is not enabled (which is the case for the default CDS archive, for compatibility reasons). >> >> The "legacy lambda optimization" may eventually be removed. For the time being, we should consolidate the code into a single source code and clearly mark its uses. This way we can avoid confusion with the JEP 483 code for supporting lambdas (and other java.lang.invoke functionalities). > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @matias9927 offline comments - consolidated two functions with identical names Spotted couple of extra include statements. Looks good. src/hotspot/share/cds/lambdaProxyClassDictionary.cpp line 31: > 29: #include "classfile/systemDictionaryShared.hpp" > 30: #include "interpreter/bootstrapInfo.hpp" > 31: #include "jfr/jfrEvents.hpp" Extra include? 
src/hotspot/share/classfile/systemDictionaryShared.cpp line 34: > 32: #include "cds/classListWriter.hpp" > 33: #include "cds/dumpTimeClassInfo.inline.hpp" > 34: #include "cds/dynamicArchive.hpp" Pre-existing: I think the include of `cds/archiveHeapLoader.hpp` at line #27 is unnecessary. ------------- Marked as reviewed by ccheung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24145#pullrequestreview-2715327433 PR Review Comment: https://git.openjdk.org/jdk/pull/24145#discussion_r2013010453 PR Review Comment: https://git.openjdk.org/jdk/pull/24145#discussion_r2013014309 From dnsimon at openjdk.org Tue Mar 25 22:22:23 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 25 Mar 2025 22:22:23 GMT Subject: RFR: 8352645: Add tool support to check order of includes In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 20:27:21 GMT, Stefan Karlsson wrote: >> This PR adds `bin/sort_includes.py`, a python3 script to check that blocks of include statements in C++ files are sorted alphabetically and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> This script can also update files with unsorted includes. The second commit in this PR shows the result of running: >> >> python3 ./bin/sort_includes.py ./src/hotspot >> >> To prevent an include being reordered, put at least one non-space character after the closing `"` or `>`. See `src/hotspot/share/adlc/archDesc.cpp` for an example. >> >> Assuming this PR is integrated, jcheck could be updated to use it to ensure include statements remain sorted. > > src/hotspot/share/opto/output.cpp line 39: > >> 37: #include "opto/block.hpp" >> 38: #include "opto/c2_MacroAssembler.hpp" >> 39: #include "opto/c2compiler.hpp" > > Hmm. > > #include "opto/c2_MacroAssembler.hpp" > #include "opto/c2compiler.hpp" > > Here _ comes before lower-case letters. 
From other places I see that _ comes after upper-case letters. I realize that this is caused by the ASCII value, but it is a bit unfortunate (IMHO). OTOH, maybe this will be consistent (the way I would like it, at least) if the sorting was done on lower-cased strings. I think it's simplest to follow ASCII sorting but don't have a strong opinion if others would prefer for strings to be lower-cased before sorting. The important thing is that the same order is used everywhere. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2013015247 From duke at openjdk.org Tue Mar 25 22:40:25 2025 From: duke at openjdk.org (Johannes Graham) Date: Tue, 25 Mar 2025 22:40:25 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v8] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 15:52:07 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Revamp toString() methods src/java.base/share/classes/jdk/internal/lang/stable/EmptyStableFunction.java line 43: > 41: * @param the type of the result of the function > 42: */ > 43: record EmptyStableFunction(Function original) implements Function { With the simplification of the toString methods, I think this could be replaced with a singleton instance of StableFunction that was initialized with an empty map and an arbitrary function (that will never be called). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2013029409 From duke at openjdk.org Tue Mar 25 23:31:22 2025 From: duke at openjdk.org (Johannes Graham) Date: Tue, 25 Mar 2025 23:31:22 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v8] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 15:52:07 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. 
> > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Revamp toString() methods src/java.base/share/classes/jdk/internal/lang/stable/StableValueFactories.java line 41: > 39: public static Function function(Set inputs, > 40: Function original) { > 41: if (inputs.isEmpty()) { If it is worth optimizing the isEmpty scenario, it might be preferable to let each xxxFunction.of return an appropriate instance, to keep the number of varying subclasses to a minimum. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2013067132 From duke at openjdk.org Tue Mar 25 23:44:16 2025 From: duke at openjdk.org (Johannes Graham) Date: Tue, 25 Mar 2025 23:44:16 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v8] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 15:52:07 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Revamp toString() methods test/jdk/java/lang/StableValue/StableValueTest.java line 188: > 186: stable.trySet(stable); > 187: String toString = stable.toString(); > 188: assertEquals("(this StableValue)", toString); Instead of checking the toString format, an option would be to use assertDoesNotThrow to ensure there was no StackOverflowError ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2013075831 From fyang at openjdk.org Wed Mar 26 02:24:06 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 26 Mar 2025 02:24:06 GMT Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 14:19:55 GMT, Robbin Ehn wrote: > Hi, for you to consider. > > These tests constantly fail in qemu-user. > Either they require the host to be the same arch or they are very, very slow in emulation. > E.g. "ptrace(PTRACE_ATTACH, ..)
failed for 405157: Function not implemented'" for SA tests. > This is the initial set of tests, there are many more, but I need to do some more verification for those. > > From bug: >> qemu-user/rv64 sets uarch to "qemu" in /proc/cpuinfo (qemu-system do not do that). >> We add this uarch to CPU feature string. >> This means we can use jtreg 'require' with cpu string to filter out tests in qemu-user. > > Relevant qemu code: > https://github.com/qemu/qemu/blob/170825d14d88a1ce7fae98d5a928480f2f329b22/linux-user/riscv/target_proc.h#L29 > > Relevant hotspot code: > https://github.com/openjdk/jdk/blob/fa0b18bfde38ee2ffbab33a9eaac547fe8aa3c7c/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp#L250 > > Tested that the require only filters out tests in qemu+riscv64. > > Thanks! > > /Robbin Hi, This is interesting! But why not use qemu-system instead then? Although a bit slower than qemu-user, this functions well like a real linux system. When testing new riscv features without hardware implementations, I always use qemu-system with a big timeout factor, like: `make test TEST=hotspot:tier1 JTREG="TIMEOUT_FACTOR=24"`. This seems to work on my Xeon Gold 6278C X86 server. It takes about one day or two to build and run `hotspot:tier1`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24229#issuecomment-2753071048 From dholmes at openjdk.org Wed Mar 26 02:24:12 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 26 Mar 2025 02:24:12 GMT Subject: RFR: 8294954: Remove superfluous ResourceMarks when using LogStream In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 15:38:21 GMT, Casper Norrbin wrote: > Hi everyone, > > This PR removes redundant `ResourceMark` instances where `LogStream` is used. Previously, `LogStream` inherited from `ResourceObj`, which required a `ResourceMark`, but this is no longer the case, making these instances unnecessary. > > Process: > 1. 
I added assertions to check for resource unwinding in places where `ResourceMark`s were used with `LogStream`s. > 2. Ran tests up to tier7 to confirm no unwinding was happening. This helped filter out cases where `ResourceMark`s were still required for other reasons. > 3. Manually verified the remaining cases by tracing function calls to ensure the `ResourceMark`s were truly unnecessary. > 4. Removed the redundant `ResourceMark` instances. Thanks for the additional info @caspernorrbin. Looks good. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24162#pullrequestreview-2715740332 From sspitsyn at openjdk.org Wed Mar 26 03:12:44 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 26 Mar 2025 03:12:44 GMT Subject: RFR: 8352812: remove useless class and function parameter in SuspendThread impl [v2] In-Reply-To: References: Message-ID: > The internal class JvmtiSuspendControl, which is transitively used in the SuspendThread implementation, is not really needed and is being removed. Also, the suspend_thread function has an unused need_safepoint_p parameter, which is being removed as well.
> > Testing: > - TBD: Run mach5 tiers 1-3 to be safe Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: fixed typo caused build time error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24219/files - new: https://git.openjdk.org/jdk/pull/24219/files/8513e69f..daf1735b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24219&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24219&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24219.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24219/head:pull/24219 PR: https://git.openjdk.org/jdk/pull/24219 From sspitsyn at openjdk.org Wed Mar 26 03:44:07 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 26 Mar 2025 03:44:07 GMT Subject: RFR: 8352812: remove useless class and function parameter in SuspendThread impl [v2] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 03:12:44 GMT, Serguei Spitsyn wrote: >> The internal class JvmtiSuspendControl, which is transitively used in the SuspendThread implementation, is not really needed and is being removed. Also, the suspend_thread function has an unused need_safepoint_p parameter, which is being removed as well. >> >> Testing: >> - TBD: Run mach5 tiers 1-3 to be safe > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > fixed typo caused build time error Leonid and Chris, thank you for the review! I've just fixed a typo that caused build failures, so a re-review will be needed. Sorry for that.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24219#issuecomment-2753162234 From lmesnik at openjdk.org Wed Mar 26 05:35:22 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 26 Mar 2025 05:35:22 GMT Subject: RFR: 8352812: remove useless class and function parameter in SuspendThread impl [v2] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 03:12:44 GMT, Serguei Spitsyn wrote: >> The internal class JvmtiSuspendControl, which is transitively used in the SuspendThread implementation, is not really needed and is being removed. Also, the suspend_thread function has an unused need_safepoint_p parameter, which is being removed as well. >> >> Testing: >> - TBD: Run mach5 tiers 1-3 to be safe > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > fixed typo caused build time error Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24219#pullrequestreview-2715946424 From rehn at openjdk.org Wed Mar 26 06:24:14 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 26 Mar 2025 06:24:14 GMT Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 14:19:55 GMT, Robbin Ehn wrote: > Hi, for you to consider. > > These tests constantly fail in qemu-user. > Either they require the host to be the same arch or they are very, very slow in emulation. > E.g. "ptrace(PTRACE_ATTACH, ..) failed for 405157: Function not implemented'" for SA tests. > This is the initial set of tests, there are many more, but I need to do some more verification for those. > > From bug: >> qemu-user/rv64 sets uarch to "qemu" in /proc/cpuinfo (qemu-system do not do that). >> We add this uarch to CPU feature string. >> This means we can use jtreg 'require' with cpu string to filter out tests in qemu-user.
> > Relevant qemu code: > https://github.com/qemu/qemu/blob/170825d14d88a1ce7fae98d5a928480f2f329b22/linux-user/riscv/target_proc.h#L29 > > Relevant hotspot code: > https://github.com/openjdk/jdk/blob/fa0b18bfde38ee2ffbab33a9eaac547fe8aa3c7c/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp#L250 > > Tested that the require only filters out tests in qemu+riscv64. > > Thanks! > > /Robbin Yes, it's much slower, as you can't use a compile JDK. Note that this is not a full tier1 ("make test-tier1"), which means you need a day or two more for a full tier1. There are plenty of tests in the JDK which stress-test or correctness-test the runtime/compiler, e.g.: test/jdk/java/lang/Thread/virtual/MonitorWaitNotify.java test/jdk/java/lang/Float/Binary16Conversion.java `make test-tier1 CONF=linux-riscv64-server-fastdebug JTREG="OPTIONS=-e:QEMU_LD_PREFIX=/usr/riscv64-linux-gnu/;JAVA_OPTIONS=;RETAIN=all" JDK_FOR_COMPILE=/home/rehn/source/jdk/vanilla/build/linux-x86_64-server-release/images/jdk JTREG_TIMEOUT_FACTOR=20` One issue with a high timeout factor is that make+jtreg can only parallelize tests in the same directory, which means you often end up just waiting for one test to complete before anything else can happen. Hotspot tier1 takes around 8h with timeout factor 20 for me in qemu-user, cpu = max, but tests still time out. Even when running with JOBS=1 some time out, at least on cpu = max.
Hence this enhancement :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24229#issuecomment-2753357500 From stefank at openjdk.org Wed Mar 26 06:50:19 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 26 Mar 2025 06:50:19 GMT Subject: RFR: 8352645: Add tool support to check order of includes In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 21:24:41 GMT, Doug Simon wrote: >> This PR adds `bin/sort_includes.py`, a python3 script to check that blocks of include statements in C++ files are sorted alphabetically and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> This script can also update files with unsorted includes. The second commit in this PR shows the result of running: >> >> python3 ./bin/sort_includes.py ./src/hotspot >> >> To prevent an include being reordered, put at least one non-space character after the closing `"` or `>`. See `src/hotspot/share/adlc/archDesc.cpp` for an example. >> >> Assuming this PR is integrated, jcheck could be updated to use it to ensure include statements remain sorted. > > bin/sort_includes.py line 117: > >> 115: for filename in filenames: >> 116: file = Path(dirpath).joinpath(filename) >> 117: if file.suffix in (".cpp", "hpp"): > > `"hpp"` -> `".hpp"` > > This bug explains why there are no modified `*.hpp` files in the PR. And I've discovered that blindly sorting includes in these files (especially `*.inline.hpp` files) causes building to break quickly. This is another reason for the more incremental approach I suggest at https://github.com/openjdk/jdk/pull/24180#issuecomment-2752572125 I was going to ask that, but forgot about it once I had looked through all the files. 
:) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2013489315 From stefank at openjdk.org Wed Mar 26 07:03:18 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 26 Mar 2025 07:03:18 GMT Subject: RFR: 8352645: Add tool support to check order of includes In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 21:19:03 GMT, Doug Simon wrote: > Thanks for all the comments so far. > > First thing is that my tool does nothing about re-ordering block of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually or someone is going to have to invest significant time in enhancing/replacing the tool I wrote. Yup. The extra effort needed to get the tool fully solve this is the reason why we haven't built a tool for this. There are a few scripts around but none is good enough to be promoted as *the* tool to correctly fix our includes. It is still going to be a great tool for the devs. > > Also, the tool tries to not change the number of lines in the original file. It should only add blank lines where necessary to separate user includes from sys includes. This explains some of the extra blank lines. For example, if the original was: > > ``` > 1: #include "a.h" > 2: > 3: #include "b.h" > 4: > 5: #include > 6: > 7: #include > ``` > > the output is: > > ``` > 1: #include "a.h" > 2: #include "b.h" > 3: > 4: #include > 5: #include > 6: > 7: > ``` > > Once again, I'd prefer to keep the tool simple and focused on the main task of ordering includes. Cleaning up extraneous blank lines can be done manually after running the tool. OK. We'll just have to make sure to clean these out after having the initial run of this tool. > > I'm currently working on converting `sort_includes.py` to `SortIncludes.java`. 
Once done, I'll open a second PR and limit changes to the C++ files I'm comfortable with changing and testing (namely in JVMCI directories). I will include a jtreg test to ensure these changes do not regress. > > Follow up issues can then be opened for working on the remaining C++ files. The main point of this initial PR is to show that such a tool can be useful. Sounds good to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24180#issuecomment-2753413905 From dholmes at openjdk.org Wed Mar 26 07:08:07 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 26 Mar 2025 07:08:07 GMT Subject: RFR: 8352766: Problemlist hotspot tier1 tests requiring tools that are not included in static JDK In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 03:02:32 GMT, Jiangli Zhou wrote: > Please review this change that adds test/hotspot/jtreg/ProblemList-StaticJdk.txt, which problemlists 27 hotspot tier1 tests that use `javac`, `jstack`, `jcmd` and `jhsdb` at runtime. > > Following is an example of the command that I use to run hotspot tier1 tests on static JDK with the extra `ProblemList-StaticJdk.txt`: > > > $ make test TEST="test/hotspot/jtreg:tier1" JDK_UNDER_TEST=//JDK-8352766/build/linux-x86_64-server-fastdebug/images/static-jdk JDK_FOR_COMPILE=//JDK-8352766/build/linux-x86_64-server-fastdebug/images/jdk JTREG="EXTRA_PROBLEM_LISTS=//JDK-8352766/test/hotspot/jtreg/ProblemList-StaticJdk.txt" Seems fine. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24214#pullrequestreview-2716087071 From jsjolen at openjdk.org Wed Mar 26 07:41:12 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 26 Mar 2025 07:41:12 GMT Subject: RFR: 8294954: Remove superfluous ResourceMarks when using LogStream In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 15:38:21 GMT, Casper Norrbin wrote: > Hi everyone, > > This PR removes redundant `ResourceMark` instances where `LogStream` is used. 
Previously, `LogStream` inherited from `ResourceObj`, which required a `ResourceMark`, but this is no longer the case, making these instances unnecessary. > > Process: > 1. I added assertions to check for resource unwinding in places where `ResourceMark`s were used with `LogStream`s. > 2. Ran tests up to tier7 to confirm no unwinding was happening. This helped filter out cases where `ResourceMark`s were still required for other reasons. > 3. Manually verified the remaining cases by tracing function calls to ensure the `ResourceMark`s were truly unnecessary. > 4. Removed the redundant `ResourceMark` instances. OK with this ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24162#pullrequestreview-2716150401 From rehn at openjdk.org Wed Mar 26 07:47:28 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 26 Mar 2025 07:47:28 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v7] In-Reply-To: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: <-lZb8RCR5B68I7gHZaaevk7ls1Ms15QwUjJnANUK7UE=.0fb9aba2-5a82-4243-bc01-1de28529485c@github.com> > Hi please consider. 
>
> | RVWMO | Patched |
> | ---------- | ---------- |
> | fence iorw,iorw | fence iorw,ow |
> | sw t4,120(t2) | sw t4,120(t2) |
> | fence ow,ir | unnecessary_membar_volatile_rvwmo |
> | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile |
> | fence iorw,ow | fence iorw,ow |
> | sw t5,124(t2) | sw t5,124(t2) |
>
> | TSO | Patched |
> | ---------- | ---------- |
> | lw a4,120(t2) | lw a6,120(t2) |
> | sw a0,124(t2) | sw t6,124(t2) |
> | fence iorw,iorw | unnecessary_membar_volatile_tso |
> | sw t4,120(t2) | sw t4,120(t2) |
> | fence ow,ir | unnecessary_membar_volatile_tso |
> | sw t6,128(t2) | sw t5,128(t2) |
> | sw t5,124(t2) // Non-volatile | sw a1,124(t2) // Non-volatile |
> | fence iorw,iorw | unnecessary_membar_volatile_tso |
> | ... | ... |
> | sw a3,120(t2) | sw a0,120(t2) |
> | fence ow,ir | fence ow,ir |
> | lw a7,124(t2) | lw a5,124(t2) |
>
> The specific RVWMO sequence of volatile store + store + volatile store is around 30% faster on VF2.
>
> The patch does the following:
> - Separate ztso and rvwmo in ad by using UseZtso predicate.
> - Match all that requires the same membar.
> - Make fence/fencei protected as they shouldn't be used directly.
> - Increased cost of membars to VOLATILE_REF_COST.
> - Added a real_empty pipe.
> - Change to pipe_slow on TSO (as x86).
>
> Note that C2-rv64 is now superior to gcc/clang regarding fencing:
> https://godbolt.org/z/6E3YTP15j
>
> Testing jcstress, tier1 and manually reading the generated assembly.
> Doing additional testing, but RFR it now as it may need some consideration.
>
> /Robbin

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision: - Merge branch 'master' into tso-merge - format comment - Merge branch 'master' into tso-merge - Review comments - Merge branch 'master' into tso-merge - Review comments - Fixed ws - Revert NC - Fixed comment - UseNewCode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24035/files - new: https://git.openjdk.org/jdk/pull/24035/files/cb184209..5eac8470 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=05-06 Stats: 22948 lines in 1583 files changed: 12439 ins; 4022 del; 6487 mod Patch: https://git.openjdk.org/jdk/pull/24035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24035/head:pull/24035 PR: https://git.openjdk.org/jdk/pull/24035 From rehn at openjdk.org Wed Mar 26 07:47:32 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 26 Mar 2025 07:47:32 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v5] In-Reply-To: References: Message-ID: > Hi please consider. > > Added case to turn off UseZvfh when no RVV. > Which is the cause of the test issues, zvfh on but no rvv. > > Also made all case identical and added no warning when default. > Move them to the common init, as the "UseExtension" is not C2 specific. > > Manual tested and some random compiler tests. > > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: - Spell fix - Merge branch 'master' into maxvector_0 - dep check - Merge branch 'master' into maxvector_0 - Merge branch 'master' into maxvector_0 - Merge branch 'master' into maxvector_0 - hwprobe deps - Merge branch 'master' into maxvector_0 - Moved to common - Disable UseZvfh when no RVV ------------- Changes: https://git.openjdk.org/jdk/pull/24094/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24094&range=04 Stats: 106 lines in 2 files changed: 45 ins; 19 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/24094.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24094/head:pull/24094 PR: https://git.openjdk.org/jdk/pull/24094 From rehn at openjdk.org Wed Mar 26 07:47:33 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 26 Mar 2025 07:47:33 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v4] In-Reply-To: References: <3offilavnorMFRTRJK8oCgc4VkWQ-tbqka-HbqvcLjs=.9e76929a-4b7f-4c63-b662-f548fa3f9ec0@github.com> Message-ID: On Tue, 25 Mar 2025 02:38:33 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: >> >> - dep check >> - Merge branch 'master' into maxvector_0 >> - Merge branch 'master' into maxvector_0 >> - Merge branch 'master' into maxvector_0 >> - hwprobe deps >> - Merge branch 'master' into maxvector_0 >> - Moved to common >> - Disable UseZvfh when no RVV > > src/hotspot/cpu/riscv/vm_version_riscv.hpp line 161: > >> 159: #define RV_NO_FLAG_BIT (BitsPerWord+1) // nth_bit will return 0 on values larger than BitsPerWord >> 160: >> 161: // Note: the order matters, depender should be after thier dependee. E.g. ext_V before ext_Zvbb. 
>
> Noticed a typo here: s/thier/their/

Thanks

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24094#discussion_r2013549890

From stefank at openjdk.org  Wed Mar 26 08:04:14 2025
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 26 Mar 2025 08:04:14 GMT
Subject: RFR: 8352645: Add tool support to check order of includes
In-Reply-To: 
References: 
Message-ID: 

On Tue, 25 Mar 2025 22:19:39 GMT, Doug Simon wrote:

>> src/hotspot/share/opto/output.cpp line 39:
>>
>>> 37: #include "opto/block.hpp"
>>> 38: #include "opto/c2_MacroAssembler.hpp"
>>> 39: #include "opto/c2compiler.hpp"
>>
>> Hmm.
>>
>> #include "opto/c2_MacroAssembler.hpp"
>> #include "opto/c2compiler.hpp"
>>
>> Here _ comes before lower-case letters. From other places I see that _ comes after upper-case letters. I realize that this is caused by the ASCII value, but it is a bit unfortunate (IMHO). OTOH, maybe this will be consistent (the way I would like it, at least) if the sorting was done on lower-cased strings.
>
> I think it's simplest to follow ASCII sorting but don't have a strong opinion if others would prefer for strings to be lower-cased before sorting. The important thing is that the same order is used everywhere.

I register my vote for lower-casing the strings before sorting. Note that lower-casing before sorting retains the current prevailing sort order of sorting `_` before the letters. This is also what you get if you use Java's `String.CASE_INSENSITIVE_ORDER` (but not if you use `sort -f`, which upper-cases the strings).
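
To make the ordering argument above concrete, here is a minimal Java sketch comparing plain ASCII (natural) ordering with `String.CASE_INSENSITIVE_ORDER`. The class name `IncludeSortOrderDemo` and the three header names are hypothetical and chosen only to expose the difference; this is not code from the PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class IncludeSortOrderDemo {
    /** Returns a sorted copy of names under the given ordering. */
    static List<String> sorted(List<String> names, Comparator<String> order) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical header names: '_' is 0x5F, which lies between the
        // upper-case letters (0x41-0x5A) and the lower-case letters (0x61-0x7A).
        List<String> names = List.of("c1_Foo.hpp", "c1Bar.hpp", "c1bar2.hpp");

        // ASCII order: 'B' < '_' < 'b', so '_' lands between the two cases.
        System.out.println(sorted(names, Comparator.naturalOrder()));

        // Case-insensitive order compares lower-cased characters when they
        // differ, so '_' sorts before every letter regardless of case.
        System.out.println(sorted(names, String.CASE_INSENSITIVE_ORDER));
    }
}
```

Under ASCII ordering the underscore name falls between the upper-case and lower-case names, while the case-insensitive comparator places it first, which matches the "`_` before the letters" convention described above.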
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24180#discussion_r2013575848 From jsjolen at openjdk.org Wed Mar 26 08:48:13 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 26 Mar 2025 08:48:13 GMT Subject: RFR: 8352645: Add tool support to check order of includes In-Reply-To: References: Message-ID: On Sun, 23 Mar 2025 21:14:47 GMT, Doug Simon wrote: > This PR adds `bin/sort_includes.py`, a python3 script to check that blocks of include statements in C++ files are sorted alphabetically and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). > This script can also update files with unsorted includes. The second commit in this PR shows the result of running: > > python3 ./bin/sort_includes.py ./src/hotspot > > To prevent an include being reordered, put at least one non-space character after the closing `"` or `>`. See `src/hotspot/share/adlc/archDesc.cpp` for an example. > > Assuming this PR is integrated, jcheck could be updated to use it to ensure include statements remain sorted. No review from me (though happy to review the Java rewrite), but thank you for doing this :-). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24180#issuecomment-2753621032 From fyang at openjdk.org Wed Mar 26 09:04:15 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 26 Mar 2025 09:04:15 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v5] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 07:47:32 GMT, Robbin Ehn wrote: >> Hi please consider. >> >> Added case to turn off UseZvfh when no RVV. >> Which is the cause of the test issues, zvfh on but no rvv. >> >> Also made all case identical and added no warning when default. >> Move them to the common init, as the "UseExtension" is not C2 specific. >> >> Manual tested and some random compiler tests. 
>> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - Spell fix > - Merge branch 'master' into maxvector_0 > - dep check > - Merge branch 'master' into maxvector_0 > - Merge branch 'master' into maxvector_0 > - Merge branch 'master' into maxvector_0 > - hwprobe deps > - Merge branch 'master' into maxvector_0 > - Moved to common > - Disable UseZvfh when no RVV Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24094#pullrequestreview-2716351952 From mbaesken at openjdk.org Wed Mar 26 09:13:22 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 26 Mar 2025 09:13:22 GMT Subject: RFR: 8351491: Add info from release file to hserr file Message-ID: The release file of the JDK image contains useful info, for example the SOURCE used to built this image e.g. SOURCE=".:git:21af8c7e7405" Also the MODULES list is probably useful to have. Add this info (or the complete content of the release file) to the hs_err files. ------------- Commit messages: - JDK-8351491 Changes: https://git.openjdk.org/jdk/pull/24244/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351491 Stats: 61 lines in 4 files changed: 61 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24244/head:pull/24244 PR: https://git.openjdk.org/jdk/pull/24244 From shade at openjdk.org Wed Mar 26 09:26:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 09:26:22 GMT Subject: RFR: 8345169: Implement JEP 503: Remove the 32-bit x86 Port [v3] In-Reply-To: References: Message-ID: <38zw9WI_zW70F66Y44GWS6c5fXWHY0tBXmrnUqo7g3k=.e5d35577-fd80-44db-88bf-523e9f982ffc@github.com> On Tue, 25 Mar 2025 10:43:32 GMT, Aleksey Shipilev wrote: >> This PR implements JEP 503: Remove the 32-bit x86 Port. 
>> >> The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. >> >> This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. >> >> The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. >> >> Additional testing: >> - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) >> - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) >> - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone > - Drop commented out block from deprecations > - Merge branch 'master' into JDK-8345169-32bit-x86-be-gone > - Generic 32-bit x86 configure error supercedes Windows 32-bit x86 > - 8345169: Implement JEP 503: Remove the 32-bit x86 Port There we go! Thanks all! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23906#issuecomment-2753715713 From shade at openjdk.org Wed Mar 26 09:26:23 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 09:26:23 GMT Subject: Integrated: 8345169: Implement JEP 503: Remove the 32-bit x86 Port In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 16:52:16 GMT, Aleksey Shipilev wrote: > This PR implements JEP 503: Remove the 32-bit x86 Port. > > The JEP is proposed to target 25, we would not integrate until JEP is ready. Reviews are appreciated meanwhile. > > This is only the removal of obvious 32-bit x86 parts, mostly files with `x86_32` in their name. Those are only built when build system knows we are compiling for x86_32. There is therefore no impact on x86_64. The approach for removing x86_32 files only also makes this PR borderline trivial, and requires no additional testing beyond normal pre-integration checks. > > The rest of the code is quite heavily intertwined with x86_64 and/or Zero, and would require accurate untangling. It would be much easier to review and test once we purge the free-standing parts of 32-bit x86 port, which is also a bulk of the port. The tangling with 32-bit x86 Zero is also why I did not touch most of the build system paths that handle x86. There is [JDK-8351148](https://bugs.openjdk.org/browse/JDK-8351148) umbrella that tracks further cleanup work. One can peek the final state that can be reached with all the cleanups in my earlier exploratory https://github.com/openjdk/jdk/pull/22567. > > Additional testing: > - [x] Linux x86_32 Server fastdebug, `make bootcycle-images` (now fails configure) > - [x] Linux x86_64 Server fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_32 Zero fastdebug, `make bootcycle-images` (still works) > - [x] Linux x86_64 Zero fastdebug, `make bootcycle-images` (still works) This pull request has now been integrated. 
Changeset: ee710fec
Author:    Aleksey Shipilev
URL:       https://git.openjdk.org/jdk/commit/ee710fec21c4e886769576c17ad6db2ab91a84b4
Stats:     29733 lines in 25 files changed: 4 ins; 29728 del; 1 mod

8345169: Implement JEP 503: Remove the 32-bit x86 Port

Reviewed-by: ihse, mdoerr, vlivanov, kvn, coleenp, dholmes

-------------

PR: https://git.openjdk.org/jdk/pull/23906

From duke at openjdk.org  Wed Mar 26 09:33:04 2025
From: duke at openjdk.org (Manuel Hässig)
Date: Wed, 26 Mar 2025 09:33:04 GMT
Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off
Message-ID: 

# Issue Summary

When running with `-XX:-UseLoopPredicate`, C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert or expect profiled parse predicates.

# Change Summary

Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled at the level of arguments. Further, before any check for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled, so that the code is consistent in its intention.

Concretely, this PR
- adds parse predicate nodes to the IR testing framework,
- turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off,
- predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency,
- adds a regression test.
# Testing The changes passed the following testing: - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) - tier1 through tier3 and Oracle internal testing ------------- Commit messages: - Make conditions on UseProfiledLoopPredicate first test UseLoopPredicate - Turn off UseProfiledLoopPredicate when UseLoopPredicate is turned off - Add regression IR test - ir-framework: add parse predicate nodes Changes: https://git.openjdk.org/jdk/pull/24248/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347449 Stats: 133 lines in 6 files changed: 125 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24248.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24248/head:pull/24248 PR: https://git.openjdk.org/jdk/pull/24248 From azafari at openjdk.org Wed Mar 26 09:36:14 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 26 Mar 2025 09:36:14 GMT Subject: RFR: 8352140: UBSAN: fix the left shift of negative value in klass.hpp, array_layout_helper() In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 15:36:20 GMT, Kim Barrett wrote: >> For my own learning: >> When developers use left-shift for doubling a value, then a negative operand may changed to a positive since the sign-bit may change. For example in >> >> signed short int x = -32768; >> signed short int y = x << 1; >> ``` >> the value of `y` would be `0`. So, when the left-shift is used as an arithmetic op, both the sign and size of the result/operand should be carefully considered. And, this is not dependent on C++xx. >> So, left-shift of negative value is UB, until the developer explicitly decides on the type of the operand or the result. > > signed short int x = -32768; > signed short int y = x << 1; > > > That does seem like an interestingly weird case. Unless I'm missing something, > there's no UB-overflow in that. The shift expression promotes `short x` to > `int x`, sign extending it. 
The `int`-typed shift is fine (since C++20, and > effectively so prior to that in non-constexpr-required contexts - see below). > And the implicit conversion to `short y` is implementation-defined (before > C++20, though gcc may warn (-Woverflow)) or fine (since C++20). > > gcc warns about x being negative in C++11 to C++17 modes > (-Wshift-negative-value enabled by default), but doesn't treat it as UB. > Before C++20 gcc errors (warns if -fpermissive) if it's in a > required-constexpr-context, even if -Wshift-negative-value is disabled. > That all seems consistent. I had to emphasize that the case shown in the example may happen at run-time where compiler has no chance to warn/avoid/address it. My concern is that developers should not rely on the compiler to check the validation of left-shift op. They should be aware of the `signed` <-> `unsigned` conversions during the left-shift. To find invalid cases of left-shift, UBSAN instruments them with assertions to catch them at run-time. If the assertion raised, good we found the problem. However, if no assertion raised for some left-shift ops, it doesn't mean that they are valid. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24184#discussion_r2013721972 From mdoerr at openjdk.org Wed Mar 26 10:16:25 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Mar 2025 10:16:25 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp [v5] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 08:32:03 GMT, Matthias Baesken wrote: >> There are a few divisions by zero in sharedRuntimeTrans.cpp, used to "construct" NaN and -infinity. This should probably be replaced by using functionality from std::numeric_limits . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > adjust check LGTM. 
src/hotspot/share/runtime/sharedRuntimeTrans.cpp line 129:

> 127:     if (((hx&0x7fffffff)|lx)==0)
> 128:       return -std::numeric_limits<double>::infinity(); /* log(+-0)=-inf */
> 129:     if (hx<0) return std::numeric_limits<double>::quiet_NaN(); /* log(-#) = NaN */

Maybe improve the indentation of the comments?

src/hotspot/share/runtime/sharedRuntimeTrans.cpp line 225:

> 223:     if (((hx&0x7fffffff)|lx)==0)
> 224:       return -std::numeric_limits<double>::infinity(); /* log(+-0)=-inf */
> 225:     if (hx<0) return std::numeric_limits<double>::quiet_NaN(); /* log(-#) = NaN */

Same here.

-------------

Marked as reviewed by mdoerr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24136#pullrequestreview-2716583280
PR Review Comment: https://git.openjdk.org/jdk/pull/24136#discussion_r2013803844
PR Review Comment: https://git.openjdk.org/jdk/pull/24136#discussion_r2013804225

From egahlin at openjdk.org  Wed Mar 26 10:19:21 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Wed, 26 Mar 2025 10:19:21 GMT
Subject: RFR: 8352738: JFR: Implementation of JFR Method Timing and Tracing
Message-ID: 

Could I have a review of this enhancement that will add tracing capabilities to JFR? There are opportunities for performance improvements in the implementation, but I would rather add them later and separately.

Testing: tier 1-3, test/jdk/jdk/jfr

Thanks
Erik

-------------

Commit messages:
 - Initial

Changes: https://git.openjdk.org/jdk/pull/24205/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24205&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8352738
  Stats: 6510 lines in 103 files changed: 6060 ins; 362 del; 88 mod
  Patch: https://git.openjdk.org/jdk/pull/24205.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24205/head:pull/24205

PR: https://git.openjdk.org/jdk/pull/24205

From alanb at openjdk.org  Wed Mar 26 10:23:13 2025
From: alanb at openjdk.org (Alan Bateman)
Date: Wed, 26 Mar 2025 10:23:13 GMT
Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user
In-Reply-To: 
References: 
Message-ID: 

On Wed, 26 Mar 2025 06:21:31 GMT, Robbin Ehn wrote:

> One issue with high timeout factor is that make+jtreg only can parallelize tests in the same directory. Which means you often end up with just waiting for one test to complete before anything else can happen.

jtreg doesn't require tests that run concurrently with others to be in the same directory. The inverse, where exclusiveAccess.dirs prevents tests in a directory/tree from running at the same time as other tests in that directory/tree, also doesn't prevent tests in other locations from executing concurrently.

Given the execution times, I wonder if you've looked at using the finer-grained test groups and splitting the execution across a number of machines. Yes, it means combining results, but I assume you'll do this for higher tiers anyway as the execution time goes up significantly beyond tier1.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24229#issuecomment-2753896771 From alanb at openjdk.org Wed Mar 26 10:27:12 2025 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 26 Mar 2025 10:27:12 GMT Subject: RFR: 8352738: JFR: Implementation of JFR Method Timing and Tracing In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 17:25:01 GMT, Erik Gahlin wrote: > Could I have a review of this enhancement that will add tracing capabilities to JFR? There are opportunities for performance improvements in the implementation, but I would rather add them later and separately. > > Testing: tier 1-3, test/jdk/jdk/jfr > > Thanks > Erik Are the suboptions/parameters specified to -XX:StartFlightRecording a supported interface that is tracked by the CSR? I'm just wondering about the method-timing review specified to report-on-exit for example. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24205#issuecomment-2753909790 From tschatzl at openjdk.org Wed Mar 26 10:37:50 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Mar 2025 10:37:50 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v28] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. 
>
> The main reason for the current barrier is how G1 implements concurrent refinement:
> * G1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
> * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads.
> * Finally, there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done;                 // null value check
> if (card(@x.a) == young_card) goto done;  // write to young gen check
> StoreLoad;                                // synchronize
> if (card(@x.a) == dirty_card) goto done;
>
> *card(@x.a) = dirty
>
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
>
> call runtime to move thread-local-dcq into dcqs
>
> done:
>
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for Parallel and Serial GC.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>
> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
>
> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse-grained synchronization based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se...
Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 36 commits: - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq - * make young gen length revising independent of refinement thread * use a service task * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update - * fix IR code generation tests that change due to barrier cost changes - * factor out card table and refinement table merging into a single method - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 - * obsolete G1UpdateBufferSize G1UpdateBufferSize has previously been used to size the refinement buffers and impose a minimum limit on the number of cards per thread that need to be pending before refinement starts. The former function is now obsolete with the removal of the dirty card queues, the latter functionality has been taken over by the new diagnostic option `G1PerThreadPendingCardThreshold`. I prefer to make this a diagnostic option is better than a product option because it is something that is only necessary for some test cases to produce some otherwise unwanted behavior (continuous refinement). CSR is pending. - * more documentation on why we need to rendezvous the gc threads - Merge branch 'master' into 8342381-card-table-instead-of-dcq - * ayang review * re-add STS leaver for java thread handshake - ... 
and 26 more: https://git.openjdk.org/jdk/compare/059f190f...6d574da0 ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=27 Stats: 7089 lines in 110 files changed: 2610 ins; 3555 del; 924 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From dnsimon at openjdk.org Wed Mar 26 10:43:25 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 26 Mar 2025 10:43:25 GMT Subject: Withdrawn: 8352645: Add tool support to check order of includes In-Reply-To: References: Message-ID: On Sun, 23 Mar 2025 21:14:47 GMT, Doug Simon wrote: > This PR adds `bin/sort_includes.py`, a python3 script to check that blocks of include statements in C++ files are sorted alphabetically and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). > This script can also update files with unsorted includes. The second commit in this PR shows the result of running: > > python3 ./bin/sort_includes.py ./src/hotspot > > To prevent an include being reordered, put at least one non-space character after the closing `"` or `>`. See `src/hotspot/share/adlc/archDesc.cpp` for an example. > > Assuming this PR is integrated, jcheck could be updated to use it to ensure include statements remain sorted. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/24180 From dnsimon at openjdk.org Wed Mar 26 10:43:43 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 26 Mar 2025 10:43:43 GMT Subject: RFR: 8352645: Add tool support to check order of includes Message-ID: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. 
For example: java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) at java.base/java.lang.Thread.run(Thread.java:1447) Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Optimizer.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.hpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_FrameMap.hpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_RangeCheckElimination.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_InstructionPrinter.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci/bcEscapeAnalyzer.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci/ciInstance.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci/ciEnv.hpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci/ciUtilities.inline.hpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci/ciMethod.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci/ciUtilities.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci/ciEnv.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci/ciCallSite.cpp 
/Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci/bcEscapeAnalyzer.hpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci/ciReplay.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci/ciInstanceKlass.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler/compilationMemoryStatistic.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler/compilationFailureInfo.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler/compilationPolicy.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler/directivesParser.hpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler/compileBroker.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler/directivesParser.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler/compilerDirectives.hpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler/methodMatcher.hpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler/compilationMemoryStatistic.hpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler/compileTask.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler/disassembler.hpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler/oopMap.inline.hpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci/jvmciRuntime.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci/jvmci.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci/jvmciCompiler.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci/jvmci.hpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci/jvmciEnv.cpp /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci/jvmciJavaClasses.cpp Note that non-space characters after the closing " or > of an include statement can be used to prevent re-ordering of the include. 
For example: #include "e.hpp" #include "d.hpp" #include "c.hpp" // do not reorder #include "b.hpp" #include "a.hpp" will be reformatted as: #include "d.hpp" #include "e.hpp" #include "c.hpp" // do not reorder #include "a.hpp" #include "b.hpp" at SortIncludes.main(SortIncludes.java:190) at TestIncludesAreSorted.main(TestIncludesAreSorted.java:75) ... 4 more JavaTest Message: Test threw exception: java.lang.RuntimeException This PR includes a [commit](https://github.com/openjdk/jdk/pull/24247/commits/a76d4f98c7e6074b4745c1c1791fe605e352d79f) with ordering suppression comments for some files I discovered needed it while playing around in #24180 . This PR replaces #24180. ------------- Commit messages: - sort includes in subset of HotSpot sources and added a test to keep them sorted - added tool to sort includes - do not reorder certain includes Changes: https://git.openjdk.org/jdk/pull/24247/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24247&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352645 Stats: 396 lines in 53 files changed: 335 ins; 54 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24247/head:pull/24247 PR: https://git.openjdk.org/jdk/pull/24247 From dnsimon at openjdk.org Wed Mar 26 10:43:25 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 26 Mar 2025 10:43:25 GMT Subject: RFR: 8352645: Add tool support to check order of includes In-Reply-To: References: Message-ID: On Sun, 23 Mar 2025 21:14:47 GMT, Doug Simon wrote: > This PR adds `bin/sort_includes.py`, a python3 script to check that blocks of include statements in C++ files are sorted alphabetically and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). > This script can also update files with unsorted includes. 
The second commit in this PR shows the result of running: > > python3 ./bin/sort_includes.py ./src/hotspot > > To prevent an include being reordered, put at least one non-space character after the closing `"` or `>`. See `src/hotspot/share/adlc/archDesc.cpp` for an example. > > Assuming this PR is integrated, jcheck could be updated to use it to ensure include statements remain sorted. I've created #24247 to replace this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24180#issuecomment-2753968509 From ihse at openjdk.org Wed Mar 26 10:49:18 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 26 Mar 2025 10:49:18 GMT Subject: RFR: 8352766: Problemlist hotspot tier1 tests requiring tools that are not included in static JDK In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 03:02:32 GMT, Jiangli Zhou wrote: > Please review this change that adds test/hotspot/jtreg/ProblemList-StaticJdk.txt, which problemlists 27 hotspot tier1 tests that use `javac`, `jstack`, `jcmd` and `jhsdb` at runtime. > > Following is an example of the command that I use to run hotspot tier1 tests on static JDK with the extra `ProblemList-StaticJdk.txt`: > > > $ make test TEST="test/hotspot/jtreg:tier1" JDK_UNDER_TEST=//JDK-8352766/build/linux-x86_64-server-fastdebug/images/static-jdk JDK_FOR_COMPILE=//JDK-8352766/build/linux-x86_64-server-fastdebug/images/jdk JTREG="EXTRA_PROBLEM_LISTS=//JDK-8352766/test/hotspot/jtreg/ProblemList-StaticJdk.txt" Marked as reviewed by ihse (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24214#pullrequestreview-2716699025 From shade at openjdk.org Wed Mar 26 11:35:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 11:35:25 GMT Subject: RFR: 8351155: C1/C2: Remove 32-bit x86 specific FP rounding support Message-ID: <8zUrV-sMSOwRSQk_jERtFqjrzOFUP7rlUwTTN7cPP_8=.b1d30fb5-d9c0-4417-bacd-bf09f2af433b@github.com> C1 and C2 have support for rounding doubles/floats, to support the awkward rounding modes of the x87 FPU. With the 32-bit x86 port removed, we can remove those parts. This basically deletes all the code that uses `strict_fp_requires_explicit_rounding`, which is now universally `false` for all supported platforms. For C1, we remove the `RoundFP` op, its associated `lir_roundfp` and related utility methods that insert these nodes in the graph. For C2, we remove `RoundDouble` and `RoundFloat` nodes (note there are confusingly named `RoundDoubleMode` nodes that are not related to this), associated utility methods, AD match rules that reference these nodes (as nops!), and some `Ideal`-s that are no longer needed. 
Additional testing: - [x] Linux x86_64 server fastdebug, `tier1` - [ ] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Leftover - Fix Changes: https://git.openjdk.org/jdk/pull/24250/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24250&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351155 Stats: 545 lines in 48 files changed: 0 ins; 511 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/24250.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24250/head:pull/24250 PR: https://git.openjdk.org/jdk/pull/24250 From rehn at openjdk.org Wed Mar 26 12:17:08 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 26 Mar 2025 12:17:08 GMT Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 10:20:19 GMT, Alan Bateman wrote: > > One issue with high timeout factor is that make+jtreg only can parallelize tests in the same directory. Which means you often end up with just waiting for one test to complete before anything else can happen. > > jtreg doesn't require tests that run concurrently with others to be in the same directory. The inverse, where exclusiveAccess.dirs prevents tests in a directory/tree from running at the same time as other tests in that directory/tree also doesn't prevent tests in other locations from executing concurrently. > Sorry, I didn't mean top directory. I meant 'test root dir'/test sub groups (as they usually maps). I can't force the issue.. it seems to work fine... But plenty of times the machine have been running just one test and that timeouts it starts a whole bunch of new tests from another test group. Is this recently fixed, or what may be the issue? E.g. +robbin = \ + compiler/c2/irTests \ + runtime/handshake It runs tests from both groups in parallel, which is not what I have been seeing? > Given the execution times, I wonder if you've looked at using the finer grain test groups and splitting the execution across a number of machines. 
Yes, it means combing results but I assume you'll this for high tiers anyway as the execution time goes up significantly beyond tier1. Running on my workstation with qemu-user or a small rv64 board there is around 50x time vs x86 for me. It helps, but requires a bunch of machines. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24229#issuecomment-2754202998 From stefank at openjdk.org Wed Mar 26 12:27:09 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 26 Mar 2025 12:27:09 GMT Subject: RFR: 8352645: Add tool support to check order of includes In-Reply-To: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Wed, 26 Mar 2025 09:21:59 GMT, Doug Simon wrote: > This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). > > By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. > > The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. 
> > I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. > > When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. For example: > > java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: > > java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci > > at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) > at java.base/java.lang.Thread.run(Thread.java:1447) > Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: > > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Optim... Thanks for updating to use the lower-case comparison. I wonder if a small tweak can fix the extra blank lines I complained about in the other PR. The tool removes the extra blank line we have in our .inline.hpp. From the Style Guide: All .inline.hpp files should include their corresponding .hpp file as the first include line with a blank line separating it from the rest of the include lines. Declarations needed by other files should be put in the .hpp file, and not in the .inline.hpp file. This rule exists to resolve problems with circular dependencies between .inline.hpp files. 
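As a rough illustration of the style guide rule quoted above, combined with the lowercased sort order described in the PR, here is a minimal Java sketch. It is not the code in `SortIncludes.java`: the class `InlineIncludeSketch`, its two methods, and the `ci/ci_x.hpp` include are hypothetical names used only for this example, and treating lines that differ only in case as duplicates is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical sketch, not the actual SortIncludes.java implementation.
final class InlineIncludeSketch {

    // Sorts include lines by their lowercased text and drops duplicates.
    // Lowercasing makes '_' (0x5F) compare below 'a'..'z', so "ci_x" sorts
    // before "ciUtilities"; with raw strings 'U' (0x55) would come first.
    static List<String> sortIncludes(List<String> includes) {
        TreeMap<String, String> byLowercase = new TreeMap<>();
        for (String line : includes) {
            byLowercase.putIfAbsent(line.toLowerCase(Locale.ROOT), line);
        }
        return new ArrayList<>(byLowercase.values());
    }

    // Lays out the includes of an .inline.hpp file: the corresponding .hpp
    // first, then a blank line, then the remaining includes in sorted order.
    static List<String> layoutInlineHpp(String ownHeader, List<String> otherIncludes) {
        List<String> result = new ArrayList<>();
        result.add(ownHeader);
        List<String> rest = sortIncludes(otherIncludes);
        rest.remove(ownHeader); // the own header must not appear twice
        if (!rest.isEmpty()) {
            result.add(""); // blank line required by the style guide rule
            result.addAll(rest);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> layout = layoutInlineHpp(
            "#include \"compiler/oopMap.hpp\"",
            List.of("#include \"utilities/ostream.hpp\"",
                    "#include \"ci/ciUtilities.hpp\"",
                    "#include \"ci/ci_x.hpp\"",
                    "#include \"ci/ciUtilities.hpp\""));
        layout.forEach(System.out::println);
    }
}
```

The sketch keeps the blank line after the first include, which is exactly what the tool currently drops.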
I think this needs to be fixed, otherwise people will start to remove these. src/hotspot/share/compiler/oopMap.inline.hpp line 29: > 27: > 28: #include "compiler/oopMap.hpp" > 29: This blank line should not be removed. test/hotspot/jtreg/sources/SortIncludes.java line 77: > 75: blankLines = List.of(""); > 76: } > 77: result.addAll(blankLines); If this line is removed you don't get the extra blank lines I mentioned in the previous PR. It also removes the extra blank line that you get inserted into oopMap.inline.hpp before the INCLUDE_JVMCI block. ------------- Changes requested by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24247#pullrequestreview-2716954567 PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2014026694 PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2014025793 From rehn at openjdk.org Wed Mar 26 12:28:11 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 26 Mar 2025 12:28:11 GMT Subject: RFR: 8352218: RISC-V: Zvfh requires RVV [v5] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 07:47:32 GMT, Robbin Ehn wrote: >> Hi please consider. >> >> Added case to turn off UseZvfh when no RVV. >> Which is the cause of the test issues, zvfh on but no rvv. >> >> Also made all case identical and added no warning when default. >> Move them to the common init, as the "UseExtension" is not C2 specific. >> >> Manual tested and some random compiler tests. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - Spell fix > - Merge branch 'master' into maxvector_0 > - dep check > - Merge branch 'master' into maxvector_0 > - Merge branch 'master' into maxvector_0 > - Merge branch 'master' into maxvector_0 > - hwprobe deps > - Merge branch 'master' into maxvector_0 > - Moved to common > - Disable UseZvfh when no RVV Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24094#issuecomment-2754230289 From mbaesken at openjdk.org Wed Mar 26 12:34:21 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 26 Mar 2025 12:34:21 GMT Subject: Integrated: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 15:56:10 GMT, Matthias Baesken wrote: > There are a few divisions by zero in sharedRuntimeTrans.cpp, used to "construct" NaN and -infinity. This should probably be replaced by using functionality from std::numeric_limits . This pull request has now been integrated. Changeset: b4dc3645 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/b4dc364575b5a7e9dab5645f2fd6f377083531f0 Stats: 27 lines in 2 files changed: 6 ins; 12 del; 9 mod 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp Reviewed-by: kbarrett, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/24136 From mbaesken at openjdk.org Wed Mar 26 12:34:21 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 26 Mar 2025 12:34:21 GMT Subject: RFR: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp [v5] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 08:32:03 GMT, Matthias Baesken wrote: >> There are a few divisions by zero in sharedRuntimeTrans.cpp, used to "construct" NaN and -infinity. This should probably be replaced by using functionality from std::numeric_limits . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > adjust check Thanks for the reviews ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24136#issuecomment-2754243239 From duke at openjdk.org Wed Mar 26 13:39:15 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 26 Mar 2025 13:39:15 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v2] In-Reply-To: References: Message-ID: > # Issue Summary > > When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. > > # Change Summary > > Following the rationale, that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. > > Concretely, this PR > - adds parse predicate nodes to the IR testing framework, > - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, > - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, > - adds a regression test. 
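To make the flag relationship in the list above concrete, here is a small Java model. HotSpot itself is C++ and this is not the PR's code: the class `PredicateFlagsSketch` and its members are hypothetical names that only mirror the two flags. The sketch assumes the rule is applied once after argument parsing and that every later check tests the loop predicate flag before the profiled one.

```java
// Hypothetical model of the two C2 flags; not HotSpot code.
final class PredicateFlagsSketch {
    boolean useLoopPredicate;
    boolean useProfiledLoopPredicate;

    PredicateFlagsSketch(boolean useLoopPredicate, boolean useProfiledLoopPredicate) {
        this.useLoopPredicate = useLoopPredicate;
        this.useProfiledLoopPredicate = useProfiledLoopPredicate;
    }

    // Argument-level rule: profiled predicates are treated as a subset of
    // loop predicates, so disabling the latter disables the former.
    void makeConsistent() {
        if (!useLoopPredicate) {
            useProfiledLoopPredicate = false;
        }
    }

    // Check order used for consistency: loop predicates first, then profiled.
    boolean profiledPredicatesEnabled() {
        return useLoopPredicate && useProfiledLoopPredicate;
    }

    public static void main(String[] args) {
        // Models -XX:-UseLoopPredicate with the profiled flag left on.
        PredicateFlagsSketch flags = new PredicateFlagsSketch(false, true);
        flags.makeConsistent();
        System.out.println(flags.useProfiledLoopPredicate); // prints false
    }
}
```

With both pieces in place, the guarded check already returns false before `makeConsistent()` runs, so the two measures overlap rather than depend on each other.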
> > > # Testing > > The changes passed the following testing: > - [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) > - tier1 through tier3 and Oracle internal testing Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: ir-framework: fix phase for parse predicate nodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24248/files - new: https://git.openjdk.org/jdk/pull/24248/files/d4885dec..08afa3d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24248.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24248/head:pull/24248 PR: https://git.openjdk.org/jdk/pull/24248 From stefank at openjdk.org Wed Mar 26 13:43:16 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 26 Mar 2025 13:43:16 GMT Subject: RFR: 8352645: Add tool support to check order of includes In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Wed, 26 Mar 2025 12:19:14 GMT, Stefan Karlsson wrote: >> This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> >> By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). 
Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. >> >> The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. >> >> I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. >> >> When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. For example: >> >> java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: >> >> java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci >> >> at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) >> at java.base/java.lang.Thread.run(Thread.java:1447) >> Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: >> >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp >> /Users/dnsimo... 
> > test/hotspot/jtreg/sources/SortIncludes.java line 77: > >> 75: blankLines = List.of(""); >> 76: } >> 77: result.addAll(blankLines); > > If this line is removed you don't get the extra blank lines I mentioned in the previous PR. It also removes the extra blank line that you get inserted into oopMap.inline.hpp before the INCLUDE_JVMCI block. Or, rather if the code is changed to: if (!userIncludes.isEmpty() && !sysIncludes.isEmpty()) { result.add(""); } result.addAll(sysIncludes); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2014172537 From dchuyko at openjdk.org Wed Mar 26 13:52:29 2025 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Wed, 26 Mar 2025 13:52:29 GMT Subject: RFR: 8337666: AArch64: SHA3 GPR intrinsic In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 14:38:12 GMT, Dmitry Chuyko wrote: > This is an implementation of SHA3 intrinsics for AArch64 that operates GPRs. It follows the Java implementation algorithm but eagerly uses available registers. For example, FP+R18 are used when it's allowed. On simpler cores like RPi3 or Surface Pro it is 23-53% faster than C2 compiled version; on Graviton 3 it is 8-14% faster than C2 compiled version (which is faster than the current intrinsic); on Apple Silicon it is faster than C2 compiled version but slower than the ARMv8.2-SHA intrinsic. Improvements on a particular CPU depend on the input length. 
For instance, for Graviton 2:
>
> Benchmark (ops/ms)           (digesterName)  (length)  G2
> MessageDigests.digest        SHA3-256        64        28.28%
> MessageDigests.digest        SHA3-256        16384     53.58%
> MessageDigests.digest        SHA3-512        64        27.97%
> MessageDigests.digest        SHA3-512        16384     43.90%
> MessageDigests.getAndDigest  SHA3-256        64        26.18%
> MessageDigests.getAndDigest  SHA3-256        16384     52.82%
> MessageDigests.getAndDigest  SHA3-512        64        24.73%
> MessageDigests.getAndDigest  SHA3-512        16384     44.31%
>
> (results for intermediate input lengths look like steps)
>
> Existing intrinsic implementation is put under a flag `UseSIMDForSHA3Intrinsic` which is on by default where the intrinsic is enabled currently.
>
> Sanity tests were modified to cover new intrinsic variants (`-XX:-UseSIMDForSHA3Intrinsic -XX:+-PreserveFramePointer`) on aarch64 hw. Existing test cases where intrinsic is enabled are executed with `-XX:+IgnoreUnrecognizedVMOptions -XX:+UseSIMDForSHA3Intrinsic`, on platforms where the sha3 extension is missing they still are cut off by isSHA3IntrinsicAvailable() predicate. 
On Graviton 4 there is still a noticeable difference between the proposed implementation and C2 generated code:

Benchmark                    (digesterName)  (length)  Score Pct
MessageDigests.digest        SHA3-256        64        8.3%
MessageDigests.digest        SHA3-256        16384     11%
MessageDigests.digest        SHA3-512        64        8.4%
MessageDigests.digest        SHA3-512        16384     11.5%
MessageDigests.getAndDigest  SHA3-256        64        7.2%
MessageDigests.getAndDigest  SHA3-256        16384     11%
MessageDigests.getAndDigest  SHA3-512        64        7.3%
MessageDigests.getAndDigest  SHA3-512        16384     11.6%

and the version that uses the extension is ~1.8x slower than C2
------------- PR Comment: https://git.openjdk.org/jdk/pull/20422#issuecomment-2754472006 From duke at openjdk.org Wed Mar 26 14:19:17 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Wed, 26 Mar 2025 14:19:17 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: Message-ID: <3zkHWLVEELkQkeSU9M0YAOpb3olMDNyU1HAdWUJEm68=.a2d2f9ea-c635-4379-95d7-00ff358eb15f@github.com> On Tue, 25 Mar 2025 18:55:56 GMT, Stefan Karlsson wrote: > Are there any code that we know of that doesn't fit into a synchronization pattern similar to the above? > I can think of some contrived example where Thread B asks the OS for memory mappings and uses that to ascertain that a pre-determined address has been reserved, and how that could lead to an incorrect booking as you described, but do we really have code like that? From what I can tell, it doesn't look like that's happening anywhere, though someone else may know better. Similarly, for uncommit, the base address must be passed over from somewhere else in the JVM, so relying on some external synchronization seems reasonable here too. If this problem scenario is not present in the current code and it's not expected to become a possibility in the future, then I suppose there's no reason to guard against it. Maybe just a comment explaining the reasoning is good enough (and a warning not to use such patterns). 
----------- > When does a release/uncommit fail? Would that be a JVM bug? On Windows, VirtualFree also looks like it only fails if an invalid set of arguments are passed. So if os::pd_release fails it's probably a JVM bug. Uncommit uses mmap, which could fail for a larger variety of reasons. Some reasons are out of control of the JVM. For example: "The number of mapped regions would exceed an implementation-defined limit (per process or per system)." See [here](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/memory/metaspace/virtualSpaceNode.cpp#L191) > What state is the memory in when such a failure happens? Do we even know if the memory is still committed if an uncommit fails? If release/uncommit fails, then it would be hard to know what state the target memory is in. If the arguments are invalid (bad base address), the target region may not even be allocated. Or, in the case of uncommit, if the base address is not aligned, maybe the target committed region does indeed exist but the uncommit still fails. So it would be hard to determine how to readjust the NMT accounting afterward. > I don't understand why we don't treat that as a fatal error OR make sure that all call-sites handles that error, which they don't do today. I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled by most places in the caller, release failure seems mostly not. If we expect that release/uncommit could sometimes fail for valid reasons, then we cannot fail fatally in the os:: functions. Since, at least for uncommit, we could reasonably fail without it being a JVM bug, I think we shouldn't fatally crash when that happens. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2754589497 From dnsimon at openjdk.org Wed Mar 26 14:23:09 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 26 Mar 2025 14:23:09 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v2] In-Reply-To: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: > This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). > > By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. > > The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. > > I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. > > When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. 
For example: > > java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: > > java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci > > at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) > at java.base/java.lang.Thread.run(Thread.java:1447) > Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: > > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Optim... 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: drop extra blank lines and preserve rule for first include in .inline.hpp files ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24247/files - new: https://git.openjdk.org/jdk/pull/24247/files/62779478..18e2a1d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24247&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24247&range=00-01 Stats: 52 lines in 4 files changed: 40 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24247/head:pull/24247 PR: https://git.openjdk.org/jdk/pull/24247 From duke at openjdk.org Wed Mar 26 14:34:47 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 26 Mar 2025 14:34:47 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v3] In-Reply-To: References: Message-ID: <5OOpDW693XhGVePrz6zYlm9gMZKneFaIfl6BJP-NQb0=.c5757e84-6b14-442b-b822-d0b1eeb5f913@github.com> > # Issue Summary > > When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. > > # Change Summary > > Following the rationale, that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. 
> > Concretely, this PR > - adds parse predicate nodes to the IR testing framework, > - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, > - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, > - adds a regression test. > > > # Testing > > The changes passed the following testing: > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) > - tier1 through tier3 and Oracle internal testing Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: ir-framework: rename new nodes to convention ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24248/files - new: https://git.openjdk.org/jdk/pull/24248/files/08afa3d5..6f015a67 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=01-02 Stats: 10 lines in 2 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24248.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24248/head:pull/24248 PR: https://git.openjdk.org/jdk/pull/24248 From chagedorn at openjdk.org Wed Mar 26 15:08:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Mar 2025 15:08:15 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v3] In-Reply-To: <5OOpDW693XhGVePrz6zYlm9gMZKneFaIfl6BJP-NQb0=.c5757e84-6b14-442b-b822-d0b1eeb5f913@github.com> References: <5OOpDW693XhGVePrz6zYlm9gMZKneFaIfl6BJP-NQb0=.c5757e84-6b14-442b-b822-d0b1eeb5f913@github.com> Message-ID: On Wed, 26 Mar 2025 14:34:47 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. 
>> >> # Change Summary >> >> Following the rationale, that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > ir-framework: rename new nodes to convention A few comments but overall it looks good. Thanks for cleaning that up! src/hotspot/share/opto/c2_globals.hpp line 789: > 787: product(bool, UseProfiledLoopPredicate, true, \ > 788: "Move predicates out of loops based on profiling data. " \ > 789: "Requires UseLoopPredicate to be turned on (default).") \ It was already a bit vague before but I suggest being more precise that we move checks with an uncommon trap out of a loop (and the resulting check before the loop is then a predicate): Move checks with an uncommon trap out of loops based on profiling data. Requires [...] 
src/hotspot/share/opto/loopnode.cpp line 4304: > 4302: tty->print(" profile_predicated"); > 4303: } > 4304: if (UseLoopPredicate && predicates.loop_predicate_block()->is_non_empty()) { Maybe you can merge these blocks: if (UseLoopPredicate) { if (UseProfiledLoopPredicate && predicates.profiled_loop_predicate_block()->is_non_empty()) { tty->print(" profile_predicated"); } if (predicates.loop_predicate_block()->is_non_empty()) { tty->print(" predicated"); } } test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1524: > 1522: public static final String PARSE_PREDICATE_LOOP = PREFIX + "PARSE_PREDICATE_LOOP" + POSTFIX; > 1523: static { > 1524: parsePredicateNodes(PARSE_PREDICATE_LOOP, "Loop"); I suggest the following names found in `predicates.hpp`: https://github.com/openjdk/jdk/blob/79bffe2f28f90986d45f4e91efc021290b4fc00a/src/hotspot/share/opto/predicates.hpp#L48-L50 test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2763: > 2761: IR_NODE_MAPPINGS.put(irNodePlaceholder, new SinglePhaseRangeEntry(CompilePhase.AFTER_PARSING, regex, > 2762: CompilePhase.AFTER_PARSING, > 2763: CompilePhase.CCP1)); I think the legal last phase should be `CompilePhase.PHASEIDEALLOOP_ITERATIONS` where we could observe `ParsePredicates`: Suggestion: CompilePhase.PHASEIDEALLOOP_ITERATIONS)); test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 41: > 39: static final int WARMUP = 10_000; > 40: static final int SIZE = 100; > 41: static final int min = 3; Since `min` is also a constant, you should capitalize it. test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 46: > 44: TestFramework.runWithFlags("-XX:+UseLoopPredicate", > 45: "-XX:+UseProfiledLoopPredicate"); > 46: TestFramework.runWithFlags("-XX:-UseLoopPredicate"); You could also add a run where you only disable `-XX:-UseProfiledLoopPredicate` for completeness and add an IR rule accordingly. 
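The flag dependency discussed in this review, and the three flag combinations the reviewer would like the test to cover, can be modelled with a small sketch. It does not use the IR framework or any HotSpot code; the class `PredicateFlagsSketch` and its method names are hypothetical, and the rule "a disabled flag means no parse predicate of that kind is expected" is an assumption taken from the PR description rather than from the VM sources.

```java
// Hypothetical model of the two flags; not HotSpot or IR-framework code.
final class PredicateFlagsSketch {
    final boolean useLoopPredicate;
    final boolean useProfiledLoopPredicate;

    PredicateFlagsSketch(boolean useLoopPredicate, boolean useProfiledLoopPredicate) {
        this.useLoopPredicate = useLoopPredicate;
        // The dependency described in the PR: turning UseLoopPredicate off
        // also turns UseProfiledLoopPredicate off.
        this.useProfiledLoopPredicate = useLoopPredicate && useProfiledLoopPredicate;
    }

    // Assumed expectation: a loop parse predicate only when the flag is on.
    boolean expectsLoopParsePredicate() {
        return useLoopPredicate;
    }

    // Check UseLoopPredicate first, then the profiled flag, mirroring the
    // "predicate all checks on UseLoopPredicate first" item of the PR.
    boolean expectsProfiledLoopParsePredicate() {
        return useLoopPredicate && useProfiledLoopPredicate;
    }

    public static void main(String[] args) {
        boolean[][] runs = { {true, true}, {false, true}, {true, false} };
        for (boolean[] run : runs) {
            PredicateFlagsSketch flags = new PredicateFlagsSketch(run[0], run[1]);
            System.out.println("loop=" + run[0] + " profiled=" + run[1]
                + " -> expects loop: " + flags.expectsLoopParsePredicate()
                + ", expects profiled: " + flags.expectsProfiledLoopParsePredicate());
        }
    }
}
```

The third combination (only the profiled flag off) is the extra run suggested in the review comment above.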
test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 49: > 47: } > 48: > 49: @Run(test = { "test" }) Braces are not required here: Suggestion: @Run(test = "test") test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 72: > 70: > 71: @Test > 72: @IR(counts = { IRNode.PARSE_PREDICATE_LOOP, "=1", The `=` is not required: Suggestion: @IR(counts = { IRNode.PARSE_PREDICATE_LOOP, "1", test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 74: > 72: @IR(counts = { IRNode.PARSE_PREDICATE_LOOP, "=1", > 73: IRNode.PARSE_PREDICATE_PROFILED_LOOP, "1" }, > 74: phase = CompilePhase.AFTER_PARSING, `phase` is not required since you've decided that `AFTER_PARSING` is the default phase where we match this node on. You only need to specify `phase` if you want to match on a different phase. ------------- Changes requested by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24248#pullrequestreview-2717534252 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014385700 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014346920 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014355627 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014361300 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014364335 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014371293 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014363733 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014364939 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014366623 From mbaesken at openjdk.org Wed Mar 26 15:11:06 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 26 Mar 2025 15:11:06 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: References: Message-ID: > The release file of the JDK image 
contains useful info, for example the SOURCE used to build this image, e.g. > SOURCE=".:git:21af8c7e7405" > Also the MODULES list is probably useful to have. > Add this info (or the complete content of the release file) to the hs_err files. Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: address Windows issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24244/files - new: https://git.openjdk.org/jdk/pull/24244/files/bea8d55c..bd95acf9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=00-01 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24244/head:pull/24244 PR: https://git.openjdk.org/jdk/pull/24244 From duke at openjdk.org Wed Mar 26 15:23:47 2025 From: duke at openjdk.org (Zihao Lin) Date: Wed, 26 Mar 2025 15:23:47 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make Message-ID: This patch removes the slice parameter from LoadNode::make. Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 Hi team, I am new, I'd appreciate any guidance. Thanks a lot! 
------------- Commit messages: - 8344116: C2: remove slice parameter from LoadNode::make Changes: https://git.openjdk.org/jdk/pull/24258/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344116 Stats: 54 lines in 13 files changed: 3 ins; 14 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From duke at openjdk.org Wed Mar 26 15:27:39 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 26 Mar 2025 15:27:39 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v4] In-Reply-To: References: Message-ID: > # Issue Summary > > When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. > > # Change Summary > > Following the rationale, that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. > > Concretely, this PR > - adds parse predicate nodes to the IR testing framework, > - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, > - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, > - adds a regression test. 
> > > # Testing > > The changes passed the following testing: > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) > - tier1 through tier3 and Oracle internal testing Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from @chhagedorn Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24248/files - new: https://git.openjdk.org/jdk/pull/24248/files/6f015a67..ea653995 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=02-03 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24248.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24248/head:pull/24248 PR: https://git.openjdk.org/jdk/pull/24248 From shade at openjdk.org Wed Mar 26 15:42:28 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 15:42:28 GMT Subject: RFR: 8351151: Clean up x86 template interpreter after 32-bit x86 removal Message-ID: x86 template interpreter carries `_LP64`-predicated code blocks that were supporting 32-bit x86. With that port gone, we can clean up the x86 template interpreter. I have checked no superfluous `LP64`, `AMD64`, `IA32` defines are left in affected files. Where obvious, I inlined `r15_thread` and `c_arg*`. Left the uses that assign these args to symbolic locals that have meaningful names. `verify_FPU` is also no-op now, removed that. There are related cleanups in compilers and runtime we need to do first, before we fully remove `VerifyFPU` flag. This change fans out a little to other platform template interpreters to remove `verify_FPU` as well. 
Additional testing: - [x] Linux x86_64 server fastdebug, `tier1` - [ ] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Revert one incorrect replacement - Touchups - Fix Changes: https://git.openjdk.org/jdk/pull/24251/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24251&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351151 Stats: 1165 lines in 15 files changed: 2 ins; 1040 del; 123 mod Patch: https://git.openjdk.org/jdk/pull/24251.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24251/head:pull/24251 PR: https://git.openjdk.org/jdk/pull/24251 From shade at openjdk.org Wed Mar 26 15:42:28 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 15:42:28 GMT Subject: RFR: 8351151: Clean up x86 template interpreter after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 11:51:22 GMT, Aleksey Shipilev wrote: > x86 template interpreter carries `_LP64`-predicated code blocks that were supporting 32-bit x86. With that port gone, we can clean up the x86 template interpreter. I have checked no superfluous `LP64`, `AMD64`, `IA32` defines are left in affected files. > > Where obvious, I inlined `r15_thread` and `c_arg*`. Left the uses that assign these args to symbolic locals that have meaningful names. > > `verify_FPU` is also no-op now, removed that. There are related cleanups in compilers and runtime we need to do first, before we fully remove `VerifyFPU` flag. This change fans out a little to other platform template interpreters to remove `verify_FPU` as well. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [ ] Linux x86_64 server fastdebug, `all` Hey, @coleenp, you might enjoy seeing this :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24251#issuecomment-2754867525 From duke at openjdk.org Wed Mar 26 15:43:30 2025 From: duke at openjdk.org (Zihao Lin) Date: Wed, 26 Mar 2025 15:43:30 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v2] In-Reply-To: References: Message-ID: <6NXNfV1dqzZxpogva4dsv0kxkAQtJlgmLnSHvgZm5YA=.461d9a09-1e23-4acd-8230-0840348183ef@github.com> > This patch removes the slice parameter from LoadNode::make. > > Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thanks a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into 8344116 - 8344116: C2: remove slice parameter from LoadNode::make ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24258/files - new: https://git.openjdk.org/jdk/pull/24258/files/27df4a01..f4ef46dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=00-01 Stats: 34071 lines in 1200 files changed: 1990 ins; 30272 del; 1809 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From stefank at openjdk.org Wed Mar 26 15:46:18 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 26 Mar 2025 15:46:18 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v2] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> 
Message-ID: On Wed, 26 Mar 2025 14:23:09 GMT, Doug Simon wrote: >> This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> >> By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. >> >> The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. >> >> I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. >> >> When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. 
For example: >> >> java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: >> >> java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci >> >> at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) >> at java.base/java.lang.Thread.run(Thread.java:1447) >> Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: >> >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp >> /Users/dnsimo... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > drop extra blank lines and preserve rule for first include in .inline.hpp files Thanks for doing the last two fixes. I think this looks good now, but I need a bit more time to do some deeper verification. Thanks! ------------- PR Review: https://git.openjdk.org/jdk/pull/24247#pullrequestreview-2717748962 From shade at openjdk.org Wed Mar 26 15:58:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 15:58:56 GMT Subject: RFR: 8352980: Purge infrastructure for FP-to-bits interpreter intrinsics after 32-bit x86 removal Message-ID: [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373) added the infrastructure to implement `_intBitsToFloat`, `_floatToRawIntBits`, `_longBitsToDouble`, `_doubleToRawLongBits` intrinsics in template interpreters. 
That work was needed to support NaNs for 32-bit x86, and is no longer needed. I did not do deep testing, because the removed code is actually dead. Additional testing: - [ ] GHA checks for platform builds - [ ] Linux x86_64 server fastdebug, `tier1` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/24259/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24259&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352980 Stats: 95 lines in 11 files changed: 0 ins; 95 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24259.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24259/head:pull/24259 PR: https://git.openjdk.org/jdk/pull/24259 From pminborg at openjdk.org Wed Mar 26 16:01:14 2025 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 26 Mar 2025 16:01:14 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v9] In-Reply-To: References: Message-ID: <9iLjCiu5ELnY-gfnuMZePBiAoMZvwPLGpRqTn_np554=.b3199704-402c-4f8e-9989-b3f2f4c08180@github.com> > Implement JEP 502. > > The PR passes tier1-tier3 tests. 
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Update src/java.base/share/classes/java/lang/StableValue.java Co-authored-by: Paul Sandoz ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/42d4dcfa..69688848 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From vklang at openjdk.org Wed Mar 26 16:01:15 2025 From: vklang at openjdk.org (Viktor Klang) Date: Wed, 26 Mar 2025 16:01:15 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: <4OFH9hamFX7_rN0TNR5jWDSVrMp4qNJXfEhKy2m0Rac=.e43eb679-965f-4176-8b62-403a61e088e6@github.com> On Mon, 17 Mar 2025 00:40:46 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/util/ImmutableCollections.java line 798: >> >>> 796: throw new IndexOutOfBoundsException(i); >>> 797: } >>> 798: } >> >> I think `orElseSet` should be outside of the `try` block, otherwise an `ArrayIndexOutOfBoundsException` thrown by `mapper.apply` will be wrapped. > > Even better, we should just do a `Preconditions.checkIndex` explicitly. I think the idea here is to avoid having to perform two consecutive range checks, one ahead of access and one as a part of the access (wrapping of the exception aside). 
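The three options weighed in this thread can be compared in a minimal sketch: the mapper call inside the `try` block, the mapper call outside it, and an explicit up-front index check. This is not the `ImmutableCollections` code under review; the class `LazyElementSketch`, its methods, and the plain `String[]` cache are hypothetical, and the public `Objects.checkIndex` stands in for the internal `Preconditions.checkIndex` mentioned above.

```java
import java.util.Objects;
import java.util.function.IntFunction;

// Hypothetical illustration only; not the code from the PR.
final class LazyElementSketch {

    // Variant A: the mapper runs inside the try block, so an
    // ArrayIndexOutOfBoundsException thrown by the mapper itself is caught
    // and re-reported as if the caller's index had been out of range.
    static String getMapperInsideTry(String[] cache, int i, IntFunction<String> mapper) {
        try {
            String value = cache[i]; // range check happens as part of the access
            if (value == null) {
                value = mapper.apply(i);
                cache[i] = value;
            }
            return value;
        } catch (ArrayIndexOutOfBoundsException e) {
            throw new IndexOutOfBoundsException(i);
        }
    }

    // Variant B: only the array read is guarded, so exceptions from the
    // mapper propagate unchanged. Still a single range check.
    static String getMapperOutsideTry(String[] cache, int i, IntFunction<String> mapper) {
        String value;
        try {
            value = cache[i];
        } catch (ArrayIndexOutOfBoundsException e) {
            throw new IndexOutOfBoundsException(i);
        }
        if (value == null) {
            value = mapper.apply(i);
            cache[i] = value;
        }
        return value;
    }

    // Variant C: explicit check up front, followed by the access, which
    // performs its own (second) range check unless the JIT removes it.
    static String getExplicitCheck(String[] cache, int i, IntFunction<String> mapper) {
        Objects.checkIndex(i, cache.length);
        String value = cache[i];
        if (value == null) {
            value = mapper.apply(i);
            cache[i] = value;
        }
        return value;
    }

    public static void main(String[] args) {
        IntFunction<String> buggyMapper = i -> {
            throw new ArrayIndexOutOfBoundsException("thrown by mapper");
        };
        try {
            getMapperInsideTry(new String[2], 0, buggyMapper);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("inside try:  " + e);
        }
        try {
            getMapperOutsideTry(new String[2], 0, buggyMapper);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("outside try: " + e);
        }
    }
}
```

In variant A the mapper's own exception and its message are lost, even though index 0 is valid; variants B and C keep them.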
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2014504355 From dchuyko at openjdk.org Wed Mar 26 16:02:32 2025 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Wed, 26 Mar 2025 16:02:32 GMT Subject: RFR: 8337666: AArch64: SHA3 GPR intrinsic Message-ID: This is an implementation of SHA3 intrinsics for AArch64 that operates on GPRs. It follows the Java implementation algorithm but eagerly uses available registers. For example, FP+R18 are used when it's allowed. On simpler cores like RPi3 or Surface Pro it is 23-53% faster than C2 compiled version; on Graviton 3 it is 8-14% faster than C2 compiled version (which is faster than the current intrinsic); on Apple Silicon it is faster than C2 compiled version but slower than the ARMv8.2-SHA intrinsic. Improvements on a particular CPU depend on the input length. For instance, for Graviton 2:

Benchmark (ops/ms)            (digesterName)  (length)  G2
MessageDigests.digest         SHA3-256        64        28.28%
MessageDigests.digest         SHA3-256        16384     53.58%
MessageDigests.digest         SHA3-512        64        27.97%
MessageDigests.digest         SHA3-512        16384     43.90%
MessageDigests.getAndDigest   SHA3-256        64        26.18%
MessageDigests.getAndDigest   SHA3-256        16384     52.82%
MessageDigests.getAndDigest   SHA3-512        64        24.73%
MessageDigests.getAndDigest   SHA3-512        16384     44.31%

(results for intermediate input lengths look like steps) On Graviton 4 there is still a noticeable difference between the proposed implementation and C2 generated code:

Benchmark                     (digesterName)  (length)  Pct
MessageDigests.digest         SHA3-256        64        8.3%
MessageDigests.digest         SHA3-256        16384     11%
MessageDigests.digest         SHA3-512        64        8.4%
MessageDigests.digest         SHA3-512        16384     11.5%
MessageDigests.getAndDigest   SHA3-256        64        7.2%
MessageDigests.getAndDigest   SHA3-256        16384     11%
MessageDigests.getAndDigest   SHA3-512        64        7.3%
MessageDigests.getAndDigest   SHA3-512        16384     11.6%

and the version that uses the extension is ~1.8x slower than C2. Existing intrinsic implementation is put under a flag `UseSIMDForSHA3Intrinsic` which is
on by default where the intrinsic is enabled currently. Sanity tests were modified to cover new intrinsic variants (`-XX:-UseSIMDForSHA3Intrinsic -XX:+-PreserveFramePointer`) on aarch64 hw. Existing test cases where intrinsic is enabled are executed with `-XX:+IgnoreUnrecognizedVMOptions -XX:+UseSIMDForSHA3Intrinsic`, on platforms where the sha3 extension is missing they still are cut off by isSHA3IntrinsicAvailable() predicate. The original PR https://github.com/openjdk/jdk/pull/20422 has been auto-closed and the branch has been re-created on top of the new master. ------------- Commit messages: - Delete empty line - SHA3 GPR intrinsic & tests Changes: https://git.openjdk.org/jdk/pull/24260/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24260&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337666 Stats: 757 lines in 5 files changed: 752 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24260.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24260/head:pull/24260 PR: https://git.openjdk.org/jdk/pull/24260 From stefank at openjdk.org Wed Mar 26 16:07:08 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 26 Mar 2025 16:07:08 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: <3zkHWLVEELkQkeSU9M0YAOpb3olMDNyU1HAdWUJEm68=.a2d2f9ea-c635-4379-95d7-00ff358eb15f@github.com> References: <3zkHWLVEELkQkeSU9M0YAOpb3olMDNyU1HAdWUJEm68=.a2d2f9ea-c635-4379-95d7-00ff358eb15f@github.com> Message-ID: On Wed, 26 Mar 2025 14:16:06 GMT, Robert Toyonaga wrote: > > Are there any code that we know of that doesn't fit into a synchronization pattern similar to the above? > > I can think of some contrived example where Thread B asks the OS for memory mappings and uses that to ascertain that a pre-determined address has been reserved, and how that could lead to an incorrect booking as you described, but do we really have code like that? 
> > From what I can tell, it doesn't look like that's happening anywhere, someone else may know better though. Similarly, for uncommit, the base address must be passed over from somewhere else in the JVM so relying on some external synchronization seems reasonable here too. If this problem scenario is not present in the current code and it's not expected to become a possibility in the future, then I suppose there's no reason to guard against it. Maybe just a comment explaining the reasoning is good enough (and a warning not to use such patterns). > > > When does a release/uncommit fail? Would that be a JVM bug? > > On Windows, VirtualFree also looks like it only fails if an invalid set of arguments are passed. So if os::pd_release fails it's probably a JVM bug. Uncommit uses mmap, which could fail for a larger variety of reasons. Some reasons are out of control of the JVM. For example: "The number of mapped regions would exceed an implementation-defined limit (per process or per system)." See [here](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/memory/metaspace/virtualSpaceNode.cpp#L191) Right. And that failure is fatal, so there should be no need to fix any NMT bookkeeping for that. > > > What state is the memory in when such a failure happens? Do we even know if the memory is still committed if an uncommit fails? > > If release/uncommit fails, then it would be hard to know what state the target memory is in. If the arguments are invalid (bad base address), the target region may not even be allocated. Or, in the case of uncommit, if the base address is not aligned, maybe the target committed region does indeed exist but the uncommit still fails. So it would be hard to determine how to readjust the NMT accounting afterward. Agreed. And this would be a pre-existing problem already. If a release/uncommit fails, then we have similar issues for that as well. 
> > > I don't understand why we don't treat that as a fatal error OR make sure that all call-sites handles that error, which they don't do today. > > I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions. I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2754953604 From jiangli at openjdk.org Wed Mar 26 16:19:24 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 26 Mar 2025 16:19:24 GMT Subject: RFR: 8352766: Problemlist hotspot tier1 tests requiring tools that are not included in static JDK In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 03:02:32 GMT, Jiangli Zhou wrote: > Please review this change that adds test/hotspot/jtreg/ProblemList-StaticJdk.txt, which problemlists 27 hotspot tier1 tests that use `javac`, `jstack`, `jcmd` and `jhsdb` at runtime. > > Following is an example of the command that I use to run hotspot tier1 tests on static JDK with the extra `ProblemList-StaticJdk.txt`: > > > $ make test TEST="test/hotspot/jtreg:tier1" JDK_UNDER_TEST=//JDK-8352766/build/linux-x86_64-server-fastdebug/images/static-jdk JDK_FOR_COMPILE=//JDK-8352766/build/linux-x86_64-server-fastdebug/images/jdk JTREG="EXTRA_PROBLEM_LISTS=//JDK-8352766/test/hotspot/jtreg/ProblemList-StaticJdk.txt" Thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24214#issuecomment-2754987185 From jiangli at openjdk.org Wed Mar 26 16:19:24 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 26 Mar 2025 16:19:24 GMT Subject: Integrated: 8352766: Problemlist hotspot tier1 tests requiring tools that are not included in static JDK In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 03:02:32 GMT, Jiangli Zhou wrote: > Please review this change that adds test/hotspot/jtreg/ProblemList-StaticJdk.txt, which problemlists 27 hotspot tier1 tests that use `javac`, `jstack`, `jcmd` and `jhsdb` at runtime. > > Following is an example of the command that I use to run hotspot tier1 tests on static JDK with the extra `ProblemList-StaticJdk.txt`: > > > $ make test TEST="test/hotspot/jtreg:tier1" JDK_UNDER_TEST=//JDK-8352766/build/linux-x86_64-server-fastdebug/images/static-jdk JDK_FOR_COMPILE=//JDK-8352766/build/linux-x86_64-server-fastdebug/images/jdk JTREG="EXTRA_PROBLEM_LISTS=//JDK-8352766/test/hotspot/jtreg/ProblemList-StaticJdk.txt" This pull request has now been integrated. Changeset: 53926742 Author: Jiangli Zhou URL: https://git.openjdk.org/jdk/commit/53926742c02480def6a42683fcaf284b99bcb0a1 Stats: 34 lines in 1 file changed: 34 ins; 0 del; 0 mod 8352766: Problemlist hotspot tier1 tests requiring tools that are not included in static JDK Reviewed-by: dholmes, ihse ------------- PR: https://git.openjdk.org/jdk/pull/24214 From stuefe at openjdk.org Wed Mar 26 16:23:07 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Mar 2025 16:23:07 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 18:58:59 GMT, Stefan Karlsson wrote: > > Hi stefank, I think you're right about (1.1) (2.1) (2.2) (1.2) being prevented by the current implementation. Is there a reason that the current implementation only does the wider locking for release/uncommit? 
Maybe (2.1) (1.1) (1.2) (2.2) isn't much of an issue since it's unlikely that another thread would uncommit/release the same base address shortly after it's committed/reserved? > > I'm very curious to find out if anyone knows how this could happen without a race condition hand-over from one thread to another. (See my answer to Stüfe). Stefan, your analysis sounds reasonable. Don't see a hole. The original issue was from me I think, but I've never seen that variant in real life. So, I am fine with leaving that scenario out. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2754989908 From stuefe at openjdk.org Wed Mar 26 16:23:08 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Mar 2025 16:23:08 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: <3zkHWLVEELkQkeSU9M0YAOpb3olMDNyU1HAdWUJEm68=.a2d2f9ea-c635-4379-95d7-00ff358eb15f@github.com> Message-ID: <1S9u0e21AfElb6hNR-rLXa7JIhyUBE35ePRt1d4vhfs=.639232d7-aa07-4ac2-96b2-a2a5414c0377@github.com> On Wed, 26 Mar 2025 16:05:00 GMT, Stefan Karlsson wrote: > > > > I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions. > > I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error? I second @stefank here. Uncommit can fail, ironically, with an ENOMEM: if the uncommit punches a hole into a committed region, this would cause a new VMA on the kernel-side. This may fail if we run against the limit for VMAs. Forgot what it was, some sysconf setting. All of this is Linux specific, though. I don't think this should be unconditionally a fatal error.
Since the allocator (whatever it is) can decide to re-commit the region later, and thus "self-heal" itself. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2754997791 From stefank at openjdk.org Wed Mar 26 16:46:08 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 26 Mar 2025 16:46:08 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: <1S9u0e21AfElb6hNR-rLXa7JIhyUBE35ePRt1d4vhfs=.639232d7-aa07-4ac2-96b2-a2a5414c0377@github.com> References: <3zkHWLVEELkQkeSU9M0YAOpb3olMDNyU1HAdWUJEm68=.a2d2f9ea-c635-4379-95d7-00ff358eb15f@github.com> <1S9u0e21AfElb6hNR-rLXa7JIhyUBE35ePRt1d4vhfs=.639232d7-aa07-4ac2-96b2-a2a5414c0377@github.com> Message-ID: On Wed, 26 Mar 2025 16:19:41 GMT, Thomas Stuefe wrote: > > > I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions. > > > > > > I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error? > > I second @stefank here. > > Uncommit can fail, ironically, with an ENOMEM: if the uncommit punches a hole into a committed region, this would cause a new VMA on the kernel-side. This may fail if we run against the limit for VMAs. Forgot what it was, some sysconf setting. All of this is Linux specific, though. This happens when we hit the /proc/sys/vm/max_map_count limit, and this immediately crashes the JVM. > > I don't think this should be unconditionally a fatal error. Since the allocator (whatever it is) can decide to re-commit the region later, and thus "self-heal" itself. Is this referring to failures when we hit the max_map_count limit?
I'm not convinced that you can recover from that without immediately hitting the same issue somewhere else in the code. Or maybe you are thinking about some of the other reasons for the uncommit to fail? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2755066544 From kbarrett at openjdk.org Wed Mar 26 18:10:17 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 26 Mar 2025 18:10:17 GMT Subject: RFR: 8352140: UBSAN: fix the left shift of negative value in klass.hpp, array_layout_helper() In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 09:33:46 GMT, Afshin Zafari wrote: >> signed short int x = -32768; >> signed short int y = x << 1; >> >> >> That does seem like an interestingly weird case. Unless I'm missing something, >> there's no UB-overflow in that. The shift expression promotes `short x` to >> `int x`, sign extending it. The `int`-typed shift is fine (since C++20, and >> effectively so prior to that in non-constexpr-required contexts - see below). >> And the implicit conversion to `short y` is implementation-defined (before >> C++20, though gcc may warn (-Woverflow)) or fine (since C++20). >> >> gcc warns about x being negative in C++11 to C++17 modes >> (-Wshift-negative-value enabled by default), but doesn't treat it as UB. >> Before C++20 gcc errors (warns if -fpermissive) if it's in a >> required-constexpr-context, even if -Wshift-negative-value is disabled. >> That all seems consistent. > > I had to emphasize that the case shown in the example may happen at run-time where compiler has no chance to warn/avoid/address it. > My concern is that developers should not rely on the compiler to check the validation of left-shift op. They should be aware of the `signed` <-> `unsigned` and `int` <-> `long` <-> `long long` conversions during the left-shift. > To find invalid cases of left-shift, UBSAN instruments them with assertions to catch them at run-time. If the assertion raised, good we found the problem. 
However, if no assertion raised for some left-shift ops, it doesn't mean that they are valid. Is there a way to tell ubsan that we care about detecting overflows, but we do not care about detecting left shift of a negative value? Not that I can find, but maybe I missed something. `-fsanitize=shift-base` looks like it would check for both overflow and (prior to C++20) negative base. We could disable shift-base checking and do our own overflow assertion. (Which might want to be packaged up in a helper, as discussed in https://github.com/openjdk/jdk/pull/24196.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24184#discussion_r2014756924 From shade at openjdk.org Wed Mar 26 18:38:33 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 18:38:33 GMT Subject: RFR: 8351157: Clean up x86 GC barriers after 32-bit x86 removal Message-ID: Assembler GC barriers have quite a bit of coding to support 32-bit x86. As 32-bit x86 is removed, we can clean up those parts. We can eliminate `!LP64` blocks quite easily. We can also prune passing around `thread` argument, and just trust that `r15_thread` is always available. 
Additional testing: - [x] Linux x86_64 server fastdebug, `tier1` - [ ] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Also do tlab_allocate - Rely on R15 to be a thread register - Work Changes: https://git.openjdk.org/jdk/pull/24253/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24253&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351157 Stats: 546 lines in 20 files changed: 1 ins; 429 del; 116 mod Patch: https://git.openjdk.org/jdk/pull/24253.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24253/head:pull/24253 PR: https://git.openjdk.org/jdk/pull/24253 From duke at openjdk.org Wed Mar 26 19:00:17 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Wed, 26 Mar 2025 19:00:17 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: <3zkHWLVEELkQkeSU9M0YAOpb3olMDNyU1HAdWUJEm68=.a2d2f9ea-c635-4379-95d7-00ff358eb15f@github.com> Message-ID: On Wed, 26 Mar 2025 16:05:00 GMT, Stefan Karlsson wrote: >>> What state is the memory in when such a failure happens? Do we even know if the memory is still committed if an uncommit fails? > > >> If release/uncommit fails, then it would be hard to know what state the target memory is in. If the arguments are invalid (bad base address), the target region may not even be allocated. Or, in the case of uncommit, if the base address is not aligned, maybe the target committed region does indeed exist but the uncommit still fails. So it would be hard to determine how to readjust the NMT accounting afterward. > > Agreed. And this would be a pre-existing problem already. If a release/uncommit fails, then we have the similar issues for that as well. Hi @stefank, Are you referring to the difficulty in determining the original allocation as being the pre-existing problem? I think that only becomes an issue if we decide to swap the order of NMT booking and the memory release/uncommit (assuming we don't just fail fatally). 
Since we don't need to readjust currently, if there's a failure we can just leave everything as it is. >>> I don't understand why we don't treat that as a fatal error OR make sure that all call-sites handles that error, which they don't do today. > > >> I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions. > > I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error? [`VirtualSpace::shrink_by`](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/memory/virtualspace.cpp#L373) allows uncommit to fail without crashing. I'm not certain of the intention behind that. But it seems like it's because shrinking is an optimization and not always critical that it be done immediately. [[1](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/gc/serial/tenuredGeneration.cpp#L258)] ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2755468073 From kvn at openjdk.org Wed Mar 26 19:02:13 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 26 Mar 2025 19:02:13 GMT Subject: RFR: 8351155: C1/C2: Remove 32-bit x86 specific FP rounding support In-Reply-To: <8zUrV-sMSOwRSQk_jERtFqjrzOFUP7rlUwTTN7cPP_8=.b1d30fb5-d9c0-4417-bacd-bf09f2af433b@github.com> References: <8zUrV-sMSOwRSQk_jERtFqjrzOFUP7rlUwTTN7cPP_8=.b1d30fb5-d9c0-4417-bacd-bf09f2af433b@github.com> Message-ID: On Wed, 26 Mar 2025 10:11:25 GMT, Aleksey Shipilev wrote: > C1 and C2 have support for rounding double/floats, to support awkward rounding modes of x87 FPU. With 32-bit x86 port removed, we can remove those parts. 
This basically deletes all the code that uses `strict_fp_requires_explicit_rounding`, which is now universally `false` for all supported platforms. > > For C1, we remove `RoundFP` op, its associated `lir_roundfp` and related utility methods that insert these nodes in the graph. > > For C2, we remove `RoundDouble` and `RoundFloat` nodes (note there is a confusingly named `RoundDoubleMode` nodes that are not related to this), associated utility methods, AD match rules that reference these nodes (as nops!), and some `Ideal`-s that are no longer needed. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24250#pullrequestreview-2718342841 From kvn at openjdk.org Wed Mar 26 19:04:13 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 26 Mar 2025 19:04:13 GMT Subject: RFR: 8352980: Purge infrastructure for FP-to-bits interpreter intrinsics after 32-bit x86 removal In-Reply-To: References: Message-ID: <4XCMpK6Wggs4KJlAUa__OGXzvH56uupTRTaKuubreWw=.bbf759a0-46c5-4478-be01-543b6eb91840@github.com> On Wed, 26 Mar 2025 15:53:45 GMT, Aleksey Shipilev wrote: > [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373) added the infrastructure to implement `_intBitsToFloat`, `_floatToRawIntBits`, `_longBitsToDouble`, `_doubleToRawLongBits` intrinsics in template interpreters. That work was needed to support NaNs for 32-bit x86, and is no longer needed. The C1/C2 compiler intrinsics for these are still implemented and functional. > > I did not do deep testing, because the removed code is actually dead. > > Additional testing: > - [x] GHA checks for platform builds > - [x] Linux x86_64 server fastdebug, `tier1` Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24259#pullrequestreview-2718346078 From vlivanov at openjdk.org Wed Mar 26 19:13:12 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 26 Mar 2025 19:13:12 GMT Subject: RFR: 8351155: C1/C2: Remove 32-bit x86 specific FP rounding support In-Reply-To: <8zUrV-sMSOwRSQk_jERtFqjrzOFUP7rlUwTTN7cPP_8=.b1d30fb5-d9c0-4417-bacd-bf09f2af433b@github.com> References: <8zUrV-sMSOwRSQk_jERtFqjrzOFUP7rlUwTTN7cPP_8=.b1d30fb5-d9c0-4417-bacd-bf09f2af433b@github.com> Message-ID: On Wed, 26 Mar 2025 10:11:25 GMT, Aleksey Shipilev wrote: > C1 and C2 have support for rounding double/floats, to support awkward rounding modes of x87 FPU. With 32-bit x86 port removed, we can remove those parts. This basically deletes all the code that uses `strict_fp_requires_explicit_rounding`, which is now universally `false` for all supported platforms. > > For C1, we remove `RoundFP` op, its associated `lir_roundfp` and related utility methods that insert these nodes in the graph. > > For C2, we remove `RoundDouble` and `RoundFloat` nodes (note there is a confusingly named `RoundDoubleMode` nodes that are not related to this), associated utility methods, AD match rules that reference these nodes (as nops!), and some `Ideal`-s that are no longer needed. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24250#pullrequestreview-2718363625 From vlivanov at openjdk.org Wed Mar 26 19:14:15 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 26 Mar 2025 19:14:15 GMT Subject: RFR: 8352980: Purge infrastructure for FP-to-bits interpreter intrinsics after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 15:53:45 GMT, Aleksey Shipilev wrote: > [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373) added the infrastructure to implement `_intBitsToFloat`, `_floatToRawIntBits`, `_longBitsToDouble`, `_doubleToRawLongBits` intrinsics in template interpreters. That work was needed to support NaNs for 32-bit x86, and is no longer needed. The C1/C2 compiler intrinsics for these are still implemented and functional. > > I did not do deep testing, because the removed code is actually dead. > > Additional testing: > - [x] GHA checks for platform builds > - [x] Linux x86_64 server fastdebug, `tier1` Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24259#pullrequestreview-2718366489 From kbarrett at openjdk.org Wed Mar 26 19:22:15 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 26 Mar 2025 19:22:15 GMT Subject: RFR: 8351157: Clean up x86 GC barriers after 32-bit x86 removal In-Reply-To: References: Message-ID: <1rIX0wehaIIaJsnvIoAGshNeioyVi-E6JiPW3lleQ00=.22f8c01f-bcde-4ad8-8ca1-518b727796af@github.com> On Wed, 26 Mar 2025 12:48:13 GMT, Aleksey Shipilev wrote: > Assembler GC barriers have quite a bit of coding to support 32-bit x86. As 32-bit x86 is removed, we can clean up those parts. > > We can eliminate `!LP64` blocks quite easily. We can also prune passing around `thread` argument, and just trust that `r15_thread` is always available. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [ ] Linux x86_64 server fastdebug, `all` Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24253#pullrequestreview-2718385753 From coleenp at openjdk.org Wed Mar 26 19:45:22 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 26 Mar 2025 19:45:22 GMT Subject: RFR: 8351151: Clean up x86 template interpreter after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 11:51:22 GMT, Aleksey Shipilev wrote: > x86 template interpreter carries `_LP64`-predicated code blocks that were supporting 32-bit x86. With that port gone, we can clean up the x86 template interpreter. I have checked no superfluous `LP64`, `AMD64`, `IA32` defines are left in affected files. > > Where obvious, I inlined `r15_thread` and `c_arg*`. Left the uses that assign these args to symbolic locals that have meaningful names. > > `verify_FPU` is also no-op now, removed that. There are related cleanups in compilers and runtime we need to do first, before we fully remove `VerifyFPU` flag. This change fans out a little to other platform template interpreters to remove `verify_FPU` as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Looks good. Lots of changes but it was worth combining 32 and 64 bits 10 years ago. Date: Fri Mar 13 15:16:07 2015 -0400 8074717: Merge interp_masm files for x86 _32 and _64 Thank you for doing this! ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24251#pullrequestreview-2718451921 From sspitsyn at openjdk.org Wed Mar 26 19:48:20 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 26 Mar 2025 19:48:20 GMT Subject: Integrated: 8352812: remove useless class and function parameter in SuspendThread impl In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 08:53:48 GMT, Serguei Spitsyn wrote: > The internal class JvmtiSuspendControl, which is transitively used in the SuspendThread implementation, is not really needed and is being removed.
Also, the suspend_thread function has an unused need_safepoint_p parameter, which is being removed as well. > > Testing: > - TBD: Run mach5 tiers 1-3 to be safe This pull request has now been integrated. Changeset: 441bd126 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/441bd1265650dc865897d5cb6a673edb89dd5cee Stats: 68 lines in 5 files changed: 0 ins; 58 del; 10 mod 8352812: remove useless class and function parameter in SuspendThread impl Reviewed-by: lmesnik, cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/24219 From shade at openjdk.org Wed Mar 26 19:59:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 19:59:25 GMT Subject: RFR: 8352415: x86: Tighten up template interpreter method entry code In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 13:44:40 GMT, Aleksey Shipilev wrote: > Interpreter performance is still important for faster startup, since it carries the application until the compilers kick in. After looking at Leyden scenarios in Xint mode, I believe incremental improvements are possible in the template interpreter to make it faster. > > One of those improvements is tightening up method entry code. Profiling shows the hottest path in the whole ordeal for non-native methods is resolving the Java mirror to store the GC root for the currently executing Method*. It involves 4-5 chained memory accesses, which incur significant latency. > > We can massage the code to reuse some memory accesses and also spread them out to allow more latency-hiding hardware mechanisms to kick in. > > Additional testing: > - [x] Ad-hoc `-Xint` benchmarks > - [x] Linux x86_64 server fastdebug, `all` If there are no other comments, I am going to integrate this soon. Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24114#issuecomment-2755610971 From stefank at openjdk.org Wed Mar 26 20:21:17 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 26 Mar 2025 20:21:17 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: <3zkHWLVEELkQkeSU9M0YAOpb3olMDNyU1HAdWUJEm68=.a2d2f9ea-c635-4379-95d7-00ff358eb15f@github.com> Message-ID: <5wBQqxybptneJjhR5usfrqg3PJ7G2PB_sDjUkb4BObM=.fe04a403-64ad-4dc5-b793-b48da01acfd4@github.com> On Wed, 26 Mar 2025 18:57:51 GMT, Robert Toyonaga wrote: > > > > What state is the memory in when such a failure happens? Do we even know if the memory is still committed if an uncommit fails? > > > > > > > > > If release/uncommit fails, then it would be hard to know what state the target memory is in. If the arguments are invalid (bad base address), the target region may not even be allocated. Or, in the case of uncommit, if the base address is not aligned, maybe the target committed region does indeed exist but the uncommit still fails. So it would be hard to determine how to readjust the NMT accounting afterward. > > > > > > Agreed. And this would be a pre-existing problem already. If a release/uncommit fails, then we have the similar issues for that as well. > > Hi @stefank, Are you referring to the difficulty in determining the original allocation as being the pre-existing problem? I think that only becomes an issue if we decide to swap the order of NMT booking and the memory release/uncommit (assuming we don't just fail fatally). Since we don't need to readjust currently, if there's a failure we can just leave everything as it is. My thinking is that if there is a failure you don't know what state the OS left the memory in. So, you don't know whether the memory was in fact unmapped as requested, or if it was left intact, or even something in-between. 
So, if you don't do the matching NMT bookkeeping there will be a mismatch between the state of the memory and what has been recorded in NMT. > > > > > I don't understand why we don't treat that as a fatal error OR make sure that all call-sites handles that error, which they don't do today. > > > > > > > > > I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions. > > > > > > I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error? > > [`VirtualSpace::shrink_by`](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/memory/virtualspace.cpp#L373) allows uncommit to fail without crashing. I'm not certain of the intention behind that. But it seems like it's because shrinking is an optimization and not always critical that it be done immediately. [[1](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/gc/serial/tenuredGeneration.cpp#L258)] The above example shows code that assumes that it is OK for an uncommit to fail and for execution to continue. I'm trying to figure out if that assumption is true. So, what I meant was that I was looking for a concrete example of a failure mode of uncommit that would be an acceptable (safe) failure to continue executing from. That is, a valid failure that doesn't mess up the memory in an unpredictable/unknowable way.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2755656985 From fparain at openjdk.org Wed Mar 26 20:39:06 2025 From: fparain at openjdk.org (Frederic Parain) Date: Wed, 26 Mar 2025 20:39:06 GMT Subject: RFR: 8351151: Clean up x86 template interpreter after 32-bit x86 removal In-Reply-To: References: Message-ID: <6TqRaA5rzHSgZVM1EXXPDFIOFLPXWW1LCJ5J5FYaWMw=.68b39443-029d-4523-9344-e35bfb8f4dc5@github.com> On Wed, 26 Mar 2025 11:51:22 GMT, Aleksey Shipilev wrote: > x86 template interpreter carries `_LP64`-predicated code blocks that were supporting 32-bit x86. With that port gone, we can clean up the x86 template interpreter. I have checked no superfluous `LP64`, `AMD64`, `IA32` defines are left in affected files. > > Where obvious, I inlined `r15_thread` and `c_arg*`. Left the uses that assign these args to symbolic locals that have meaningful names. > > `verify_FPU` is also no-op now, removed that. There are related cleanups in compilers and runtime we need to do first, before we fully remove `VerifyFPU` flag. This change fans out a little to other platform template interpreters to remove `verify_FPU` as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Looks good to me. Thank you for this cleanup. Fred ------------- Marked as reviewed by fparain (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24251#pullrequestreview-2718564125 From vlivanov at openjdk.org Wed Mar 26 21:55:12 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 26 Mar 2025 21:55:12 GMT Subject: RFR: 8351151: Clean up x86 template interpreter after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 11:51:22 GMT, Aleksey Shipilev wrote: > x86 template interpreter carries `_LP64`-predicated code blocks that were supporting 32-bit x86. With that port gone, we can clean up the x86 template interpreter. 
I have checked no superfluous `LP64`, `AMD64`, `IA32` defines are left in affected files. > > Where obvious, I inlined `r15_thread` and `c_arg*`. Left the uses that assign these args to symbolic locals that have meaningful names. > > `verify_FPU` is also no-op now, removed that. There are related cleanups in compilers and runtime we need to do first, before we fully remove `VerifyFPU` flag. This change fans out a little to other platform template interpreters to remove `verify_FPU` as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Overall, looks good! One suggestion: maybe unconditionally use `r15_thread` everywhere? Sometimes you still keep a local variable (e.g., `const Register thread = r15_thread;`). ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24251#pullrequestreview-2718696721 From iklam at openjdk.org Wed Mar 26 22:11:49 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 26 Mar 2025 22:11:49 GMT Subject: RFR: 8352579: Refactor CDS legacy optimization for lambda proxy classes [v4] In-Reply-To: References: Message-ID: <-2d_BMWsSP1zntZbYdObf5VymwTCCZYM2g9BKrJUpxk=.1d06eaea-187e-476f-bdcf-04568eb4ef2f@github.com> On Tue, 25 Mar 2025 22:13:55 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @matias9927 offline comments - consolidated two functions with identical names > > src/hotspot/share/cds/lambdaProxyClassDictionary.cpp line 31: > >> 29: #include "classfile/systemDictionaryShared.hpp" >> 30: #include "interpreter/bootstrapInfo.hpp" >> 31: #include "jfr/jfrEvents.hpp" > > Extra include at line 31? 
jfrEvents.hpp is needed by `EventClassLoad` on line 360 > src/hotspot/share/classfile/systemDictionaryShared.cpp line 34: > >> 32: #include "cds/classListWriter.hpp" >> 33: #include "cds/dumpTimeClassInfo.inline.hpp" >> 34: #include "cds/dynamicArchive.hpp" > > Pre-existing: I think the include of `cds/archiveHeapLoader.hpp` at line #27 is unnecessary. Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24145#discussion_r2015063220 PR Review Comment: https://git.openjdk.org/jdk/pull/24145#discussion_r2015063251 From iklam at openjdk.org Wed Mar 26 22:11:48 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 26 Mar 2025 22:11:48 GMT Subject: RFR: 8352579: Refactor CDS legacy optimization for lambda proxy classes [v5] In-Reply-To: References: Message-ID: > Since JDK 16, CDS has provided limited optimization for lambda expressions. This has been superseded by JEP 483 and is useful only when `-XX:+AOTClassLinking` is not enabled (which is the case for the default CDS archive, for compatibility reasons). > > The "legacy lambda optimization" may eventually be removed. For the time being, we should consolidate the code into a single source code and clearly mark its uses. This way we can avoid confusion with the JEP 483 code for supporting lambdas (and other java.lang.invoke functionalities). Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - added back jfr/jfrEvents.hpp as it is needed by EventClassLoad - Merge branch 'master' into 8352579-refactor-cds-legacy-lambda-optimizations - @calvinccheung comments - @matias9927 offline comments - consolidated two functions with identical names - Fixed infinite recursion compiler warning - Fixed github action build failures - 8352579: Refactor CDS legacy optimization for lambda proxy classes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24145/files - new: https://git.openjdk.org/jdk/pull/24145/files/bd642f8e..1c34d836 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24145&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24145&range=03-04 Stats: 63428 lines in 1797 files changed: 17721 ins; 37446 del; 8261 mod Patch: https://git.openjdk.org/jdk/pull/24145.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24145/head:pull/24145 PR: https://git.openjdk.org/jdk/pull/24145 From ccheung at openjdk.org Wed Mar 26 22:19:11 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 26 Mar 2025 22:19:11 GMT Subject: RFR: 8352579: Refactor CDS legacy optimization for lambda proxy classes [v5] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 22:11:48 GMT, Ioi Lam wrote: >> Since JDK 16, CDS has provided limited optimization for lambda expressions. This has been superseded by JEP 483 and is useful only when `-XX:+AOTClassLinking` is not enabled (which is the case for the default CDS archive, for compatibility reasons). >> >> The "legacy lambda optimization" may eventually be removed. For the time being, we should consolidate the code into a single source code and clearly mark its uses. This way we can avoid confusion with the JEP 483 code for supporting lambdas (and other java.lang.invoke functionalities). > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - added back jfr/jfrEvents.hpp as it is needed by EventClassLoad > - Merge branch 'master' into 8352579-refactor-cds-legacy-lambda-optimizations > - @calvinccheung comments > - @matias9927 offline comments - consolidated two functions with identical names > - Fixed infinite recursion compiler warning > - Fixed github action build failures > - 8352579: Refactor CDS legacy optimization for lambda proxy classes Looks good. ------------- Marked as reviewed by ccheung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24145#pullrequestreview-2718730762 From jwaters at openjdk.org Wed Mar 26 23:16:20 2025 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 26 Mar 2025 23:16:20 GMT Subject: RFR: 8345265: Minor improvements for LTO across all compilers [v2] In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 10:42:50 GMT, Kim Barrett wrote: >>> Hi, the workaround 'disable lto in g1ParScanThreadState because of special inlining/flattening used there' is removed , why this works now ? >> >> The issue there was the `-Wattribute-warning` warnings that were being generated. But this change is suppressing >> those warnings in the LTO link: >> https://github.com/openjdk/jdk/blame/9d05cb8eff344fea3c6b9a9686b728ec53963978/make/hotspot/lib/JvmFeatures.gmk#L176C11-L176C11 >> Note that won't work with the new portable forbidding mechanism based on `deprecated` attributes. >> >> I'm trying this new version, and I still get a few other warnings and then seem to have a process hang in lto1-ltrans. >> The switch from `-flto=auto` to `-flto=$(JOBS)` doesn't seem to have helped in that respect. > >> I'm trying this new version, and I still get a few other warnings and then seem to have a process hang in lto1-ltrans. The switch from `-flto=auto` to `-flto=$(JOBS)` doesn't seem to have helped in that respect. 
> > Turns out I didn't wait long enough. It does terminate after about 40 minutes, though not successfully. Instead the > build crashes with a failed assert: > > # Internal Error (../../src/hotspot/share/runtime/handles.inline.hpp:76), pid=4017588, tid=4017620 > # assert(_thread->is_in_live_stack((address)this)) failed: not on stack? > > I've not tried to debug this. Maybe it's a consequence of one of those problems of bypassing an intentional implicit > noinline in our code (by ensuring a function definition is in a different TU from all callers), with LTO breaking that. > Or maybe LTO is breaking us in some other way (such as taking advantage of no-UB constraints that aren't found > by normal compilation). I've been thinking this over, and I think the way forward to deal with G1 exploding HotSpot's size is to find a static analysis tool and use it to analyze g1ParScanThreadState.cpp, to find which methods it calls are from within its compilation unit, and which come from outside, from another compilation unit. Alternatively, if anyone knows which methods exactly should be flattened (Force inlined) and which ones should _not_ be, we could possibly fast track the process to making LTO viable on gcc higher than 13. My logic for prioritizing making LTO viable over fixing it for now is that there should be a motivating reason to fix LTO to work properly, and if LTO causes such massive bloat on any compiler, then it is no longer viable and the reason for fixing it is removed. Maybe @kimbarrett could provide some insight on which methods should be inlined and which should not be? 
If that's not possible, then hopefully someone has a static analysis tool suggestion for seeing the call graphs of g1ParScanThreadState.cpp ------------- PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2755970938 From iklam at openjdk.org Thu Mar 27 00:27:12 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 27 Mar 2025 00:27:12 GMT Subject: RFR: 8352579: Refactor CDS legacy optimization for lambda proxy classes [v4] In-Reply-To: References: Message-ID: <0t2Pb4p-LHKortLsi0XH0wElB-_Gdb2VPPPDEmhialk=.6e8ae277-bf3e-4852-89d2-7bc374ed2d3d@github.com> On Mon, 24 Mar 2025 16:36:56 GMT, Matias Saavedra Silva wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @matias9927 offline comments - consolidated two functions with identical names > > Thanks for addressing our offline conversation, looks good to me! Thanks @matias9927 and @calvinccheung for the review ------------- PR Comment: https://git.openjdk.org/jdk/pull/24145#issuecomment-2756053508 From iklam at openjdk.org Thu Mar 27 00:27:12 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 27 Mar 2025 00:27:12 GMT Subject: Integrated: 8352579: Refactor CDS legacy optimization for lambda proxy classes In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 06:56:20 GMT, Ioi Lam wrote: > Since JDK 16, CDS has provided limited optimization for lambda expressions. This has been superseded by JEP 483 and is useful only when `-XX:+AOTClassLinking` is not enabled (which is the case for the default CDS archive, for compatibility reasons). > > The "legacy lambda optimization" may eventually be removed. For the time being, we should consolidate the code into a single source code and clearly mark its uses. This way we can avoid confusion with the JEP 483 code for supporting lambdas (and other java.lang.invoke functionalities). This pull request has now been integrated. 
Changeset: 24833403 Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/24833403b6b93ca464720f00de0e8bd5e1c140be Stats: 1065 lines in 18 files changed: 557 ins; 458 del; 50 mod 8352579: Refactor CDS legacy optimization for lambda proxy classes Reviewed-by: ccheung, matsaave ------------- PR: https://git.openjdk.org/jdk/pull/24145 From sspitsyn at openjdk.org Thu Mar 27 01:15:32 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 27 Mar 2025 01:15:32 GMT Subject: RFR: 8316682: serviceability/jvmti/vthread/SelfSuspendDisablerTest timed out Message-ID: This fixes the issue with lack of synchronization between JVMTI thread suspend and resume functions in a self-suspend case. More detailed fix description is in the first PR comment. Testing: Ran mach5 tiers 1-6. ------------- Commit messages: - 8316682: serviceability/jvmti/vthread/SelfSuspendDisablerTest timed out Changes: https://git.openjdk.org/jdk/pull/24269/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24269&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316682 Stats: 121 lines in 9 files changed: 69 ins; 33 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/24269.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24269/head:pull/24269 PR: https://git.openjdk.org/jdk/pull/24269 From sspitsyn at openjdk.org Thu Mar 27 02:07:07 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 27 Mar 2025 02:07:07 GMT Subject: RFR: 8316682: serviceability/jvmti/vthread/SelfSuspendDisablerTest timed out In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 01:10:54 GMT, Serguei Spitsyn wrote: > This fixes the issue with lack of synchronization between JVMTI thread suspend and resume functions in a self-suspend case. More detailed fix description is in the first PR comment. > > Testing: Ran mach5 tiers 1-6. The fix contains the following updates: - Now the internal function `resume_thread()` is executed in a handshake closure (`JvmtiUnitedHandshakeClosure`). 
This provides a necessary synchronization with the `suspend_thread()` in a case of self-suspension. It would be even better to execute `suspend_thread()` in a handshake closure as well. But this is harder to make right. It'd still make sense to consider such an update in the future. - The `HandshakeState::resume()` is updated to remove the `MutexLocker` and a duplicated check for `!is_suspended`. - The `JvmtiVTMSTransition_lock` has been replaced with newly introduced `JvmtiVThreadSuspend_lock` in the implementation of the `JvmtiVTSuspender` functions: `register_all_vthreads_suspend()`, `register_all_vthreads_resume()`, `register_vthread_suspend()`, `register_vthread_resume()`. This is because the resume operations are now executed in handshakes under protection of the HandshakeState lock and so need a higher-ranked lock. - The `JvmtiVTMSTransitionDisabler` has several updates: - it does nothing (plays a no-op) if the target virtual thread is executed in a context of current `JavaThread`, so it is trying to disable transitions for itself - One entry is removed from the `ProblemList`. It is related to the bug which is a dup of this one. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24269#issuecomment-2756268404 From sspitsyn at openjdk.org Thu Mar 27 02:18:35 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 27 Mar 2025 02:18:35 GMT Subject: RFR: 8316682: serviceability/jvmti/vthread/SelfSuspendDisablerTest timed out [v2] In-Reply-To: References: Message-ID: > This fixes the issue with lack of synchronization between JVMTI thread suspend and resume functions in a self-suspend case. More detailed fix description is in the first PR comment. > > Testing: Ran mach5 tiers 1-6. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: some cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24269/files - new: https://git.openjdk.org/jdk/pull/24269/files/f1fb7905..18944347 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24269&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24269&range=00-01 Stats: 9 lines in 2 files changed: 0 ins; 8 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24269.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24269/head:pull/24269 PR: https://git.openjdk.org/jdk/pull/24269 From dholmes at openjdk.org Thu Mar 27 04:34:07 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 27 Mar 2025 04:34:07 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: References: Message-ID: <6zNYNSD4GAfIqmDRBRj1a4_Q73C4EeJYv_tn2k0k2Fw=.71f884d2-b82e-4a9c-be4d-bacd49802cbf@github.com> On Wed, 26 Mar 2025 15:11:06 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to built this image e.g. >> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > address Windows issues If we really think we want this then maybe we should adjust the build process so that this info gets built into the binaries and doesn't require reading from the file system? src/hotspot/share/runtime/arguments.cpp line 3665: > 3663: > 3664: // cache the release file of the JDK image > 3665: os::read_image_release_file(); What is the impact on startup? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2756644666 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2015617670 From dholmes at openjdk.org Thu Mar 27 04:55:06 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 27 Mar 2025 04:55:06 GMT Subject: RFR: 8352762: Use EXACTFMT instead of expanded version where applicable In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 13:59:14 GMT, Joel Sikström wrote: > [JDK-8310233](https://bugs.openjdk.org/browse/JDK-8310233) introduced the EXACTFMT macro, which is a shorthand for printing exact values using methods defined in globalDefinitions.hpp. There are currently 20 places in HotSpot which use the expanded version of the macro, along with the "trace_page_size_params" macro that is defined and used in os.cpp. > > I have replaced places that use the expanded macro(s) with EXACTFMT + EXACTFMTARGS, and also removed trace_page_size_params from os.cpp, which was essentially a redefinition of EXACTFMTARGS. > > Testing: GHA, tiers 1-4 Paging @tstuefe ! Thomas added EXACTFMT in [JDK-8310233](https://github.com/openjdk/jdk/pull/14739/files#top) and did not use it for some of the places where you are now using it. Despite being a reviewer of Thomas's change, I'm not at all sure when EXACTFMT should be used. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24228#issuecomment-2756673746 From dholmes at openjdk.org Thu Mar 27 05:03:06 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 27 Mar 2025 05:03:06 GMT Subject: RFR: 8352762: Use EXACTFMT instead of expanded version where applicable In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 13:59:14 GMT, Joel Sikström wrote: > [JDK-8310233](https://bugs.openjdk.org/browse/JDK-8310233) introduced the EXACTFMT macro, which is a shorthand for printing exact values using methods defined in globalDefinitions.hpp. 
There are currently 20 places in HotSpot which use the expanded version of the macro, along with the "trace_page_size_params" macro that is defined and used in os.cpp. > > I have replaced places that use the expanded macro(s) with EXACTFMT + EXACTFMTARGS, and also removed trace_page_size_params from os.cpp, which was essentially a redefinition of EXACTFMTARGS. > > Testing: GHA, tiers 1-4 This looks very consistent and reasonable to me. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24228#pullrequestreview-2719664900 From kbarrett at openjdk.org Thu Mar 27 06:14:21 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 27 Mar 2025 06:14:21 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v2] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Wed, 26 Mar 2025 14:23:09 GMT, Doug Simon wrote: >> This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> >> By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. >> >> The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. 
I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. >> >> I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. >> >> When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. For example: >> >> java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: >> >> java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci >> >> at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) >> at java.base/java.lang.Thread.run(Thread.java:1447) >> Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: >> >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp >> /Users/dnsimo... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > drop extra blank lines and preserve rule for first include in .inline.hpp files Changes requested by kbarrett (Reviewer). src/hotspot/share/ci/ciUtilities.inline.hpp line 29: > 27: > 28: #include "ci/ciUtilities.hpp" > 29: Extra blank line not removed? 
src/hotspot/share/ci/ciUtilities.inline.hpp line 32: > 30: #include "runtime/interfaceSupport.inline.hpp" > 31: > 32: Extra blank line inserted? src/hotspot/share/compiler/compilationFailureInfo.cpp line 35: > 33: #include "compiler/compilationFailureInfo.hpp" > 34: #include "compiler/compileTask.hpp" > 35: #ifdef COMPILER2 Conditional includes are supposed to follow unconditional in a section. Out of scope for this PR? src/hotspot/share/compiler/disassembler.hpp line 36: > 34: #include "utilities/macros.hpp" > 35: > 36: Extra blank line inserted? test/hotspot/jtreg/sources/SortIncludes.java line 39: > 37: > 38: public class SortIncludes { > 39: private static final String INCLUDE_LINE = "^ *#include *(<[^>]+>|\"[^\"]+\") *$\\n"; There are files that have spaces between the `#` and `include`. I'm kind of inclined to suggest we fix those at some point (not in this PR). But the regex here needs to allow for that possibility, and perhaps (eventually) complain about such. test/hotspot/jtreg/sources/SortIncludes.java line 115: > 113: } > 114: > 115: /// Processes the C++ source file in `path` to sort its include statements. If we want to apply this to hotspot jtreg test code, then C source files also come into the picture. test/hotspot/jtreg/sources/SortIncludes.java line 153: > 151: > 152: /// Processes the C++ source files in `paths` to check if their include statements are sorted. > 153: /// Include statements with any non-space characters after the closing `"` or `>` will not Perhaps this should be mentioned in the style guide? 
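As a side note on the sort order described in the quoted PR text (lowercased comparison so that `_` sorts before letters, with duplicates dropped by a `SortedSet`), here is a minimal sketch of that idea. It is not the code in `SortIncludes.java`; the class name `IncludeSortSketch`, the method `sortIncludes`, and the tie-break on the original string are assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch, not the actual SortIncludes implementation.
class IncludeSortSketch {
    // Sorts include targets by their lowercased text and drops exact duplicates.
    static List<String> sortIncludes(List<String> includes) {
        // '_' is 0x5F: it sorts after 'A'..'Z' (0x41..0x5A) but before 'a'..'z'
        // (0x61..0x7A), so lowercasing first puts '_' ahead of every letter.
        Comparator<String> byLower =
            Comparator.comparing((String s) -> s.toLowerCase(Locale.ROOT));
        // Tie-break on the original text (an assumption of this sketch) so that
        // names differing only in case are not treated as duplicates.
        SortedSet<String> sorted =
            new TreeSet<>(byLower.thenComparing(Comparator.<String>naturalOrder()));
        sorted.addAll(includes);
        return List.copyOf(sorted);
    }

    public static void main(String[] args) {
        System.out.println(sortIncludes(List.of(
            "gc/g1/g1Allocator.hpp",
            "gc/g1/g1_globals.hpp",
            "gc/g1/g1Allocator.hpp")));
    }
}
```

With this comparator `gc/g1/g1_globals.hpp` comes before `gc/g1/g1Allocator.hpp`, whereas a plain `String` comparison would put the capital `A` first; the repeated entry appears only once.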
------------- PR Review: https://git.openjdk.org/jdk/pull/24247#pullrequestreview-2719852021 PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2015721384 PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2015718606 PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2015723999 PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2015725371 PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2015706803 PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2015712545 PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2015714360 From kbarrett at openjdk.org Thu Mar 27 06:30:09 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 27 Mar 2025 06:30:09 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v2] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Wed, 26 Mar 2025 14:23:09 GMT, Doug Simon wrote: >> This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> >> By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. 
>> >> The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. >> >> I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. >> >> When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. For example: >> >> java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: >> >> java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci >> >> at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) >> at java.base/java.lang.Thread.run(Thread.java:1447) >> Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: >> >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp >> /Users/dnsimo... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > drop extra blank lines and preserve rule for first include in .inline.hpp files Probably we want to eventually apply this to gtests, but there might be additional rules there. 
The include of unittest.hpp is (usually) last, and there may be (or may have been) a technical reason for that. Applying it to jtreg test support files could also introduce some challenges. Or at least discover a lot of non-conforming files. We might eventually want a mechanism for excluding directories, in addition to an inclusion list (that might eventually be "all"). These kinds of things can be followups once we have the basic mechanism in place. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24247#issuecomment-2756881833 From rehn at openjdk.org Thu Mar 27 07:33:21 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 27 Mar 2025 07:33:21 GMT Subject: Integrated: 8352218: RISC-V: Zvfh requires RVV In-Reply-To: References: Message-ID: <57YLkso0S6diJbQk0zYCDGrngk1s8lwofBgqufhp-0Q=.63a72b59-8f13-4fd6-9ece-d2424852c584@github.com> On Tue, 18 Mar 2025 09:04:07 GMT, Robbin Ehn wrote: > Hi please consider. > > Added case to turn off UseZvfh when no RVV. > Which is the cause of the test issues, zvfh on but no rvv. > > Also made all case identical and added no warning when default. > Move them to the common init, as the "UseExtension" is not C2 specific. > > Manual tested and some random compiler tests. > > Thanks, Robbin This pull request has now been integrated. 
Changeset: 78534152 Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/7853415217cc17179abf2e160ca735c936017f4e Stats: 106 lines in 2 files changed: 45 ins; 19 del; 42 mod 8352218: RISC-V: Zvfh requires RVV Reviewed-by: fyang, mli ------------- PR: https://git.openjdk.org/jdk/pull/24094 From stuefe at openjdk.org Thu Mar 27 08:09:09 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 27 Mar 2025 08:09:09 GMT Subject: RFR: 8352762: Use EXACTFMT instead of expanded version where applicable In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 13:59:14 GMT, Joel Sikström wrote: > [JDK-8310233](https://bugs.openjdk.org/browse/JDK-8310233) introduced the EXACTFMT macro, which is a shorthand for printing exact values using methods defined in globalDefinitions.hpp. There are currently 20 places in HotSpot which use the expanded version of the macro, along with the "trace_page_size_params" macro that is defined and used in os.cpp. > > I have replaced places that use the expanded macro(s) with EXACTFMT + EXACTFMTARGS, and also removed trace_page_size_params from os.cpp, which was essentially a redefinition of EXACTFMTARGS. > > Testing: GHA, tiers 1-4 Neat, thank you. Looks good. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24228#pullrequestreview-2720217267 From bkilambi at openjdk.org Thu Mar 27 08:13:11 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 27 Mar 2025 08:13:11 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations [v2] In-Reply-To: <8QDbenZGakijqUrwAcaVogoJBEiNpzYhN3sDrrteSDk=.d8539631-ab03-45ff-a762-0b6e14c63f89@github.com> References: <8QDbenZGakijqUrwAcaVogoJBEiNpzYhN3sDrrteSDk=.d8539631-ab03-45ff-a762-0b6e14c63f89@github.com> Message-ID: On Tue, 25 Feb 2025 19:45:31 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend for scalar FP16 operations namely - add, subtract, multiply, divide, fma, sqrt, min and max. 
> > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Hello @shqking @theRealAph , sincere apologies for the delay in addressing the review comments. I am planning on uploading a patch soon addressing all review comments. Thank you ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23748#issuecomment-2757083553 From stuefe at openjdk.org Thu Mar 27 08:17:22 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 27 Mar 2025 08:17:22 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: <3zkHWLVEELkQkeSU9M0YAOpb3olMDNyU1HAdWUJEm68=.a2d2f9ea-c635-4379-95d7-00ff358eb15f@github.com> <1S9u0e21AfElb6hNR-rLXa7JIhyUBE35ePRt1d4vhfs=.639232d7-aa07-4ac2-96b2-a2a5414c0377@github.com> Message-ID: <3bCMfyyRhqc6WmTZoWKE8kQJhbBJvm2rA2Yn2BSTVww=.56d8efcc-9722-45df-9a9b-87f57ab21696@github.com> On Wed, 26 Mar 2025 16:43:21 GMT, Stefan Karlsson wrote: > > > > I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions. > > > > > > > > > I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error? > > > > > > I second @stefank here. > > Uncommit can fail, ironically, with an ENOMEM : if the uncommit punches a hole into a committed region, this would cause a new new VMA on the kernel-side. This may fail if we run against the limit for VMAs. Forgot what it was, some sysconf setting. All of this is Linux specific, though. > > This happens when we hit the /proc/sys/vm/max_map_count limit, and this immediately crashes the JVM. Yes, but maybe it shouldn't (see below). 
> > > I don't think this should be unconditionally a fatal error. Since the allocator (whatever it is) can decide to re-commit the region later, and thus "self-heal" itself. > > Is this referring to failures when we hit the max_map_count limit? I'm not convinced that you can recover from that without immediately hitting the same issue somewhere else in the code. Well, you could scrape around for a while and maybe not trigger it. E.g. in Metaspace, I uncommit granules, but that is optional. I could just ignore uncommit errors there. In the heap, we could do the same thing. After a while, the memory may get reused and thus recommitted, thereby solving the problem. I admit this problem is a bit theoretical, and it may be acceptable to (continue to) crash at that point, since other allocations - libc, heap etc - will face the same limit. Running against this limit seems rare in my experiences; we mostly saw it with ZGC in the past. > > Or maybe you are thinking about some of the other reasons for the uncommit to fail? Honestly, I don't know why else uncommit would fail. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2757093258 From dnsimon at openjdk.org Thu Mar 27 08:21:09 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 27 Mar 2025 08:21:09 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v2] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Thu, 27 Mar 2025 05:56:55 GMT, Kim Barrett wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> drop extra blank lines and preserve rule for first include in .inline.hpp files > > test/hotspot/jtreg/sources/SortIncludes.java line 39: > >> 37: >> 38: public class SortIncludes { >> 39: private static final String INCLUDE_LINE = "^ *#include *(<[^>]+>|\"[^\"]+\") *$\\n"; > > There are files that have spaces between the `#` and `include`. I'm kind of inclined to suggest we fix those > at some point (not in this PR). But the regex here needs to allow for that possibility, and perhaps (eventually) > complain about such. Since there are no such cases in the files processed in this PR, I'd suggest not adding support for them. They can be fixed in follow up PRs as the relevant directories are added to `TestIncludesAreSorted.HOTSPOT_SOURCES_TO_CHECK`. 
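For reference, a hedged sketch of what tolerating `# include` could look like if a follow-up PR wants it. The names below (`IncludeRegexSketch`, `includeTarget`, `hasSpaceAfterHash`) are made up for illustration and are not part of `SortIncludes.java`. Only the shape of the regex is taken from the `INCLUDE_LINE` quoted above, with ` *` added after the `#` and the trailing newline dropped because the sketch matches one line at a time.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual SortIncludes implementation.
class IncludeRegexSketch {
    // Same shape as the quoted INCLUDE_LINE, plus " *" between '#' and "include".
    static final Pattern TOLERANT_INCLUDE =
        Pattern.compile("^ *# *include *(<[^>]+>|\"[^\"]+\") *$");

    // Returns the include target with its delimiters, or null if the line is not
    // a plain include (for example when text follows the closing '"' or '>').
    static String includeTarget(String line) {
        Matcher m = TOLERANT_INCLUDE.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    // True for an include written with whitespace between '#' and "include",
    // which a checker could report instead of silently accepting.
    static boolean hasSpaceAfterHash(String line) {
        return includeTarget(line) != null && !line.trim().startsWith("#include");
    }

    public static void main(String[] args) {
        System.out.println(includeTarget("# include <stdio.h>"));
        System.out.println(hasSpaceAfterHash("#  include \"utilities/macros.hpp\""));
    }
}
```

Keeping the detection in a separate predicate would let the tool accept such lines for sorting now and complain about them later, which is roughly the split suggested in the review comment.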
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2015912061 From shade at openjdk.org Thu Mar 27 08:41:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Mar 2025 08:41:12 GMT Subject: RFR: 8352980: Purge infrastructure for FP-to-bits interpreter intrinsics after 32-bit x86 removal In-Reply-To: References: Message-ID: <7iQ67ikAWlzGvMHHz-wkDX-mbkzGyvqfW8DayYhlMAE=.034503c4-dc47-4b71-af01-cda28ae34555@github.com> On Wed, 26 Mar 2025 15:53:45 GMT, Aleksey Shipilev wrote: > [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373) added the infrastructure to implement `_intBitsToFloat`, `_floatToRawIntBits`, `_longBitsToDouble`, `_doubleToRawLongBits` intrinsics in template interpreters. That work was needed to support NaNs for 32-bit x86, and is no longer needed. The C1/C2 compiler intrinsics for these are still implemented and functional. > > I did not do deep testing, because the removed code is actually dead. > > Additional testing: > - [x] GHA checks for platform builds > - [x] Linux x86_64 server fastdebug, `tier1` Thanks for reviews! @coleenp, @dholmes-ora: this is nominally Runtime, want to ack it as well? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24259#issuecomment-2757194028 From shade at openjdk.org Thu Mar 27 08:45:49 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Mar 2025 08:45:49 GMT Subject: RFR: 8351155: C1/C2: Remove 32-bit x86 specific FP rounding support [v2] In-Reply-To: <8zUrV-sMSOwRSQk_jERtFqjrzOFUP7rlUwTTN7cPP_8=.b1d30fb5-d9c0-4417-bacd-bf09f2af433b@github.com> References: <8zUrV-sMSOwRSQk_jERtFqjrzOFUP7rlUwTTN7cPP_8=.b1d30fb5-d9c0-4417-bacd-bf09f2af433b@github.com> Message-ID: > C1 and C2 have support for rounding double/floats, to support awkward rounding modes of x87 FPU. With 32-bit x86 port removed, we can remove those parts. 
This basically deletes all the code that uses `strict_fp_requires_explicit_rounding`, which is now universally `false` for all supported platforms. > > For C1, we remove `RoundFP` op, its associated `lir_roundfp` and related utility methods that insert these nodes in the graph. > > For C2, we remove `RoundDouble` and `RoundFloat` nodes (note there is a confusingly named `RoundDoubleMode` nodes that are not related to this), associated utility methods, AD match rules that reference these nodes (as nops!), and some `Ideal`-s that are no longer needed. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Minor leftover ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24250/files - new: https://git.openjdk.org/jdk/pull/24250/files/376c5ad8..88e4589c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24250&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24250&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24250.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24250/head:pull/24250 PR: https://git.openjdk.org/jdk/pull/24250 From stefank at openjdk.org Thu Mar 27 08:46:09 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 27 Mar 2025 08:46:09 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v2] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Wed, 26 Mar 2025 14:23:09 GMT, Doug Simon wrote: >> This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style 
guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> >> By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. >> >> The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. >> >> I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. >> >> When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. 
For example: >> >> java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: >> >> java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci >> >> at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) >> at java.base/java.lang.Thread.run(Thread.java:1447) >> Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: >> >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp >> /Users/dnsimo... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > drop extra blank lines and preserve rule for first include in .inline.hpp files I ran the latest script over the HotSpot source and see that it messes up corner-cases with our platform includes. 

    diff --git a/src/hotspot/cpu/aarch64/continuationEntry_aarch64.inline.hpp b/src/hotspot/cpu/aarch64/continuationEntry_aarch64.inline.hpp
    index df4d3957239..e8816767a96 100644
    --- a/src/hotspot/cpu/aarch64/continuationEntry_aarch64.inline.hpp
    +++ b/src/hotspot/cpu/aarch64/continuationEntry_aarch64.inline.hpp
    @@ -25,10 +25,9 @@
     #ifndef CPU_AARCH64_CONTINUATIONENTRY_AARCH64_INLINE_HPP
     #define CPU_AARCH64_CONTINUATIONENTRY_AARCH64_INLINE_HPP

    -#include "runtime/continuationEntry.hpp"
    -
     #include "code/codeCache.hpp"
     #include "oops/method.inline.hpp"
    +#include "runtime/continuationEntry.hpp"
     #include "runtime/frame.inline.hpp"
     #include "runtime/registerMap.hpp"

The includes are:

    .hpp --------------> _aarch64.hpp
      ^                       ^
      |                       |
      |   +-------------------+
      |   |
    .inline.hpp -------> _aarch64.inline.hpp

So, continuationEntry.hpp acts like the .hpp file for continuationEntry_aarch64.inline.hpp. Unfortunately, we don't have a fully consistent way to write our platform includes, so I don't know how to codify this in a tool without breaking things. ------------- PR Review: https://git.openjdk.org/jdk/pull/24247#pullrequestreview-2720267338 From stefank at openjdk.org Thu Mar 27 08:46:10 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 27 Mar 2025 08:46:10 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v2] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: <0I5RGRwY9sT2TJDoc1RjzTOck5evkm4-iO2Int7Imqg=.d3d3abce-e771-455f-9de6-cae4781434a1@github.com> On Thu, 27 Mar 2025 06:10:37 GMT, Kim Barrett wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> drop extra blank lines and preserve rule for first include in .inline.hpp files > > src/hotspot/share/compiler/disassembler.hpp line 36: > >> 34: #include "utilities/macros.hpp" >> 35: >> 36: > > Extra blank line inserted?
This seems to be left-overs from an earlier run. If I run the tool on this file it doesn't add this blank line. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2015915647 From stuefe at openjdk.org Thu Mar 27 08:48:14 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 27 Mar 2025 08:48:14 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: On Fri, 21 Mar 2025 10:15:10 GMT, Martin Doerr wrote: >> Hi Thomas, >> mprotect supports System V shared memory, but only if running in an environment where the MPROTECT_SHM=ON environmental variable is defined, which is not the case in the jdk. So we can fairly say System V shared memory cannot be mprotected by us. >> >> The documentation says: >> _The mprotect subroutine can only be used on shared memory regions backed with 4 KB or 64 KB pages;_ >> So we can mprotect 64K pages and mmap supports 64K pages beginning with AIX 7.3 TL1. >> With JDK-8334371 we favor the use of mmap 64K pages over System V shared memory if running on a system with AIX 7.3 TL1 or higher. But as long as we allow lower os versions the system V shared memory is still in place, and the mprotect restriction stays valid. > > I haven't seen test errors with this new version. @JoKern65, @MBaesken: Are you aware of any problems? Thanks @TheRealMDoerr and others! Anyone willing to give me a second review? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23912#issuecomment-2757208833 From kbarrett at openjdk.org Thu Mar 27 09:07:22 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 27 Mar 2025 09:07:22 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v2] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Thu, 27 Mar 2025 08:18:58 GMT, Doug Simon wrote: >> test/hotspot/jtreg/sources/SortIncludes.java line 39: >> >>> 37: >>> 38: public class SortIncludes { >>> 39: private static final String INCLUDE_LINE = "^ *#include *(<[^>]+>|\"[^\"]+\") *$\\n"; >> >> There are files that have spaces between the `#` and `include`. I'm kind of inclined to suggest we fix those >> at some point (not in this PR). But the regex here needs to allow for that possibility, and perhaps (eventually) >> complain about such. > > Since there are no such cases in the files processed in this PR, I'd suggest not adding support for them. They can be fixed in follow up PRs as the relevant directories are added to `TestIncludesAreSorted.HOTSPOT_SOURCES_TO_CHECK`. The regex needs to detect that case eventually anyway, so I think it should be done now. Either we allow that case, in which case the regex must match to work properly where they are present. Or we forbid that case, in which case the regex must match to detect future mistakes even after we've cleaned up existing usage. 
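
A minimal sketch of the point under discussion, for readers who want to see the difference concretely. It compares the pattern quoted above with a variant that tolerates spaces between `#` and `include`. The class name `IncludeLineRegex` and the helper `isInclude` are hypothetical, and the trailing `\\n` of the quoted pattern is dropped because each line is tested on its own here; how `SortIncludes.java` actually applies the pattern is not shown in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the PR under review.
class IncludeLineRegex {
    // Per-line form of the pattern quoted in the review comment
    // (assumption: the trailing "\\n" is omitted for single-line matching).
    static final Pattern STRICT =
        Pattern.compile("^ *#include *(<[^>]+>|\"[^\"]+\") *$");

    // Same pattern with " *" added between '#' and "include".
    static final Pattern RELAXED =
        Pattern.compile("^ *# *include *(<[^>]+>|\"[^\"]+\") *$");

    // Returns true if the whole line is recognized as an include statement.
    static boolean isInclude(Pattern pattern, String line) {
        return pattern.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "#include \"code/codeCache.hpp\"",
            "# include <stdio.h>",
            "#include \"runtime/continuationEntry.hpp\" // trailing text",
        };
        for (String line : samples) {
            System.out.printf("strict=%-5b relaxed=%-5b %s%n",
                isInclude(STRICT, line), isInclude(RELAXED, line), line);
        }
    }
}
```

With the strict form, a `# include` line is simply not seen as an include, so it can neither be sorted nor reported; the relaxed form sees it, which is needed whether such lines end up being allowed or forbidden.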
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2016008497 From stefank at openjdk.org Thu Mar 27 09:17:09 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 27 Mar 2025 09:17:09 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v2] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: <1W8bUhsbNfCXWzdT6QxlegrTNqYo-wxbQHhpzifIFK4=.71382d20-0999-4385-b285-e34936be436c@github.com> On Wed, 26 Mar 2025 14:23:09 GMT, Doug Simon wrote: >> This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> >> By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. >> >> The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. >> >> I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. 
>> >> When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. For example: >> >> java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: >> >> java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci >> >> at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) >> at java.base/java.lang.Thread.run(Thread.java:1447) >> Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: >> >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp >> /Users/dnsimo... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > drop extra blank lines and preserve rule for first include in .inline.hpp files I verified that adding a comment to the end of the `#include "runtime/continuationEntry.hpp"` line leaves that file intact, so I think that is a good enough workaround for the problematic platform includes. 
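
A rough sketch of why the trailing-comment workaround behaves this way, assuming the tool only considers lines that fully match its include pattern. The class `PinnedIncludeSketch`, its helpers, and the `c1/c1_Foo.hpp` and `c1/c1Bar.hpp` header names are hypothetical; this is not the algorithm in `SortIncludes.java`, only an illustration of lowercased ordering, duplicate removal via a sorted set, and lines that the pattern does not recognize.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the PR under review.
class PinnedIncludeSketch {
    // Per-line form of the pattern quoted earlier in the thread
    // (assumption: trailing "\\n" dropped for single-line matching).
    static final Pattern INCLUDE_LINE =
        Pattern.compile("^ *#include *(<[^>]+>|\"[^\"]+\") *$");

    // Compare lowercased text first so '_' orders before letters; the
    // tie-breaker keeps lines differing only in case from being merged.
    static final Comparator<String> ORDER =
        Comparator.comparing((String s) -> s.toLowerCase(Locale.ROOT))
                  .thenComparing(Comparator.<String>naturalOrder());

    // Lines recognized as includes, sorted, with exact duplicates dropped.
    static List<String> sortable(List<String> lines) {
        TreeSet<String> sorted = new TreeSet<>(ORDER);
        for (String line : lines) {
            if (INCLUDE_LINE.matcher(line).matches()) {
                sorted.add(line);
            }
        }
        return new ArrayList<>(sorted);
    }

    // Lines the pattern does not recognize, in their original order.
    static List<String> leftAlone(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (!INCLUDE_LINE.matcher(line).matches()) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> block = List.of(
            "#include \"runtime/continuationEntry.hpp\" // do not reorder",
            "#include \"oops/method.inline.hpp\"",
            "#include \"code/codeCache.hpp\"",
            "#include \"c1/c1Bar.hpp\"",   // hypothetical header
            "#include \"c1/c1_Foo.hpp\"",  // hypothetical header
            "#include \"oops/method.inline.hpp\"");
        leftAlone(block).forEach(l -> System.out.println("left alone: " + l));
        sortable(block).forEach(l -> System.out.println("sorted:     " + l));
    }
}
```

The commented include never enters the sorted set, which is consistent with the observation that the file is left intact once the comment is added.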
------------- PR Comment: https://git.openjdk.org/jdk/pull/24247#issuecomment-2757283373 From mbaesken at openjdk.org Thu Mar 27 09:20:20 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 27 Mar 2025 09:20:20 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: <6zNYNSD4GAfIqmDRBRj1a4_Q73C4EeJYv_tn2k0k2Fw=.71f884d2-b82e-4a9c-be4d-bacd49802cbf@github.com> References: <6zNYNSD4GAfIqmDRBRj1a4_Q73C4EeJYv_tn2k0k2Fw=.71f884d2-b82e-4a9c-be4d-bacd49802cbf@github.com> Message-ID: On Thu, 27 Mar 2025 04:31:55 GMT, David Holmes wrote: > If we really think we want this then maybe we should adjust the build process so that this info gets built into the binaries and doesn't require reading from the file system? This sounds like a good idea, I thought about it too. @magicus, what do you say? For me the valuable info from the release file is the SOURCE hash, and maybe the list of modules. The other stuff can more or less be found or derived from other hserr/hsinfo data. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2757289174 From duke at openjdk.org Thu Mar 27 09:24:00 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 27 Mar 2025 09:24:00 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v5] In-Reply-To: References: Message-ID: > # Issue Summary > > When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates. > > # Change Summary > > Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments.
Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. > > Concretely, this PR > - adds parse predicate nodes to the IR testing framework, > - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, > - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, > - adds a regression test. > > > # Testing > > The changes passed the following testing: > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) > - tier1 through tier3 and Oracle internal testing Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Merge branch 'master' into JDK-8347449-loop-predicate - Improve help text for UseProfiledLoopPredicate argument - loopnode: cleaner control flow - Clean up IR test - Apply suggestions from @chhagedorn Co-authored-by: Christian Hagedorn - ir-framework: rename new nodes to convention - ir-framework: fix phase for parse predicate nodes - Make conditions on UseProfiledLoopPredicate first test UseLoopPredicate - Turn off UseProfiledLoopPredicate when UseLoopPredicate is turned off - Add regression IR test - ...
and 1 more: https://git.openjdk.org/jdk/compare/d9538d7f...72ebfc8e ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24248/files - new: https://git.openjdk.org/jdk/pull/24248/files/ea653995..72ebfc8e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=03-04 Stats: 30579 lines in 68 files changed: 463 ins; 29885 del; 231 mod Patch: https://git.openjdk.org/jdk/pull/24248.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24248/head:pull/24248 PR: https://git.openjdk.org/jdk/pull/24248 From duke at openjdk.org Thu Mar 27 09:24:01 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 27 Mar 2025 09:24:01 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v4] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 15:27:39 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test.
>> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from @chhagedorn > > Co-authored-by: Christian Hagedorn Implemented suggestions, merged master and reran testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24248#issuecomment-2757292302 From stefank at openjdk.org Thu Mar 27 09:24:17 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 27 Mar 2025 09:24:17 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v2] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Thu, 27 Mar 2025 09:04:45 GMT, Kim Barrett wrote: >> Since there are no such cases in the files processed in this PR, I'd suggest not adding support for them. They can be fixed in follow up PRs as the relevant directories are added to `TestIncludesAreSorted.HOTSPOT_SOURCES_TO_CHECK`. > > The regex needs to detect that case eventually anyway, so I think it should be done now. Either we allow that > case, in which case the regex must match to work properly where they are present. Or we forbid that case, > in which case the regex must match to detect future mistakes even after we've cleaned up existing usage.
To me it seems like a small adjustment fixes this. Suggestion: private static final String INCLUDE_LINE = "^ *# *include *(<[^>]+>|\"[^\"]+\") *$\\n"; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2016040674 From duke at openjdk.org Thu Mar 27 09:24:05 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 27 Mar 2025 09:24:05 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v3] In-Reply-To: References: <5OOpDW693XhGVePrz6zYlm9gMZKneFaIfl6BJP-NQb0=.c5757e84-6b14-442b-b822-d0b1eeb5f913@github.com> Message-ID: On Wed, 26 Mar 2025 15:04:15 GMT, Christian Hagedorn wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> ir-framework: rename new nodes to convention > > src/hotspot/share/opto/c2_globals.hpp line 789: > >> 787: product(bool, UseProfiledLoopPredicate, true, \ >> 788: "Move predicates out of loops based on profiling data. " \ >> 789: "Requires UseLoopPredicate to be turned on (default).") \ > > It was already a bit vague before but I suggest to be more precise that we move checks with an uncommon trap out of a loop (and the resulting check before the loop is then a predicate): > > Move checks with an uncommon trap out of loops based on profiling data. Requires [...]
Implemented in [f903729](https://github.com/openjdk/jdk/pull/24248/commits/f90372927a4d7ed82740014934f4409648d42bca) > src/hotspot/share/opto/loopnode.cpp line 4304: > >> 4302: tty->print(" profile_predicated"); >> 4303: } >> 4304: if (UseLoopPredicate && predicates.loop_predicate_block()->is_non_empty()) { > > Maybe you can merge these blocks: > > if (UseLoopPredicate) { > if (UseProfiledLoopPredicate && predicates.profiled_loop_predicate_block()->is_non_empty()) { > tty->print(" profile_predicated"); > } > if (predicates.loop_predicate_block()->is_non_empty()) { > tty->print(" predicated"); > } > } Implemented in [1aba1c2](https://github.com/openjdk/jdk/pull/24248/commits/1aba1c23f0e90e0a6717bdf7c441451b8e9c3efc) > test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1524: > >> 1522: public static final String PARSE_PREDICATE_LOOP = PREFIX + "PARSE_PREDICATE_LOOP" + POSTFIX; >> 1523: static { >> 1524: parsePredicateNodes(PARSE_PREDICATE_LOOP, "Loop"); > > I suggest the following names found in `predicates.hpp`: > https://github.com/openjdk/jdk/blob/79bffe2f28f90986d45f4e91efc021290b4fc00a/src/hotspot/share/opto/predicates.hpp#L48-L50 Implemented in [7f24c87](https://github.com/openjdk/jdk/pull/24248/commits/7f24c87557da33bbb96a3596222b4737a06d9d31) > test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 41: > >> 39: static final int WARMUP = 10_000; >> 40: static final int SIZE = 100; >> 41: static final int min = 3; > > Since `min` is also a constant, you should capitalize it. 
Implemented in [7f24c87](https://github.com/openjdk/jdk/pull/24248/commits/7f24c87557da33bbb96a3596222b4737a06d9d31) > test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 46: > >> 44: TestFramework.runWithFlags("-XX:+UseLoopPredicate", >> 45: "-XX:+UseProfiledLoopPredicate"); >> 46: TestFramework.runWithFlags("-XX:-UseLoopPredicate"); > > You could also add a run where you only disable `-XX:-UseProfiledLoopPredicate` for completeness and add an IR rule accordingly. Implemented in [7f24c87](https://github.com/openjdk/jdk/pull/24248/commits/7f24c87557da33bbb96a3596222b4737a06d9d31) > test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 74: > >> 72: @IR(counts = { IRNode.PARSE_PREDICATE_LOOP, "=1", >> 73: IRNode.PARSE_PREDICATE_PROFILED_LOOP, "1" }, >> 74: phase = CompilePhase.AFTER_PARSING, > > `phase` is not required since you've decided that `AFTER_PARSING` is the default phase where we match this node on. You only need to specify `phase` if you want to match on a different phase. 
Implemented in [7f24c87](https://github.com/openjdk/jdk/pull/24248/commits/7f24c87557da33bbb96a3596222b4737a06d9d31) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016037388 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016040418 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016039828 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016039142 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016037918 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016038370 From dnsimon at openjdk.org Thu Mar 27 09:43:08 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 27 Mar 2025 09:43:08 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v2] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Thu, 27 Mar 2025 06:26:43 GMT, Kim Barrett wrote: > Probably we want to eventually apply this to gtests, but there might be additional rules there. The include of unittest.hpp is (usually) last, and there may be (or may have been) a technical reason for that. > > Applying it to jtreg test support files could also introduce some challenges. Or at least discover a lot of non-conforming files. We might eventually want a mechanism for excluding directories, in addition to an inclusion list (that might eventually be "all"). > > These kinds of things can be followups once we have the basic mechanism in place. I would suggest someone open issue(s) for follow up enhancements to the tool. I think having something in place now and incrementally improving it and adjusting it for all the special cases makes most sense. 
> src/hotspot/share/compiler/compilationFailureInfo.cpp line 35: > >> 33: #include "compiler/compilationFailureInfo.hpp" >> 34: #include "compiler/compileTask.hpp" >> 35: #ifdef COMPILER2 > > Conditional includes are supposed to follow unconditional in a section. > Out of scope for this PR? Yep. From the PR description: The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. > test/hotspot/jtreg/sources/SortIncludes.java line 115: > >> 113: } >> 114: >> 115: /// Processes the C++ source file in `path` to sort its include statements. > > If we want to apply this to hotspot jtreg test code, then C source files also come into the picture. I think the tool will need to be updated to handle C source files. At that point, the comment should be generalized. > test/hotspot/jtreg/sources/SortIncludes.java line 153: > >> 151: >> 152: /// Processes the C++ source files in `paths` to check if their include statements are sorted. >> 153: /// Include statements with any non-space characters after the closing `"` or `>` will not > > Perhaps this should be mentioned in the style guide? Done. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24247#issuecomment-2757350491 PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2016078724 PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2016077938 PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2016078194 From dnsimon at openjdk.org Thu Mar 27 09:49:38 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 27 Mar 2025 09:49:38 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v3] In-Reply-To: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: <4eq6qUl0x1TJxdlM6oWmpAazGCmFsVbzSjY58KFosv0=.c2175ffa-cdc1-4754-a84b-30f0a389397c@github.com> > This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). > > By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. > > The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. 
> > I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. > > When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. For example: > > java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: > > java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci > > at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) > at java.base/java.lang.Thread.run(Thread.java:1447) > Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: > > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Optim... 
Doug Simon has updated the pull request incrementally with four additional commits since the last revision: - allow spaces between `#` and `include` - moved some logic out of SortIncludes into TestIncludesAreSorted - removed extra blank lines - update style guide with advice on how to label includes that should not be re-ordered ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24247/files - new: https://git.openjdk.org/jdk/pull/24247/files/18e2a1d6..cada0df4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24247&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24247&range=01-02 Stats: 117 lines in 6 files changed: 60 ins; 29 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/24247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24247/head:pull/24247 PR: https://git.openjdk.org/jdk/pull/24247 From dnsimon at openjdk.org Thu Mar 27 09:49:38 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 27 Mar 2025 09:49:38 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v2] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Thu, 27 Mar 2025 09:20:58 GMT, Stefan Karlsson wrote: >> The regex needs to detect that case eventually anyway, so I think it should be done now. Either we allow that >> case, in which case the regex must match to work properly where they are present. Or we forbid that case, >> in which case the regex must match to detect future mistakes even after we've cleaned up existing usage. > > To me it seems like a small adjustment fixes this. > Suggestion: > > private static final String INCLUDE_LINE = "^ *# *include *(<[^>]+>|\"[^\"]+\") *$\\n"; Done.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2016091219 From chagedorn at openjdk.org Thu Mar 27 10:08:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Mar 2025 10:08:15 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v5] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 09:24:00 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8347449-loop-predicate > - Improve help text for UseProfiledLoopPredicate argument > - loopnode: cleaner control flow > - Clean up IR test > - Apply suggestions from @chhagedorn > > Co-authored-by: Christian Hagedorn > - ir-framework: rename new nodes to convention > - ir-framework: fix phase for parse predicate nodes > - Make conditions on UseProfiledLoopPredicate first test UseLoopPredicate > - Turn off UseProfiledLoopPredicate when UseLoopPredicate is turned off > - Add regression IR test > - ... and 1 more: https://git.openjdk.org/jdk/compare/4130165c...72ebfc8e Thanks for the updates, looks good now! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24248#pullrequestreview-2720657747 From stefank at openjdk.org Thu Mar 27 10:13:15 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 27 Mar 2025 10:13:15 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v3] In-Reply-To: <4eq6qUl0x1TJxdlM6oWmpAazGCmFsVbzSjY58KFosv0=.c2175ffa-cdc1-4754-a84b-30f0a389397c@github.com> References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> <4eq6qUl0x1TJxdlM6oWmpAazGCmFsVbzSjY58KFosv0=.c2175ffa-cdc1-4754-a84b-30f0a389397c@github.com> Message-ID: On Thu, 27 Mar 2025 09:49:38 GMT, Doug Simon wrote: >> This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> >> By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. 
`"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. >> >> The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. >> >> I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. >> >> When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. 
For example: >> >> java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: >> >> java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci >> >> at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) >> at java.base/java.lang.Thread.run(Thread.java:1447) >> Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: >> >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp >> /Users/dnsimo... > > Doug Simon has updated the pull request incrementally with four additional commits since the last revision: > > - allow spaces between `#` and `include` > - moved some logic out of SortIncludes into TestIncludesAreSorted > - removed extra blank lines > - update style guide with advice on how to label includes that should not be re-ordered I'm happy with the capabilities of the tool now and think that it is good enough to include and promote to HotSpot devs. One question is where to put the tool. I don't think the test directory is the best place. Maybe somewhere in `src/utils/`. There is a tools dir here `src/utils/src/build/tools/` but I don't know if it is appropriate to put it there. Maybe @magicus knows a good place for this? A couple of nits: 1) jcheck fails because of whitespaces 2) The /// style comments is a style I haven't encountered before. 
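For readers following the thread, the sort order described in the quoted PR text (compare lowercased strings so that `_` sorts before letters, and let a `SortedSet` drop duplicates) can be sketched roughly as below. This is only an illustration and not the code from `SortIncludes.java`: the class name `IncludeOrderSketch`, the helper `sortIncludes`, and the tie-break on the raw string are assumptions made for the example, and the include lines are sample data.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical sketch; not the actual SortIncludes implementation.
public class IncludeOrderSketch {
    // Compare on the lowercased text first, so '_' (0x5F) sorts before every
    // letter. The raw-text tie-break (an assumption of this sketch) keeps
    // includes that differ only in case from being merged.
    static final Comparator<String> LOWERCASED =
        Comparator.comparing((String s) -> s.toLowerCase(Locale.ROOT))
                  .thenComparing(Comparator.<String>naturalOrder());

    // Returns the given include lines sorted, with exact duplicates removed.
    static List<String> sortIncludes(List<String> includes) {
        TreeSet<String> sorted = new TreeSet<>(LOWERCASED);
        sorted.addAll(includes);
        return List.copyOf(sorted);
    }

    public static void main(String[] args) {
        List<String> block = List.of(
            "#include \"gc/g1/g1Allocator.hpp\"",
            "#include \"gc/g1/g1_globals.hpp\"",
            "#include \"compiler/compilerDirectives.hpp\"",
            "#include \"compiler/compilerDirectives.hpp\"");
        // The duplicate compilerDirectives.hpp line is printed once, and
        // g1_globals.hpp comes before g1Allocator.hpp.
        sortIncludes(block).forEach(System.out::println);
    }
}
```

With a plain `String` comparison the uppercase `A` (0x41) would sort before `_`, which is the opposite of the convention the PR says it preserves.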
------------- PR Review: https://git.openjdk.org/jdk/pull/24247#pullrequestreview-2720671629 From dnsimon at openjdk.org Thu Mar 27 10:39:13 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 27 Mar 2025 10:39:13 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v3] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> <4eq6qUl0x1TJxdlM6oWmpAazGCmFsVbzSjY58KFosv0=.c2175ffa-cdc1-4754-a84b-30f0a389397c@github.com> Message-ID: On Thu, 27 Mar 2025 10:10:07 GMT, Stefan Karlsson wrote: > A couple of nits: > > 1. jcheck fails because of whitespaces > 2. The /// style comments is a style I haven't encountered before. I fixed the whitespaces. I can convert the `///` comments if you want - no strong opinion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24247#issuecomment-2757548404 From dnsimon at openjdk.org Thu Mar 27 10:47:14 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 27 Mar 2025 10:47:14 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v3] In-Reply-To: <4eq6qUl0x1TJxdlM6oWmpAazGCmFsVbzSjY58KFosv0=.c2175ffa-cdc1-4754-a84b-30f0a389397c@github.com> References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> <4eq6qUl0x1TJxdlM6oWmpAazGCmFsVbzSjY58KFosv0=.c2175ffa-cdc1-4754-a84b-30f0a389397c@github.com> Message-ID: On Thu, 27 Mar 2025 09:49:38 GMT, Doug Simon wrote: >> This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> >> By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. 
`"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. >> >> The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. >> >> I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. >> >> When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. 
For example: >> >> java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: >> >> java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci >> >> at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) >> at java.base/java.lang.Thread.run(Thread.java:1447) >> Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: >> >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp >> /Users/dnsimo... > > Doug Simon has updated the pull request incrementally with four additional commits since the last revision: > > - allow spaces between `#` and `include` > - moved some logic out of SortIncludes into TestIncludesAreSorted > - removed extra blank lines > - update style guide with advice on how to label includes that should not be re-ordered I just noticed that TestIncludesAreSorted is not run by GHA. 
How about we move `test/hotspot/jtreg/sources` into `tier1_common`:
diff --git a/test/hotspot/jtreg/TEST.groups b/test/hotspot/jtreg/TEST.groups
index 71b9e497e25..62b11e73aa0 100644
--- a/test/hotspot/jtreg/TEST.groups
+++ b/test/hotspot/jtreg/TEST.groups
@@ -139,6 +139,7 @@ serviceability_ttf_virtual = \
   -serviceability/jvmti/negative
 
 tier1_common = \
+  sources \
   sanity/BasicVMTest.java \
   gtest/GTestWrapper.java \
   gtest/LockStackGtests.java \
@@ -619,16 +620,12 @@ tier1_serviceability = \
   -serviceability/sa/TestJmapCore.java \
   -serviceability/sa/TestJmapCoreMetaspace.java
 
-tier1_sources = \
-  sources
-
 tier1 = \
   :tier1_common \
   :tier1_compiler \
   :tier1_gc \
   :tier1_runtime \
   :tier1_serviceability \
-  :tier1_sources
 
 tier2 = \
   :hotspot_tier2_runtime \
------------- PR Comment: https://git.openjdk.org/jdk/pull/24247#issuecomment-2757570734 From stefank at openjdk.org Thu Mar 27 11:16:07 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 27 Mar 2025 11:16:07 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v3] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> <4eq6qUl0x1TJxdlM6oWmpAazGCmFsVbzSjY58KFosv0=.c2175ffa-cdc1-4754-a84b-30f0a389397c@github.com> Message-ID: On Thu, 27 Mar 2025 10:36:38 GMT, Doug Simon wrote: > > A couple of nits: > > > > 1. jcheck fails because of whitespaces > > 2. The /// style comments is a style I haven't encountered before. > > I fixed the whitespaces. I can convert the `///` comments if you want - no strong opinion. Maybe someone else knows the preferred style for this? I don't think we need to block the integration because of this. If someone comes late with the proper comment style, we'll update it in a separate PR. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24247#issuecomment-2757648203 From coleenp at openjdk.org Thu Mar 27 11:22:08 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 27 Mar 2025 11:22:08 GMT Subject: RFR: 8352980: Purge infrastructure for FP-to-bits interpreter intrinsics after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 15:53:45 GMT, Aleksey Shipilev wrote: > [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373) added the infrastructure to implement `_intBitsToFloat`, `_floatToRawIntBits`, `_longBitsToDouble`, `_doubleToRawLongBits` intrinsics in template interpreters. That work was needed to support NaNs for 32-bit x86, and is no longer needed. The C1/C2 compiler intrinsics for these are still implemented and functional. > > I did not do deep testing, because the removed code is actually dead. > > Additional testing: > - [x] GHA checks for platform builds > - [x] Linux x86_64 server fastdebug, `tier1` This is in the interpreter but I don't believe we in runtime area had anything to do with this. The removal looks great! I'm surprised to see an if IA32. I hope these are almost purged too. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24259#pullrequestreview-2720920801 From rehn at openjdk.org Thu Mar 27 11:22:48 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 27 Mar 2025 11:22:48 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v8] In-Reply-To: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: > Hi please consider. 
>
> | RVWMO | Patched |
> | ---------- | ---------- |
> | fence iorw,iorw | fence iorw,ow |
> | sw t4,120(t2) | sw t4,120(t2) |
> | fence ow,ir | unnecessary_membar_volatile_rvwmo |
> | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile |
> | fence iorw,ow | fence iorw,ow |
> | sw t5,124(t2) | sw t5,124(t2) |
>
> | TSO | Patched |
> | ---------- | ---------- |
> | lw a4,120(t2) | lw a6,120(t2) |
> | sw a0,124(t2) | sw t6,124(t2) |
> | fence iorw,iorw | unnecessary_membar_volatile_tso |
> | sw t4,120(t2) | sw t4,120(t2) |
> | fence ow,ir | unnecessary_membar_volatile_tso |
> | sw t6,128(t2) | sw t5,128(t2) |
> | sw t5,124(t2) // Non-volatile | sw a1,124(t2) // Non-volatile |
> | fence iorw,iorw | unnecessary_membar_volatile_tso |
> | ... | ... |
> | sw a3,120(t2) | sw a0,120(t2) |
> | fence ow,ir | fence ow,ir |
> | lw a7,124(t2) | lw a5,124(t2) |
>
> The specific RVWMO sequence (volatile store + store + volatile store) is around 30% faster on VF2.
>
> The patch does the following:
> - Separate Ztso and RVWMO in the ad file by using the UseZtso predicate.
> - Match all that require the same membar.
> - Make fence/fencei protected, as they shouldn't be used directly.
> - Increase the cost of membars to VOLATILE_REF_COST.
> - Add a real_empty pipe.
> - Change to pipe_slow on TSO (as on x86).
>
> Note that C2-rv64 is now superior to gcc/clang regarding fencing:
> https://godbolt.org/z/6E3YTP15j
>
> Tested with jcstress, tier1, and by manually reading the generated assembly.
> Additional testing is in progress, but I'm sending the RFR now as it may need some consideration.
>
> /Robbin
Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: - Merge branch 'master' into tso-merge - Merge branch 'master' into tso-merge - format comment - Merge branch 'master' into tso-merge - Review comments - Merge branch 'master' into tso-merge - Review comments - Fixed ws - Revert NC - Fixed comment - ... and 1 more: https://git.openjdk.org/jdk/compare/15ffecab...c2688a6a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24035/files - new: https://git.openjdk.org/jdk/pull/24035/files/5eac8470..c2688a6a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=06-07 Stats: 32764 lines in 118 files changed: 1605 ins; 30778 del; 381 mod Patch: https://git.openjdk.org/jdk/pull/24035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24035/head:pull/24035 PR: https://git.openjdk.org/jdk/pull/24035 From coleenp at openjdk.org Thu Mar 27 11:23:09 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 27 Mar 2025 11:23:09 GMT Subject: RFR: 8351151: Clean up x86 template interpreter after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 11:51:22 GMT, Aleksey Shipilev wrote: > x86 template interpreter carries `_LP64`-predicated code blocks that were supporting 32-bit x86. With that port gone, we can clean up the x86 template interpreter. I have checked no superfluous `LP64`, `AMD64`, `IA32` defines are left in affected files. > > Where obvious, I inlined `r15_thread` and `c_arg*`. Left the uses that assign these args to symbolic locals that have meaningful names. > > `verify_FPU` is also no-op now, removed that. There are related cleanups in compilers and runtime we need to do first, before we fully remove `VerifyFPU` flag. This change fans out a little to other platform template interpreters to remove `verify_FPU` as well. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Fred and I chatted about the r15_thread variable too, but agreed that this can be done incrementally later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24251#issuecomment-2757680713 From shade at openjdk.org Thu Mar 27 11:27:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Mar 2025 11:27:22 GMT Subject: RFR: 8352980: Purge infrastructure for FP-to-bits interpreter intrinsics after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 15:53:45 GMT, Aleksey Shipilev wrote: > [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373) added the infrastructure to implement `_intBitsToFloat`, `_floatToRawIntBits`, `_longBitsToDouble`, `_doubleToRawLongBits` intrinsics in template interpreters. That work was needed to support NaNs for 32-bit x86, and is no longer needed. The C1/C2 compiler intrinsics for these are still implemented and functional. > > I did not do deep testing, because the removed code is actually dead. > > Additional testing: > - [x] GHA checks for platform builds > - [x] Linux x86_64 server fastdebug, `tier1` Thanks! Yes, I am going to do another sweep for `IA32` in one of the cleanup umbrellas. 
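As background for the four intrinsics named in the quoted description, the sketch below shows the public `java.lang.Float` and `java.lang.Double` methods that, as far as I understand, they correspond to. It is not taken from the PR; the class name `FpBitsSketch` and the `roundTrip` helpers are made up for illustration. The raw variants hand back the stored bit pattern, while `Float.floatToIntBits` collapses every NaN to the canonical pattern, which is why NaN handling is the sensitive part.

```java
// Hypothetical illustration; names are not from the JDK sources or the PR.
public class FpBitsSketch {
    // Round-trips a float through its raw IEEE 754 bit pattern.
    static float roundTrip(float f) {
        return Float.intBitsToFloat(Float.floatToRawIntBits(f));
    }

    // Same round trip for a double.
    static double roundTrip(double d) {
        return Double.longBitsToDouble(Double.doubleToRawLongBits(d));
    }

    public static void main(String[] args) {
        // Ordinary values have a single bit pattern.
        System.out.printf("1.0f -> 0x%08x%n", Float.floatToRawIntBits(1.0f));       // 0x3f800000
        System.out.printf("1.0d -> 0x%016x%n", Double.doubleToRawLongBits(1.0d));   // 0x3ff0000000000000

        // A NaN carrying a payload: still a NaN after conversion.
        float nan = Float.intBitsToFloat(0x7fc12345);
        System.out.println("isNaN: " + Float.isNaN(nan));

        // The non-raw variant always reports the canonical NaN.
        System.out.printf("floatToIntBits(NaN) -> 0x%08x%n", Float.floatToIntBits(nan)); // 0x7fc00000

        // What the raw variant reports for a NaN is platform-dependent per the
        // Float Javadoc, so no expected value is given here.
        System.out.printf("floatToRawIntBits(NaN) -> 0x%08x%n", Float.floatToRawIntBits(nan));
    }
}
```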
------------- PR Comment: https://git.openjdk.org/jdk/pull/24259#issuecomment-2757689272 From shade at openjdk.org Thu Mar 27 11:27:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Mar 2025 11:27:22 GMT Subject: Integrated: 8352980: Purge infrastructure for FP-to-bits interpreter intrinsics after 32-bit x86 removal In-Reply-To: References: Message-ID: <1K7vpBq2Tz6JHsqQLU8g5b0xH-V8xQU3GBHZXWR2DWI=.f04e0a08-ed43-44a5-aac5-1ca1fb4eb24b@github.com> On Wed, 26 Mar 2025 15:53:45 GMT, Aleksey Shipilev wrote: > [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373) added the infrastructure to implement `_intBitsToFloat`, `_floatToRawIntBits`, `_longBitsToDouble`, `_doubleToRawLongBits` intrinsics in template interpreters. That work was needed to support NaNs for 32-bit x86, and is no longer needed. The C1/C2 compiler intrinsics for these are still implemented and functional. > > I did not do deep testing, because the removed code is actually dead. > > Additional testing: > - [x] GHA checks for platform builds > - [x] Linux x86_64 server fastdebug, `tier1` This pull request has now been integrated. Changeset: b7ffd223 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/b7ffd223e83e56259801534b634729c563e36c7b Stats: 95 lines in 11 files changed: 0 ins; 95 del; 0 mod 8352980: Purge infrastructure for FP-to-bits interpreter intrinsics after 32-bit x86 removal Reviewed-by: kvn, vlivanov, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/24259 From shade at openjdk.org Thu Mar 27 11:32:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Mar 2025 11:32:12 GMT Subject: RFR: 8351151: Clean up x86 template interpreter after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 11:51:22 GMT, Aleksey Shipilev wrote: > x86 template interpreter carries `_LP64`-predicated code blocks that were supporting 32-bit x86. With that port gone, we can clean up the x86 template interpreter. 
I have checked no superfluous `LP64`, `AMD64`, `IA32` defines are left in affected files. > > Where obvious, I inlined `r15_thread` and `c_arg*`. Left the uses that assign these args to symbolic locals that have meaningful names. > > `verify_FPU` is also no-op now, removed that. There are related cleanups in compilers and runtime we need to do first, before we fully remove `VerifyFPU` flag. This change fans out a little to other platform template interpreters to remove `verify_FPU` as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Yeah, I only did `r15_thread` inlinings where it made clear sense to do, e.g. where the `Register thread = r15_thread` definition was right near the use, _and_ where no other symbolic locals were introduced (like `robj`). Otherwise I had lots of hunks with the `rthread` -> `r15_thread` rewrites, which made PR 2x larger. So I think we can indeed touch up those as we go later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24251#issuecomment-2757704378 From varadam at openjdk.org Thu Mar 27 11:49:17 2025 From: varadam at openjdk.org (Varada M) Date: Thu, 27 Mar 2025 11:49:17 GMT Subject: Integrated: 8352393: AIX: Problem list serviceability/attach/AttachAPIv2/StreamingOutputTest.java In-Reply-To: References: Message-ID: <-8Os5D3kHg04nlm1NVS64C22CKuq8ava_bI7oeHBkr8=.923ecbb1-ea29-4b5b-90b1-229fbe58cfcd@github.com> On Wed, 19 Mar 2025 14:56:48 GMT, Varada M wrote: > Excluding the test serviceability/attach/AttachAPIv2/StreamingOutputTest.java > > JBS Issue : [JDK-8352393](https://bugs.openjdk.org/browse/JDK-8352393) This pull request has now been integrated. 
Changeset: b9907801 Author: Varada M URL: https://git.openjdk.org/jdk/commit/b9907801afaf4c613482ce3cb1b38262ce13df29 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8352393: AIX: Problem list serviceability/attach/AttachAPIv2/StreamingOutputTest.java Reviewed-by: jsjolen, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/24116 From shade at openjdk.org Thu Mar 27 12:02:21 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Mar 2025 12:02:21 GMT Subject: RFR: 8351151: Clean up x86 template interpreter after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 11:51:22 GMT, Aleksey Shipilev wrote: > x86 template interpreter carries `_LP64`-predicated code blocks that were supporting 32-bit x86. With that port gone, we can clean up the x86 template interpreter. I have checked no superfluous `LP64`, `AMD64`, `IA32` defines are left in affected files. > > Where obvious, I inlined `r15_thread` and `c_arg*`. Left the uses that assign these args to symbolic locals that have meaningful names. > > `verify_FPU` is also no-op now, removed that. There are related cleanups in compilers and runtime we need to do first, before we fully remove `VerifyFPU` flag. This change fans out a little to other platform template interpreters to remove `verify_FPU` as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Read the PR again. I see no risky changes, and testing still looks green. I am integrating to re-base some other cleanups on this (GC barriers, https://github.com/openjdk/jdk/pull/24253). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24251#issuecomment-2757783151 From shade at openjdk.org Thu Mar 27 12:02:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Mar 2025 12:02:22 GMT Subject: Integrated: 8351151: Clean up x86 template interpreter after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 11:51:22 GMT, Aleksey Shipilev wrote: > x86 template interpreter carries `_LP64`-predicated code blocks that were supporting 32-bit x86. With that port gone, we can clean up the x86 template interpreter. I have checked no superfluous `LP64`, `AMD64`, `IA32` defines are left in affected files. > > Where obvious, I inlined `r15_thread` and `c_arg*`. Left the uses that assign these args to symbolic locals that have meaningful names. > > `verify_FPU` is also no-op now, removed that. There are related cleanups in compilers and runtime we need to do first, before we fully remove `VerifyFPU` flag. This change fans out a little to other platform template interpreters to remove `verify_FPU` as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. Changeset: e2cd70aa Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/e2cd70aab69f2244667db91fec5f4e3038f64437 Stats: 1165 lines in 15 files changed: 2 ins; 1040 del; 123 mod 8351151: Clean up x86 template interpreter after 32-bit x86 removal Reviewed-by: coleenp, fparain, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/24251 From shade at openjdk.org Thu Mar 27 12:31:21 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Mar 2025 12:31:21 GMT Subject: RFR: 8351157: Clean up x86 GC barriers after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: <6aXRsWRRGrrJdkmNcZHPw8JBD5piGr6UrmjOdnHjlMY=.3dde2c28-bdfc-4eb1-8d1d-7a4c85d3234f@github.com> > Assembler GC barriers have quite a bit of coding to support 32-bit x86. 
As 32-bit x86 is removed, we can clean up those parts. > > We can eliminate `!LP64` blocks quite easily. We can also stop passing around the `thread` argument, and just trust that `r15_thread` is always available. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge branch 'master' into JDK-8351157-x86-gc-barriers - Also do tlab_allocate - Rely on R15 to be a thread register - Work ------------- Changes: https://git.openjdk.org/jdk/pull/24253/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24253&range=01 Stats: 543 lines in 20 files changed: 1 ins; 426 del; 116 mod Patch: https://git.openjdk.org/jdk/pull/24253.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24253/head:pull/24253 PR: https://git.openjdk.org/jdk/pull/24253 From duke at openjdk.org Thu Mar 27 12:40:29 2025 From: duke at openjdk.org (Zihao Lin) Date: Thu, 27 Mar 2025 12:40:29 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v3] In-Reply-To: References: Message-ID: > This patch removes the slice parameter from LoadNode::make. > > Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, and I'd appreciate any guidance. Thanks a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - 8344116: C2: remove slice parameter from LoadNode::make ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24258/files - new: https://git.openjdk.org/jdk/pull/24258/files/f4ef46dc..08c1a382 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=01-02 Stats: 3892 lines in 94 files changed: 1545 ins; 2033 del; 314 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From mbaesken at openjdk.org Thu Mar 27 12:52:15 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 27 Mar 2025 12:52:15 GMT Subject: RFR: 8345265: Minor improvements for LTO across all compilers [v2] In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 14:54:03 GMT, Julian Waters wrote: >> This is a general cleanup and improvement of LTO, as well as a quick fix to remove a workaround in the Makefiles that disabled LTO for g1ParScanThreadState.cpp due to the old poisoning mechanism causing trouble. The -Wno-attribute-warning change here can be removed once Kim's new poisoning solution is integrated. >> >> - -fno-omit-frame-pointer is added to gcc to stop the linker from emitting code without the frame pointer >> - -flto is set to $(JOBS) instead of auto to better match what the user requested >> - -Gy is passed to the Microsoft compiler. This does not fully fix LTO under Microsoft, but prevents warnings about -LTCG:INCREMENTAL at least > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: > > - Merge branch 'openjdk:master' into patch-16 > - -fno-omit-frame-pointer in JvmFeatures.gmk > - Revert compilerWarnings_gcc.hpp > - General LTO fixes JvmFeatures.gmk > - Revert DISABLE_POISONING_STOPGAP compilerWarnings_gcc.hpp > - Merge branch 'openjdk:master' into patch-16 > - Revert os.cpp > - Fix memory leak in jvmciEnv.cpp > - Stopgap fix in os.cpp > - Declaration fix in compilerWarnings_gcc.hpp > - ... and 2 more: https://git.openjdk.org/jdk/compare/06ed9958...9d05cb8e I did some builds today with gcc 14.2.0 on Linux x86_64. They looked good with and without LTO.
Without LTO (normal opt build):
du -sh images/jdk/lib/server/libjvm.so
27M images/jdk/lib/server/libjvm.so
With LTO:
du -sh images/jdk/lib/server/libjvm.so
24M images/jdk/lib/server/libjvm.so
So even the code size reduction is visible. So gcc14 seems to work for me (but so far I only used it on Linux x86_64, can't tell about aarch64/ppc64le). ------------- PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2757921798 From cnorrbin at openjdk.org Thu Mar 27 13:14:13 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Thu, 27 Mar 2025 13:14:13 GMT Subject: RFR: 8294954: Remove superfluous ResourceMarks when using LogStream In-Reply-To: References: Message-ID: <8Wwv81jleabUql4kGXnO_nxIrOwYHCM7Tp9mZXNc5Nk=.30301f9a-02b4-49be-a478-c881e0bc66d0@github.com> On Fri, 21 Mar 2025 15:38:21 GMT, Casper Norrbin wrote: > Hi everyone, > > This PR removes redundant `ResourceMark` instances where `LogStream` is used. Previously, `LogStream` inherited from `ResourceObj`, which required a `ResourceMark`, but this is no longer the case, making these instances unnecessary. > > Process: > 1. I added assertions to check for resource unwinding in places where `ResourceMark`s were used with `LogStream`s. > 2. Ran tests up to tier7 to confirm no unwinding was happening. 
This helped filter out cases where `ResourceMark`s were still required for other reasons. > 3. Manually verified the remaining cases by tracing function calls to ensure the `ResourceMark`s were truly unnecessary. > 4. Removed the redundant `ResourceMark` instances. Thank you both! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24162#issuecomment-2757980629 From duke at openjdk.org Thu Mar 27 13:14:14 2025 From: duke at openjdk.org (duke) Date: Thu, 27 Mar 2025 13:14:14 GMT Subject: RFR: 8294954: Remove superfluous ResourceMarks when using LogStream In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 15:38:21 GMT, Casper Norrbin wrote: > Hi everyone, > > This PR removes redundant `ResourceMark` instances where `LogStream` is used. Previously, `LogStream` inherited from `ResourceObj`, which required a `ResourceMark`, but this is no longer the case, making these instances unnecessary. > > Process: > 1. I added assertions to check for resource unwinding in places where `ResourceMark`s were used with `LogStream`s. > 2. Ran tests up to tier7 to confirm no unwinding was happening. This helped filter out cases where `ResourceMark`s were still required for other reasons. > 3. Manually verified the remaining cases by tracing function calls to ensure the `ResourceMark`s were truly unnecessary. > 4. Removed the redundant `ResourceMark` instances. @caspernorrbin Your change (at version 736db7f3722d8cf0e08ee0988e08e2545d829899) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24162#issuecomment-2757985500 From dnsimon at openjdk.org Thu Mar 27 13:21:55 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 27 Mar 2025 13:21:55 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v4] In-Reply-To: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: > This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). > > By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. > > The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. > > I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. > > When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. 
For example: > > java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: > > java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci > > at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) > at java.base/java.lang.Thread.run(Thread.java:1447) > Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: > > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Optim... 
Doug Simon has updated the pull request incrementally with two additional commits since the last revision: - moved test/hotspot/jtreg/sources into tier1_common - remove trailing spaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24247/files - new: https://git.openjdk.org/jdk/pull/24247/files/cada0df4..93770e71 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24247&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24247&range=02-03 Stats: 7 lines in 2 files changed: 1 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24247/head:pull/24247 PR: https://git.openjdk.org/jdk/pull/24247 From ihse at openjdk.org Thu Mar 27 13:38:19 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 27 Mar 2025 13:38:19 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v3] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> <4eq6qUl0x1TJxdlM6oWmpAazGCmFsVbzSjY58KFosv0=.c2175ffa-cdc1-4754-a84b-30f0a389397c@github.com> Message-ID: <8Fj5Ui2g5uviJDr4x5rqLaoODjxjfxY_SWqbxXojSlI=.47630310-20d6-40e3-aab9-bce915bc04ad@github.com> On Thu, 27 Mar 2025 10:10:07 GMT, Stefan Karlsson wrote: > One question is where to put the tool? I don't think the test directory is the best place. Maybe somewhere in src/utils/. There is a tools dir here src/utils/src/build/tools/ but I don't know if it is appropriate to put it there. Maybe @magicus knows a good place for this? I would actually recommend just the `bin` directory. This is, after all, intended to be run as a simple script (remember, it was originally a Python script), in a similar vein to the already existing `blessed-modifier-order.sh` script.
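The sort order described in the PR text for 8352645 (lowercased comparison so that `_` sorts before letters, with duplicates dropped by a `SortedSet`) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the code of `SortIncludes.java`: the class name `IncludeSortSketch`, the method `sortIncludes`, and the tie-breaking on the exact string are hypothetical, and the three header names are used purely as sample input.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

public class IncludeSortSketch {
    // Assumption: order by the lowercased text first, then by the exact text,
    // so that only textually identical lines collapse into one set element.
    static final Comparator<String> INCLUDE_ORDER =
        Comparator.comparing((String s) -> s.toLowerCase(Locale.ROOT))
                  .thenComparing(Comparator.naturalOrder());

    // Returns the include lines sorted and de-duplicated.
    static List<String> sortIncludes(List<String> includes) {
        SortedSet<String> sorted = new TreeSet<>(INCLUDE_ORDER);
        sorted.addAll(includes);
        return List.copyOf(sorted);
    }

    public static void main(String[] args) {
        List<String> block = List.of(
            "#include \"runtime/osThread.hpp\"",
            "#include \"runtime/os_perf.hpp\"",
            "#include \"runtime/os.hpp\"",
            "#include \"runtime/os_perf.hpp\""); // duplicate on purpose
        // Prints os.hpp, then os_perf.hpp, then osThread.hpp.
        sortIncludes(block).forEach(System.out::println);
    }
}
```

With a plain case-sensitive comparison, `osThread.hpp` would come before `os_perf.hpp`, because `T` (0x54) precedes `_` (0x5F); after lowercasing, `_` precedes `t` (0x74), which matches the convention the PR text describes.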
------------- PR Comment: https://git.openjdk.org/jdk/pull/24247#issuecomment-2758060262 From ihse at openjdk.org Thu Mar 27 13:38:20 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 27 Mar 2025 13:38:20 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v3] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> <4eq6qUl0x1TJxdlM6oWmpAazGCmFsVbzSjY58KFosv0=.c2175ffa-cdc1-4754-a84b-30f0a389397c@github.com> Message-ID: On Thu, 27 Mar 2025 11:13:27 GMT, Stefan Karlsson wrote: > The /// style comments is a style I haven't encountered before. This is for the new markdown comments. Personally, I very much prefer them and have been looking forward to these for a long time. But I don't know if we have any policy for or against those in the JDK. Using them in a script like this seems fine to me, at any rate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24247#issuecomment-2758064273 From ihse at openjdk.org Thu Mar 27 13:43:20 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 27 Mar 2025 13:43:20 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 15:11:06 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to built this image e.g. >> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > address Windows issues My general opinion is that we should include as little information about stuff like version strings, git hashes, etc in any built binary. It just tends to trigger unnecessary rebuild chains in incremental builds, and make reproducibility harder. 
I'm not even sure what to do with this one. The file is too large to be passed in its entirety as a define. So then we'd have to add yet another gensrc rule for hotspot, to integrate the contents of this file into a generated .c file. And that would mean hotspot needs to be rebuilt every time the release file changes. TL;DR: I think that would be a bad idea. I suggested the solution currently in the patch, where the file is read at startup. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2758084416 From mbaesken at openjdk.org Thu Mar 27 13:45:27 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 27 Mar 2025 13:45:27 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v3] In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: On Fri, 14 Mar 2025 09:20:40 GMT, Thomas Stuefe wrote: >> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. >> >> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). >> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. 
So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). >> >> The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . >> >> The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. >> >> Tests: >> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths >> - SAP reports all tests green (they had reported errors with the previous version) >> - Oracle Tests ongoing >> - GHAs green > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use > - skip test if we have no COH archive > - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use > - aix fix > - test and aix exclusion > - Fix windows when ArchiveRelocationMode=0 or 2 > - original src/hotspot/share/cds/archiveBuilder.cpp line 329: > 327: if (CDSConfig::is_dumping_static_archive()) { > 328: _current_dump_region = &_pz_region; > 329: _current_dump_region->init(&_shared_rs, &_shared_vs); Second line in 'if' and 'else' seems to be identical ? 
src/hotspot/share/cds/archiveBuilder.cpp line 332: > 330: } else { > 331: _current_dump_region = &_rw_region; > 332: _current_dump_region->init(&_shared_rs, &_shared_vs); Second line in 'if' and 'else' seems to be identical ? src/hotspot/share/cds/archiveUtils.cpp line 85: > 83: > 84: // The number of bits used by the rw/ro ptrmaps. We might have lots of zero > 85: // bits at the bottom and top of rrw/ro ptrmaps, but these zeros will be What means rrw here ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23912#discussion_r2016656605 PR Review Comment: https://git.openjdk.org/jdk/pull/23912#discussion_r2016657281 PR Review Comment: https://git.openjdk.org/jdk/pull/23912#discussion_r2016650933 From mbaesken at openjdk.org Thu Mar 27 13:51:17 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 27 Mar 2025 13:51:17 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v3] In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: On Fri, 14 Mar 2025 09:20:40 GMT, Thomas Stuefe wrote: >> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. >> >> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). 
>> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). >> >> The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . >> >> The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. >> >> Tests: >> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths >> - SAP reports all tests green (they had reported errors with the previous version) >> - Oracle Tests ongoing >> - GHAs green > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use > - skip test if we have no COH archive > - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use > - aix fix > - test and aix exclusion > - Fix windows when ArchiveRelocationMode=0 or 2 > - original src/hotspot/share/cds/metaspaceShared.cpp line 1342: > 1340: archive_space_rs = {}; > 1341: // The protection zone is part of the archive: > 1342: // See comment above, the windows way of loading CDS is to mmap the individual If it is about MS Windows, better write 'Windows' not windows src/hotspot/share/cds/metaspaceShared.cpp line 1343: > 1341: // The protection zone is part of the archive: > 1342: // See comment above, the windows way of loading CDS is to mmap the individual > 1343: // parts of the archive into the adress region we just vacated. The protection Typo? adress -> address ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23912#discussion_r2016695632 PR Review Comment: https://git.openjdk.org/jdk/pull/23912#discussion_r2016682159 From duke at openjdk.org Thu Mar 27 13:54:34 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Thu, 27 Mar 2025 13:54:34 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: <5wBQqxybptneJjhR5usfrqg3PJ7G2PB_sDjUkb4BObM=.fe04a403-64ad-4dc5-b793-b48da01acfd4@github.com> References: <3zkHWLVEELkQkeSU9M0YAOpb3olMDNyU1HAdWUJEm68=.a2d2f9ea-c635-4379-95d7-00ff358eb15f@github.com> <5wBQqxybptneJjhR5usfrqg3PJ7G2PB_sDjUkb4BObM=.fe04a403-64ad-4dc5-b793-b48da01acfd4@github.com> Message-ID: On Wed, 26 Mar 2025 20:18:43 GMT, Stefan Karlsson wrote: > > > > > I don't understand why we don't treat that as a fatal error OR make sure that all call-sites handles that error, which they don't do today. 
> > > > I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions.
> > > I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error?
> > [`VirtualSpace::shrink_by`](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/memory/virtualspace.cpp#L373) allows uncommit to fail without crashing. I'm not certain of the intention behind that. But it seems like it's because shrinking is an optimization and not always critical that it be done immediately. [[1](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/gc/serial/tenuredGeneration.cpp#L258)]
> The above example shows code that assumes that it is OK to fail uncommitting and continue. I'm trying to figure out if that assumption is true. So, what I meant was that I was looking for a concrete example of a failure mode of uncommit that would be an acceptable (safe) failure to continue executing from. That is, a valid failure that doesn't mess up the memory in an unpredictable/unknowable way.

So release/uncommit (via mmap, munmap, VirtualFree) could fail due to:
- Bad arguments, or
- The OS encountered an issue out of control of the JVM.
- JVM bug. Reasonable to fatally fail here. Or the caller could be intentionally passing arguments that may or may not be valid. I don't think there is any code like that currently.
- The only errors that aren't due to bad arguments are ENOMEM and ones related to file descriptors (which are not applicable to uncommit). VirtualFree only fails due to bad arguments according to Windows docs.
So if there's consensus that ENOMEM is not recoverable (or rare enough to not worry about), then it seems like its OK to fatally fail in all scenarios. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2758139261 From mbaesken at openjdk.org Thu Mar 27 13:57:19 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 27 Mar 2025 13:57:19 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v3] In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: <7tjtKJt2pPyHmkj87trDR_1c2sYaYrp-hyA6eqoHSP0=.33e1ac07-c4d3-4fbf-b264-41bb9afd8a51@github.com> On Fri, 14 Mar 2025 09:20:40 GMT, Thomas Stuefe wrote: >> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. >> >> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). >> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). >> >> The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . 
>> >> The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. >> >> Tests: >> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths >> - SAP reports all tests green (they had reported errors with the previous version) >> - Oracle Tests ongoing >> - GHAs green > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use > - skip test if we have no COH archive > - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use > - aix fix > - test and aix exclusion > - Fix windows when ArchiveRelocationMode=0 or 2 > - original src/hotspot/share/cds/metaspaceShared.cpp line 1438: > 1436: ); > 1437: } else { > 1438: // Let JVM freely chose encoding base and shift Do you mean choose ? 
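The AIX fallback in the quoted PR description (filling the page with the marker value 'P', 0x50) explains the `0x5050505050505050` hint: any 64-bit word loaded from such a page consists of eight 0x50 bytes. A small Java model of that arithmetic follows. It is only a sketch; the byte array stands in for the page, and the names `MarkerPageSketch` and `readWordFromMarkedPage` are hypothetical and do not appear in HotSpot.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class MarkerPageSketch {
    // 'P' == 0x50, the marker value named in the PR description.
    static final byte MARKER = (byte) 'P';

    // Fills a buffer that stands in for the page and reads one 64-bit word
    // back, roughly what a stray load from the marked page would observe.
    static long readWordFromMarkedPage(int pageSize) {
        byte[] page = new byte[pageSize];
        Arrays.fill(page, MARKER);
        return ByteBuffer.wrap(page).getLong(0);
    }

    public static void main(String[] args) {
        // Prints 0x5050505050505050
        System.out.printf("0x%016x%n", readWordFromMarkedPage(4096));
    }
}
```

Because every byte is identical, the result does not depend on byte order, which is part of what makes a repeated single-byte marker easy to recognize in register dumps.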
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23912#discussion_r2016723315 From epeter at openjdk.org Thu Mar 27 14:07:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Mar 2025 14:07:20 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v5] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 09:24:00 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8347449-loop-predicate > - Improve help text for UseProfiledLoopPredicate argument > - loopnode: cleaner control flow > - Clean up IR test > - Apply suggestions from @chhagedorn > > Co-authored-by: Christian Hagedorn > - ir-framework: rename new nodes to convention > - ir-framework: fix phase for parse predicate nodes > - Make conditions on UseProfiledLoopPredicate first test UseLoopPredicate > - Turn off UseProfiledLoopPredicate when UseLoopPredicate is turned off > - Add regression IR test > - ... and 1 more: https://git.openjdk.org/jdk/compare/412d134a...72ebfc8e Thanks for working on this! You could also clean up the `IdealKit::loop`, which checks `UseLoopPredicate` only to call `add_parse_predicates`, which adds all predicates... and so it constrains too many things now. src/hotspot/share/opto/c2_globals.hpp line 790: > 788: "Move checks with an uncommon trap out of loops based on " \ > 789: "profiling data. " \ > 790: "Requires UseLoopPredicate to be turned on (default).") \ Can you also update the comment for `UseLoopPredicate`? It seems outdated / wrong. Now is: `Generate a predicate to select fast/slow loop versions` @chhagedorn do you have a good suggestion for what to put now? ------------- PR Review: https://git.openjdk.org/jdk/pull/24248#pullrequestreview-2721665164 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016743798 From jwaters at openjdk.org Thu Mar 27 14:07:25 2025 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 27 Mar 2025 14:07:25 GMT Subject: RFR: 8345265: Minor improvements for LTO across all compilers [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 12:49:10 GMT, Matthias Baesken wrote: > I did some builds today with gcc14.2.0 on Linux x86_64 . They looked good with and without LTO . 
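The two rules in the quoted change summary (switch `UseProfiledLoopPredicate` off at the argument level whenever `UseLoopPredicate` is off, and test `UseLoopPredicate` first at every check site) can be modeled in a few lines. This is a hedged Java model of the flag logic only; the actual change is in HotSpot's C++ sources, and the class and method names below are hypothetical.

```java
public class LoopPredicateFlagsSketch {
    final boolean useLoopPredicate;
    final boolean useProfiledLoopPredicate;

    LoopPredicateFlagsSketch(boolean useLoopPredicate, boolean useProfiledLoopPredicate) {
        this.useLoopPredicate = useLoopPredicate;
        // Argument-level rule: the profiled flag cannot stay on
        // when the general loop predicate flag is off.
        this.useProfiledLoopPredicate = useLoopPredicate && useProfiledLoopPredicate;
    }

    // Check-site rule: test the general flag first, then the profiled one.
    boolean shouldAddProfiledPredicates() {
        return useLoopPredicate && useProfiledLoopPredicate;
    }

    public static void main(String[] args) {
        // Models -XX:-UseLoopPredicate -XX:+UseProfiledLoopPredicate
        LoopPredicateFlagsSketch flags = new LoopPredicateFlagsSketch(false, true);
        System.out.println(flags.useProfiledLoopPredicate);      // false
        System.out.println(flags.shouldAddProfiledPredicates()); // false
    }
}
```

Applying both rules is redundant by design: the argument-level rule fixes the observable flag value, while the check-site rule keeps each use self-explanatory, which is the consistency the summary refers to.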
Without lto / normal opt build du -sh images/jdk/lib/server/libjvm.so 27M images/jdk/lib/server/libjvm.so > > WITH lto du -sh images/jdk/lib/server/libjvm.so 24M images/jdk/lib/server/libjvm.so > > So even the code size reduction is visible. So gcc14 seems to work for me (but so far I only used it on Linux x86_64, can't tell about aarch64/ppc64le). Great, so it's yet another case of "Compiler works on one platform and royally ***** up on another" again. Sigh. I just love it when that happens. Thanks for the report on the gcc 14 Linux JVM sizes though, at least I can narrow it down with your help. I'll have to think of a way to fix this going forward ------------- PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2758178119 From dnsimon at openjdk.org Thu Mar 27 14:11:07 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 27 Mar 2025 14:11:07 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v3] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> <4eq6qUl0x1TJxdlM6oWmpAazGCmFsVbzSjY58KFosv0=.c2175ffa-cdc1-4754-a84b-30f0a389397c@github.com> Message-ID: On Thu, 27 Mar 2025 10:44:48 GMT, Doug Simon wrote: > I just noticed that TestIncludesAreSorted is not run by GHA. How about we move `test/hotspot/jtreg/sources` into `tier1_common`: I went ahead and pushed this change. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24247#issuecomment-2758187724 From dnsimon at openjdk.org Thu Mar 27 14:11:07 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 27 Mar 2025 14:11:07 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v5] In-Reply-To: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: > This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). > > By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. > > The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. > > I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. > > When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. 
For example: > > java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: > > java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci > > at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) > at java.base/java.lang.Thread.run(Thread.java:1447) > Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: > > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Optim... 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: moved error message into UnsortedIncludesException ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24247/files - new: https://git.openjdk.org/jdk/pull/24247/files/93770e71..c93e6646 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24247&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24247&range=03-04 Stats: 13 lines in 2 files changed: 8 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24247/head:pull/24247 PR: https://git.openjdk.org/jdk/pull/24247 From dnsimon at openjdk.org Thu Mar 27 14:17:34 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 27 Mar 2025 14:17:34 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v3] In-Reply-To: <8Fj5Ui2g5uviJDr4x5rqLaoODjxjfxY_SWqbxXojSlI=.47630310-20d6-40e3-aab9-bce915bc04ad@github.com> References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> <4eq6qUl0x1TJxdlM6oWmpAazGCmFsVbzSjY58KFosv0=.c2175ffa-cdc1-4754-a84b-30f0a389397c@github.com> <8Fj5Ui2g5uviJDr4x5rqLaoODjxjfxY_SWqbxXojSlI=.47630310-20d6-40e3-aab9-bce915bc04ad@github.com> Message-ID: On Thu, 27 Mar 2025 13:34:02 GMT, Magnus Ihse Bursie wrote: > I would actually recommend just the bin directory. Fine by me but I'm not sure how to then use `bin/SortIncludes.java` in `test/hotspot/jtreg/sources/TestIncludesAreSorted.java`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24247#issuecomment-2758217497 From jwaters at openjdk.org Thu Mar 27 14:20:28 2025 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 27 Mar 2025 14:20:28 GMT Subject: RFR: 8345265: Minor improvements for LTO across all compilers [v2] In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 14:54:03 GMT, Julian Waters wrote: >> This is a general cleanup and improvement of LTO, as well as a quick fix to remove a workaround in the Makefiles that disabled LTO for g1ParScanThreadState.cpp due to the old poisoning mechanism causing trouble. The -Wno-attribute-warning change here can be removed once Kim's new poisoning solution is integrated. >> >> - -fno-omit-frame-pointer is added to gcc to stop the linker from emitting code without the frame pointer >> - -flto is set to $(JOBS) instead of auto to better match what the user requested >> - -Gy is passed to the Microsoft compiler. This does not fully fix LTO under Microsoft, but prevents warnings about -LTCG:INCREMENTAL at least > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge branch 'openjdk:master' into patch-16 > - -fno-omit-frame-pointer in JvmFeatures.gmk > - Revert compilerWarnings_gcc.hpp > - General LTO fixes JvmFeatures.gmk > - Revert DISABLE_POISONING_STOPGAP compilerWarnings_gcc.hpp > - Merge branch 'openjdk:master' into patch-16 > - Revert os.cpp > - Fix memory leak in jvmciEnv.cpp > - Stopgap fix in os.cpp > - Declaration fix in compilerWarnings_gcc.hpp > - ... and 2 more: https://git.openjdk.org/jdk/compare/62288a76...9d05cb8e Wait, sorry to trouble you further, but what does nm --demangle --reverse-sort --print-size --size-sort libjvm.so on HotSpot compiled by gcc 14 with LTO active yield as the largest symbol in the binary? 
(It should be the symbol listed at the very top) For me, I get the following as the largest symbols, because G1ParScanThreadState contains methods that are flattened to hell with LTO active:
0000000296f9b0c0 0000000000642d40 T G1ParScanThreadState::trim_queue_to_threshold(unsigned int) [clone .constprop.0]
0000000295125480 0000000000630080 T G1ParScanThreadState::trim_queue_to_threshold(unsigned int)
0000000296c68440 0000000000331c80 T G1ParScanThreadState::steal_and_trim_queue(GenericTaskQueueSet, (MemTag)5>*) [clone .constprop.0]
0000000295755500 0000000000331bc0 T G1ParScanThreadState::steal_and_trim_queue(GenericTaskQueueSet, (MemTag)5>*)
0000000295a87a80 000000000017cd00 T G1ParScanThreadState::copy_to_survivor_space(G1HeapRegionAttr, oopDesc*, markWord)
------------- PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2758224080 PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2758228826 From chagedorn at openjdk.org Thu Mar 27 14:30:17 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Mar 2025 14:30:17 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v5] In-Reply-To: References: Message-ID: <01JvdutO8geXQM1nMA6lw-SeC-bNIiApSPykfxnDZls=.3973b39b-7a36-44fa-8c13-91c02268c986@github.com> On Thu, 27 Mar 2025 14:01:47 GMT, Emanuel Peter wrote: >> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 11 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8347449-loop-predicate >> - Improve help text for UseProfiledLoopPredicate argument >> - loopnode: cleaner control flow >> - Clean up IR test >> - Apply suggestions from @chhagedorn >> >> Co-authored-by: Christian Hagedorn >> - ir-framework: rename new nodes to convention >> - ir-framework: fix phase for parse predicate nodes >> - Make conditions on UseProfiledLoopPredicate first test UseLoopPredicate >> - Turn off UseProfiledLoopPredicate when UseLoopPredicate is turned off >> - Add regression IR test >> - ... and 1 more: https://git.openjdk.org/jdk/compare/44c209f7...72ebfc8e > > src/hotspot/share/opto/c2_globals.hpp line 790: > >> 788: "Move checks with an uncommon trap out of loops based on " \ >> 789: "profiling data. " \ >> 790: "Requires UseLoopPredicate to be turned on (default).") \ > > Can you also update the comment for `UseLoopPredicate`? It seems outdated / wrong. > > Now is: > `Generate a predicate to select fast/slow loop versions` > > @chhagedorn do you have a good suggestion for what to put now? Good catch! It could be similar but without mentioning profiling data: Move checks with an uncommon trap out of loops. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016804557 From cnorrbin at openjdk.org Thu Mar 27 14:31:35 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Thu, 27 Mar 2025 14:31:35 GMT Subject: Integrated: 8294954: Remove superfluous ResourceMarks when using LogStream In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 15:38:21 GMT, Casper Norrbin wrote: > Hi everyone, > > This PR removes redundant `ResourceMark` instances where `LogStream` is used. Previously, `LogStream` inherited from `ResourceObj`, which required a `ResourceMark`, but this is no longer the case, making these instances unnecessary. > > Process: > 1. 
I added assertions to check for resource unwinding in places where `ResourceMark`s were used with `LogStream`s.
> 2. Ran tests up to tier7 to confirm no unwinding was happening. This helped filter out cases where `ResourceMark`s were still required for other reasons.
> 3. Manually verified the remaining cases by tracing function calls to ensure the `ResourceMark`s were truly unnecessary.
> 4. Removed the redundant `ResourceMark` instances.

This pull request has now been integrated.

Changeset: 89e5c42d
Author: Casper Norrbin
Committer: Johan Sjölen
URL: https://git.openjdk.org/jdk/commit/89e5c42d909344d75266a203d7e6b6bb1ad4aea6
Stats: 40 lines in 26 files changed: 1 ins; 39 del; 0 mod

8294954: Remove superfluous ResourceMarks when using LogStream

Reviewed-by: dholmes, jsjolen

-------------

PR: https://git.openjdk.org/jdk/pull/24162

From rehn at openjdk.org Thu Mar 27 14:32:18 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 27 Mar 2025 14:32:18 GMT
Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user [v2]
In-Reply-To:
References:
Message-ID:

> Hi, for you to consider.
>
> These tests constantly fails in qemu-user.
> Either the require host to be same arch or they are very very slow in emulation.
> E.g. "ptrace(PTRACE_ATTACH, ..) failed for 405157: Function not implemented'" for SA tests.
> This is the initial set of tests, there are many more, but I need to do some more verification for those.
>
> From bug:
>> qemu-user/rv64 sets uarch to "qemu" in /proc/cpuinfo (qemu-system do not do that).
>> We add this uarch to CPU feature string.
>> This means we can use jtreg 'require' with cpu string to filter out tests in qemu-user.
> > Relevant qemu code: > https://github.com/qemu/qemu/blob/170825d14d88a1ce7fae98d5a928480f2f329b22/linux-user/riscv/target_proc.h#L29 > > Relevant hotspot code: > https://github.com/openjdk/jdk/blob/fa0b18bfde38ee2ffbab33a9eaac547fe8aa3c7c/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp#L250 > > Tested that the require only filters out tests in qemu+riscv64. > > Thanks! > > /Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into qemu-user-issues - more - more - native or very long ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24229/files - new: https://git.openjdk.org/jdk/pull/24229/files/74a74f4c..965424ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24229&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24229&range=00-01 Stats: 48681 lines in 1482 files changed: 10853 ins; 33181 del; 4647 mod Patch: https://git.openjdk.org/jdk/pull/24229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24229/head:pull/24229 PR: https://git.openjdk.org/jdk/pull/24229 From mbaesken at openjdk.org Thu Mar 27 14:35:14 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 27 Mar 2025 14:35:14 GMT Subject: RFR: 8345265: Minor improvements for LTO across all compilers [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 14:16:38 GMT, Julian Waters wrote: > Wait, sorry to trouble you further, but what does nm --demangle --reverse-sort --print-size --size-sort libjvm.so on HotSpot compiled by gcc 14 with LTO active yield as the largest symbol in the binary? (It should be the symbol listed at the very top) This is my output; maybe I have to add I used the 'normal' jdk head without patches, is that what I should do for a gcc14 build test? 
nm --demangle --reverse-sort --print-size --size-sort images/jdk/lib/server/libjvm.so | more 0000000000453ee0 000000000002320e t State::MachNodeGenerator(int) 0000000000970f70 0000000000018eb9 t CompilerToVM::initialize_intrinsics(JVMCIEnv*) 000000000140caa0 000000000000f018 b Matcher::mreg2regmask 0000000000993c80 000000000000a40d t JNIJVMCI::initialize_ids(JNIEnv_*) 0000000000ac16b0 0000000000009d16 t Matcher::Fixup_Save_On_Entry() 000000000143db00 0000000000008000 b _ZL9_elements.lto_priv.0 0000000001446d20 0000000000008000 b _free_list 000000000141e900 0000000000007d00 b DFSClosure::_reference_stack 00000000013d6d40 0000000000007668 d _ZL9flagTable.lto_priv.0 00000000013edf60 0000000000006c30 d VMStructs::localHotSpotVMStructs 00000000010463f0 0000000000006a06 t readConfiguration0(JNIEnv_*, JVMCIEnv*) [clone .isra.0] 0000000000d51dc0 00000000000067a2 t StubGenerator::generate_libmPow() 00000000010b12d0 0000000000006289 t G1ParScanThreadState::trim_queue_to_threshold(unsigned int) 0000000000e24550 00000000000061e8 t ClassVerifier::verify_method(methodHandle const&, JavaThread*) 0000000001076dd0 000000000000548d t State::DFA(int, Node const*) [clone .isra.0] 0000000000e1ba00 000000000000519d t VMError::report(outputStream*, bool) 00000000014521e0 0000000000005000 b TemplateInterpreter::_safept_table 000000000142cb60 0000000000005000 b TemplateInterpreter::_normal_table 0000000001431b60 0000000000005000 b TemplateInterpreter::_active_table 0000000000653790 0000000000004e12 t CompileBroker::print_heapinfo(outputStream*, char const*, unsigned long) 000000000075be40 0000000000004e0b t G1CollectedHeap::do_collection_pause_at_safepoint_helper() 0000000000b90260 0000000000004a41 t Parse::do_one_bytecode() [clone .part.0] 00000000010c1f20 00000000000049e6 t d_print_comp_inner 0000000000c1e180 0000000000004594 t ServiceThread::service_thread_entry(JavaThread*, JavaThread*) 00000000005ad7c0 0000000000004424 t C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, 
DirectiveSet*) 0000000000d260b0 000000000000440a t PhaseStringOpts::replace_string_concat(StringConcat*) 0000000000e48920 00000000000042e5 t VM_Version::initialize() 0000000000afbad0 0000000000004240 t Method::init_intrinsic_id(vmSymbolID) 00000000010200d0 00000000000040ab t PSParallelCompact::invoke_no_policy(bool) [clone .isra.0] 000000000051f230 0000000000004054 t Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) 000000000065c060 0000000000004015 t Compile::Code_Gen() 0000000000663060 0000000000003fec t CompileBroker::compiler_thread_loop() 000000000070d9b0 0000000000003fd7 t ConnectionGraph::do_analysis(Compile*, PhaseIterGVN*) 0000000000be8890 0000000000003f8e t PhaseChaitin::Split(unsigned int, ResourceArea*) 0000000000818730 0000000000003f82 t PhaseChaitin::build_ifg_physical(ResourceArea*) 0000000000fb1920 0000000000003f44 t SharedRuntime::generate_native_wrapper(MacroAssembler*, methodHandle const&, int, BasicType*, VMRegPair*, BasicType) [clone .constprop.0] 00000000007e2ab0 0000000000003f41 t PhaseCFG::global_code_motion() 00000000013e8580 0000000000003e10 d JVMCIVMStructs::localHotSpotVMStructs 0000000000db3b00 0000000000003dff t TemplateInterpreterGenerator::generate_all() 0000000000f436d0 0000000000003d92 t initialize_stubs(StubGenBlobId, int, int, char const*, char const*, char const*) [clone .constprop.0] 0000000000d6b680 0000000000003d42 t StubGenerator::generate_libmTan() 00000000004ff560 0000000000003d28 t BCEscapeAnalyzer::iterate_blocks(Arena*) 0000000000a2bfb0 0000000000003cef t VM_RedefineClasses::load_new_class_versions() [clone .part.0] 000000000073fde0 0000000000003ca2 t G1CollectedHeap::do_full_collection(bool, bool) 0000000000e8e170 0000000000003bea t ZDriverMajor::run_thread() 000000000103ec00 0000000000003b81 t JvmtiEnv::RetransformClasses(int, _jclass* const*) [clone .isra.0] 0000000000d18c50 0000000000003b4d t StubGenerator::generate_md5_implCompress(StubGenStubId) 0000000000a97160 0000000000003b20 t 
PhaseIdealLoop::auto_vectorize(IdealLoopTree*, VSharedData&) [clone .part.0] 00000000013df000 0000000000003aa8 d ruleName 00000000005c7e90 0000000000003a9f t PhiNode::Ideal(PhaseGVN*, bool) 00000000006a1ef0 0000000000003a9f t State::_sub_Op_AddP(Node const*) 0000000000e86880 0000000000003a74 t ZGeneration::select_relocation_set(ZGenerationId, bool) --More-- ------------- PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2758282021 From mbaesken at openjdk.org Thu Mar 27 15:31:16 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 27 Mar 2025 15:31:16 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: <6zNYNSD4GAfIqmDRBRj1a4_Q73C4EeJYv_tn2k0k2Fw=.71f884d2-b82e-4a9c-be4d-bacd49802cbf@github.com> References: <6zNYNSD4GAfIqmDRBRj1a4_Q73C4EeJYv_tn2k0k2Fw=.71f884d2-b82e-4a9c-be4d-bacd49802cbf@github.com> Message-ID: <9h-FX9Uhvn1fGWkhd0rfdoYbCW_Vl8k7c3Psm4fb1jI=.ce809be2-bef6-4cbb-8031-7845c268c2f2@github.com> On Thu, 27 Mar 2025 04:29:55 GMT, David Holmes wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> address Windows issues > > src/hotspot/share/runtime/arguments.cpp line 3665: > >> 3663: >> 3664: // cache the release file of the JDK image >> 3665: os::read_image_release_file(); > > What is the impact on startup? 
Loading takes about this long:

Elapsed time loading release file: 0.000127 seconds

(measured on Linux x86_64; the JDK image with the release file is on a slow filer, so it is probably even faster in a good environment)

This is measured with clock():

   // cache the release file of the JDK image
+  clock_t start, end;
+  double elapsed_time;
+
+  start = clock();
+  os::read_image_release_file();
+  end = clock();
+  elapsed_time = ((double)(end - start)) / CLOCKS_PER_SEC;
+
+  if (log_is_enabled(Info, arguments)) {
+    printf("Elapsed time loading release file: %f seconds\n", elapsed_time);
+  }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2016940068

From jsikstro at openjdk.org Thu Mar 27 15:52:19 2025
From: jsikstro at openjdk.org (Joel Sikström)
Date: Thu, 27 Mar 2025 15:52:19 GMT
Subject: RFR: 8352762: Use EXACTFMT instead of expanded version where applicable
In-Reply-To:
References:
Message-ID:

On Thu, 27 Mar 2025 04:52:17 GMT, David Holmes wrote:

>> [JDK-8310233](https://bugs.openjdk.org/browse/JDK-8310233) introduced the EXACTFMT macro, which is a shorthand for printing exact values using methods defined in globalDefinitions.hpp. There are currently 20 places in HotSpot that use the expanded version of the macro, along with the "trace_page_size_params" macro that is defined and used in os.cpp.
>>
>> I have replaced places that use the expanded macro(s) with EXACTFMT + EXACTFMTARGS, and also removed trace_page_size_params from os.cpp, which was essentially a redefinition of EXACTFMTARGS.
>>
>> Testing: GHA, tiers 1-4
>
> Paging @tstuefe ! Thomas added EXACTFMT in [JDK-8310233](https://github.com/openjdk/jdk/pull/14739/files#top) and did not use it for some of the places where you are now using it. Despite being a reviewer of Thomas's change, I'm not at all sure when EXACTFMT should be used. But this looks good.

Thank you for the reviews!
@dholmes-ora @tstuefe

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24228#issuecomment-2758536909

From jsikstro at openjdk.org Thu Mar 27 15:52:19 2025
From: jsikstro at openjdk.org (Joel Sikström)
Date: Thu, 27 Mar 2025 15:52:19 GMT
Subject: Integrated: 8352762: Use EXACTFMT instead of expanded version where applicable
In-Reply-To:
References:
Message-ID:

On Tue, 25 Mar 2025 13:59:14 GMT, Joel Sikström wrote:

> [JDK-8310233](https://bugs.openjdk.org/browse/JDK-8310233) introduced the EXACTFMT macro, which is a shorthand for printing exact values using methods defined in globalDefinitions.hpp. There are currently 20 places in HotSpot that use the expanded version of the macro, along with the "trace_page_size_params" macro that is defined and used in os.cpp.
>
> I have replaced places that use the expanded macro(s) with EXACTFMT + EXACTFMTARGS, and also removed trace_page_size_params from os.cpp, which was essentially a redefinition of EXACTFMTARGS.
>
> Testing: GHA, tiers 1-4

This pull request has now been integrated.

Changeset: dc5c4148
Author: Joel Sikström
URL: https://git.openjdk.org/jdk/commit/dc5c4148c70ca43d0a69c326e14898adca2f0bae
Stats: 70 lines in 8 files changed: 0 ins; 20 del; 50 mod

8352762: Use EXACTFMT instead of expanded version where applicable

Reviewed-by: dholmes, stuefe

-------------

PR: https://git.openjdk.org/jdk/pull/24228

From shade at openjdk.org Thu Mar 27 17:08:44 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 27 Mar 2025 17:08:44 GMT
Subject: RFR: 8351155: C1/C2: Remove 32-bit x86 specific FP rounding support [v2]
In-Reply-To:
References: <8zUrV-sMSOwRSQk_jERtFqjrzOFUP7rlUwTTN7cPP_8=.b1d30fb5-d9c0-4417-bacd-bf09f2af433b@github.com>
Message-ID: <02tkGNzza6MfOkCxeymt8tcXm3bSCPiv6GBCkwjcLs4=.4d351dd7-82b9-46c2-ada6-facf807f70a2@github.com>

On Thu, 27 Mar 2025 08:45:49 GMT, Aleksey Shipilev wrote:

>> C1 and C2 have support for rounding double/floats, to support awkward rounding modes of x87 FPU.
With 32-bit x86 port removed, we can remove those parts. This basically deletes all the code that uses `strict_fp_requires_explicit_rounding`, which is now universally `false` for all supported platforms. >> >> For C1, we remove `RoundFP` op, its associated `lir_roundfp` and related utility methods that insert these nodes in the graph. >> >> For C2, we remove `RoundDouble` and `RoundFloat` nodes (note there is a confusingly named `RoundDoubleMode` nodes that are not related to this), associated utility methods, AD match rules that reference these nodes (as nops!), and some `Ideal`-s that are no longer needed. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Minor leftover Need a quick re-review after a minor leftover removal. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24250#issuecomment-2758809458 From vlivanov at openjdk.org Thu Mar 27 17:59:25 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 27 Mar 2025 17:59:25 GMT Subject: RFR: 8351155: C1/C2: Remove 32-bit x86 specific FP rounding support [v2] In-Reply-To: References: <8zUrV-sMSOwRSQk_jERtFqjrzOFUP7rlUwTTN7cPP_8=.b1d30fb5-d9c0-4417-bacd-bf09f2af433b@github.com> Message-ID: On Thu, 27 Mar 2025 08:45:49 GMT, Aleksey Shipilev wrote: >> C1 and C2 have support for rounding double/floats, to support awkward rounding modes of x87 FPU. With 32-bit x86 port removed, we can remove those parts. This basically deletes all the code that uses `strict_fp_requires_explicit_rounding`, which is now universally `false` for all supported platforms. >> >> For C1, we remove `RoundFP` op, its associated `lir_roundfp` and related utility methods that insert these nodes in the graph. 
>> >> For C2, we remove `RoundDouble` and `RoundFloat` nodes (note there is a confusingly named `RoundDoubleMode` nodes that are not related to this), associated utility methods, AD match rules that reference these nodes (as nops!), and some `Ideal`-s that are no longer needed. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Minor leftover Marked as reviewed by vlivanov (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24250#pullrequestreview-2722925670 From mli at openjdk.org Thu Mar 27 18:01:38 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 27 Mar 2025 18:01:38 GMT Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 14:32:18 GMT, Robbin Ehn wrote: >> Hi, for you to consider. >> >> These tests constantly fails in qemu-user. >> Either the require host to be same arch or they are very very slow in emulation. >> E.g. "ptrace(PTRACE_ATTACH, ..) failed for 405157: Function not implemented'" for SA tests. >> This is the initial set of tests, there are many more, but I need to do some more verification for those. >> >> From bug: >>> qemu-user/rv64 sets uarch to "qemu" in /proc/cpuinfo (qemu-system do not do that). >>> We add this uarch to CPU feature string. >>> This means we can use jtreg 'require' with cpu string to filter out tests in qemu-user. >> >> Relevant qemu code: >> https://github.com/qemu/qemu/blob/170825d14d88a1ce7fae98d5a928480f2f329b22/linux-user/riscv/target_proc.h#L29 >> >> Relevant hotspot code: >> https://github.com/openjdk/jdk/blob/fa0b18bfde38ee2ffbab33a9eaac547fe8aa3c7c/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp#L250 >> >> Tested that the require only filters out tests in qemu+riscv64. >> >> Thanks! 
>>
>> /Robbin
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
>  - Merge branch 'master' into qemu-user-issues
>  - more
>  - more
>  - native or very long

I also find it annoying to see some tests fail intermittently.

I am not sure I understand the goal of this PR; it seems it might not be the best solution to simply disable these tests when running with qemu. My concern is that qemu is still one of the main methods to quickly verify functionality changes, but if we just disable the failing tests, and maybe disable more and more tests in the future, then qemu will no longer be able to play the role it was supposed to play.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24229#issuecomment-2758972731

From ascarpino at openjdk.org Thu Mar 27 18:11:27 2025
From: ascarpino at openjdk.org (Anthony Scarpino)
Date: Thu, 27 Mar 2025 18:11:27 GMT
Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v7]
In-Reply-To:
References:
Message-ID:

On Mon, 24 Mar 2025 17:23:51 GMT, Volodymyr Paprotski wrote:

>> Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain)
>>
>> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain.
>>
>> Before (no AVX512)
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ± 17.879 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ± 15.807 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ± 4.190 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ±
2.484 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ± 2.285 ops/s
>>
>> After (with AVX2)
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ± 39.923 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ± 34.838 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ± 12.179 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ± 8.992 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ± 6.238 ops/s
>>
>>
>> Before (with AVX512):
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ± 27.260 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ± 26.707 o...
>
> Volodymyr Paprotski has updated the pull request incrementally with two additional commits since the last revision:
>
>  - whitespace
>  - prettify test

Java code changes look good. This has passed tier1-3 testing.

-------------

Marked as reviewed by ascarpino (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23719#pullrequestreview-2722978322

From shade at openjdk.org Thu Mar 27 18:14:34 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 27 Mar 2025 18:14:34 GMT
Subject: RFR: 8351155: C1/C2: Remove 32-bit x86 specific FP rounding support [v2]
In-Reply-To:
References: <8zUrV-sMSOwRSQk_jERtFqjrzOFUP7rlUwTTN7cPP_8=.b1d30fb5-d9c0-4417-bacd-bf09f2af433b@github.com>
Message-ID:

On Thu, 27 Mar 2025 08:45:49 GMT, Aleksey Shipilev wrote:

>> C1 and C2 have support for rounding double/floats, to support awkward rounding modes of x87 FPU.
With 32-bit x86 port removed, we can remove those parts. This basically deletes all the code that uses `strict_fp_requires_explicit_rounding`, which is now universally `false` for all supported platforms. >> >> For C1, we remove `RoundFP` op, its associated `lir_roundfp` and related utility methods that insert these nodes in the graph. >> >> For C2, we remove `RoundDouble` and `RoundFloat` nodes (note there is a confusingly named `RoundDoubleMode` nodes that are not related to this), associated utility methods, AD match rules that reference these nodes (as nops!), and some `Ideal`-s that are no longer needed. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Minor leftover Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24250#issuecomment-2759003292 From shade at openjdk.org Thu Mar 27 18:14:34 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Mar 2025 18:14:34 GMT Subject: Integrated: 8351155: C1/C2: Remove 32-bit x86 specific FP rounding support In-Reply-To: <8zUrV-sMSOwRSQk_jERtFqjrzOFUP7rlUwTTN7cPP_8=.b1d30fb5-d9c0-4417-bacd-bf09f2af433b@github.com> References: <8zUrV-sMSOwRSQk_jERtFqjrzOFUP7rlUwTTN7cPP_8=.b1d30fb5-d9c0-4417-bacd-bf09f2af433b@github.com> Message-ID: On Wed, 26 Mar 2025 10:11:25 GMT, Aleksey Shipilev wrote: > C1 and C2 have support for rounding double/floats, to support awkward rounding modes of x87 FPU. With 32-bit x86 port removed, we can remove those parts. This basically deletes all the code that uses `strict_fp_requires_explicit_rounding`, which is now universally `false` for all supported platforms. > > For C1, we remove `RoundFP` op, its associated `lir_roundfp` and related utility methods that insert these nodes in the graph. 
>
> For C2, we remove `RoundDouble` and `RoundFloat` nodes (note there is a confusingly named `RoundDoubleMode` nodes that are not related to this), associated utility methods, AD match rules that reference these nodes (as nops!), and some `Ideal`-s that are no longer needed.
>
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `tier1`
>  - [x] Linux x86_64 server fastdebug, `all`

This pull request has now been integrated.

Changeset: b73663a2
Author: Aleksey Shipilev
URL: https://git.openjdk.org/jdk/commit/b73663a2b4fe7049fc0990c1a1e51221640b4e29
Stats: 547 lines in 48 files changed: 0 ins; 513 del; 34 mod

8351155: C1/C2: Remove 32-bit x86 specific FP rounding support

Reviewed-by: vlivanov, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/24250

From ascarpino at openjdk.org Thu Mar 27 18:14:24 2025
From: ascarpino at openjdk.org (Anthony Scarpino)
Date: Thu, 27 Mar 2025 18:14:24 GMT
Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v7]
In-Reply-To:
References:
Message-ID:

On Mon, 24 Mar 2025 17:23:51 GMT, Volodymyr Paprotski wrote:

>> Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain)
>>
>> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain.
>>
>> Before (no AVX512)
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ± 17.879 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ± 15.807 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ± 4.190 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ±
2.484 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ± 2.285 ops/s
>>
>> After (with AVX2)
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ± 39.923 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ± 34.838 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ± 12.179 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ± 8.992 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ± 6.238 ops/s
>>
>>
>> Before (with AVX512):
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ± 27.260 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ± 26.707 o...
>
> Volodymyr Paprotski has updated the pull request incrementally with two additional commits since the last revision:
>
>  - whitespace
>  - prettify test

Wait on integration. I need to check something

-------------

Changes requested by ascarpino (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23719#pullrequestreview-2722990209

From ascarpino at openjdk.org Thu Mar 27 18:55:43 2025
From: ascarpino at openjdk.org (Anthony Scarpino)
Date: Thu, 27 Mar 2025 18:55:43 GMT
Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v7]
In-Reply-To:
References:
Message-ID:

On Mon, 24 Mar 2025 17:23:51 GMT, Volodymyr Paprotski wrote:

>> Add AVX2 montgomery multiplication intrinsic.
(About 60-80% gain)
>>
>> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain.
>>
>> Before (no AVX512)
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ± 17.879 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ± 15.807 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ± 4.190 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ± 2.484 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ± 2.285 ops/s
>>
>> After (with AVX2)
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ± 39.923 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ± 34.838 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ± 12.179 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ± 8.992 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ± 6.238 ops/s
>>
>>
>> Before (with AVX512):
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ± 27.260 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ± 26.707 o...
>
> Volodymyr Paprotski has updated the pull request incrementally with two additional commits since the last revision:
>
>  - whitespace
>  - prettify test

src/java.base/share/classes/sun/security/util/math/intpoly/MontgomeryIntegerPolynomialP256.java line 2:

> 1: /*
> 2:  * Copyright (c) 2024, 2025 Oracle and/or its affiliates. All rights reserved.

You are missing a trailing comma. I didn't pick this up originally because a test I didn't expect to see failed on this.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r2017443311

From vpaprotski at openjdk.org Thu Mar 27 19:13:59 2025
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Thu, 27 Mar 2025 19:13:59 GMT
Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8]
In-Reply-To:
References:
Message-ID:

> Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain)
>
> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain.
>
> Before (no AVX512)
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ± 17.879 ops/s
> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ± 15.807 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ± 4.190 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ± 2.484 ops/s
> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ± 2.285 ops/s
>
> After (with AVX2)
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ±
39.923 ops/s
> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ± 34.838 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ± 12.179 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ± 8.992 ops/s
> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ± 6.238 ops/s
>
>
> Before (with AVX512):
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ± 27.260 ops/s
> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ± 26.707 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 102...

Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:

  Fix copyright stmt

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23719/files
  - new: https://git.openjdk.org/jdk/pull/23719/files/a7f756af..0a4230aa

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=06-07

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/23719.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23719/head:pull/23719

PR: https://git.openjdk.org/jdk/pull/23719

From vpaprotski at openjdk.org Thu Mar 27 19:14:01 2025
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Thu, 27 Mar 2025 19:14:01 GMT
Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 27 Mar 2025 18:52:53 GMT, Anthony Scarpino wrote:

>> Volodymyr Paprotski has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - whitespace
>>  - prettify test
>
>
src/java.base/share/classes/sun/security/util/math/intpoly/MontgomeryIntegerPolynomialP256.java line 2:
>
>> 1: /*
>> 2:  * Copyright (c) 2024, 2025 Oracle and/or its affiliates. All rights reserved.
>
> You are missing a trailing comma. I didn't pick this up originally because a test I didn't expect to see failed on this.

Fixed. I guess I should have looked at other files when I had to add the second year/range!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r2017464572

From gziemski at openjdk.org Thu Mar 27 19:43:13 2025
From: gziemski at openjdk.org (Gerard Ziemski)
Date: Thu, 27 Mar 2025 19:43:13 GMT
Subject: RFR: 8344883: Do not use mtNone if we know the tag type
Message-ID:

This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with a default value of mtNone, to force everyone to provide a mem tag, if known.

I tried to fill in the tag when I was pretty certain that I had the right type.

At least one more follow-up will be needed after this, to change the remaining mtNone to valid values.

-------------

Commit messages:
 - work

Changes: https://git.openjdk.org/jdk/pull/24282/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8344883
Stats: 145 lines in 47 files changed: 19 ins; 0 del; 126 mod
Patch: https://git.openjdk.org/jdk/pull/24282.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24282/head:pull/24282

PR: https://git.openjdk.org/jdk/pull/24282

From gziemski at openjdk.org Thu Mar 27 19:47:56 2025
From: gziemski at openjdk.org (Gerard Ziemski)
Date: Thu, 27 Mar 2025 19:47:56 GMT
Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v2]
In-Reply-To:
References:
Message-ID:

> This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with a default value of mtNone, to force everyone to provide a mem tag, if known.
>
> I tried to fill in the tag when I was pretty certain that I had the right type.
>
> At least one more follow-up will be needed after this, to change the remaining mtNone to valid values.

Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision:

  work

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24282/files
  - new: https://git.openjdk.org/jdk/pull/24282/files/a749ee60..582b1860

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=00-01

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/24282.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24282/head:pull/24282

PR: https://git.openjdk.org/jdk/pull/24282

From duke at openjdk.org  Thu Mar 27 20:01:27 2025
From: duke at openjdk.org (duke)
Date: Thu, 27 Mar 2025 20:01:27 GMT
Subject: Withdrawn: 8348855: G1: Implement G1BarrierSetC2::estimate_stub_size
In-Reply-To:
References:
Message-ID:

On Tue, 28 Jan 2025 14:01:21 GMT, Aleksey Shipilev wrote:

> We ran into a peculiar problem in Leyden: due to a current prototype limitation, we cannot yet store generated code that has an expanded code buffer. The code buffer expansion routinely happens with late G1 barrier expansion, as `G1BarrierSetC2` does not report any estimate for stub sizes.
>
> So a method rich in these G1 stubs would blow the initial code size estimate, force the buffer resize, and thus disqualify itself from storing C2 code in Leyden. Whoops. Fortunately, we just need to implement `G1BarrierSetC2::estimate_stub_size()` in mainline to avoid a significant part of this problem.
>
> I also fixed the misattribution of "stub" sizes in the insn section; this part is also important to get the stub sizes right. I can do that separately, but it would only matter for ZGC without G1 stub size estimation implemented.
>
> You can see the impact it has on Leyden here:
> https://github.com/openjdk/leyden/pull/28#issuecomment-2619077625
>
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/23333

From iklam at openjdk.org  Thu Mar 27 20:39:15 2025
From: iklam at openjdk.org (Ioi Lam)
Date: Thu, 27 Mar 2025 20:39:15 GMT
Subject: RFR: 8352437: Support --add-exports with -XX:+AOTClassLinking [v2]
In-Reply-To:
References:
Message-ID:

> `-XX:+AOTClassLinking` requires the CDS archived full module graph (FMG).
>
> - Before this PR, when `--add-exports` is specified, FMG is disabled, so AOT caches created with `-XX:+AOTClassLinking` cannot be loaded.
> - After this PR, if the exact same `--add-exports` flags are specified across the training/assembly/production phases, the FMG can be used, so we can use AOT caches created with `-XX:+AOTClassLinking`.
>
> The change itself is straightforward: just remember the `--add-exports` flags specified during AOT cache creation, and check that the exact same ones are used during the production run.
>
> I did a fair amount of refactoring to change the "exact options specified" checks in modules.cpp, so more such options can be easily added in the future (we need to handle `--add-reads` and `--add-opens` in future RFEs).
>
> (Note: this PR depends on #24122 )

Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24124/files
  - new: https://git.openjdk.org/jdk/pull/24124/files/164cd7f5..164cd7f5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24124&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24124&range=00-01

  Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/24124.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24124/head:pull/24124

PR: https://git.openjdk.org/jdk/pull/24124

From iklam at openjdk.org  Thu Mar 27 22:13:02 2025
From: iklam at openjdk.org (Ioi Lam)
Date: Thu, 27 Mar 2025 22:13:02 GMT
Subject: RFR: 8352437: Support --add-exports with -XX:+AOTClassLinking [v3]
In-Reply-To:
References:
Message-ID:

> `-XX:+AOTClassLinking` requires the CDS archived full module graph (FMG).
>
> - Before this PR, when `--add-exports` is specified, FMG is disabled, so AOT caches created with `-XX:+AOTClassLinking` cannot be loaded.
> - After this PR, if the exact same `--add-exports` flags are specified across the training/assembly/production phases, the FMG can be used, so we can use AOT caches created with `-XX:+AOTClassLinking`.
>
> The change itself is straightforward: just remember the `--add-exports` flags specified during AOT cache creation, and check that the exact same ones are used during the production run.
>
> I did a fair amount of refactoring to change the "exact options specified" checks in modules.cpp, so more such options can be easily added in the future (we need to handle `--add-reads` and `--add-opens` in future RFEs).
>
> (Note: this PR depends on #24122 )

Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains eight commits: - @calvinccheung comments - Merge branch 'master' into 8352437-aot-class-linking-incompatible-with-add-exports - Fixed whitespaces - clean up - 8352437: -XX:+AOTClassLinking is not compatible with --add-export - added comments - added comments - Prototype: support --add-exports in CDS FMG ------------- Changes: https://git.openjdk.org/jdk/pull/24124/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24124&range=02 Stats: 580 lines in 15 files changed: 463 ins; 64 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/24124.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24124/head:pull/24124 PR: https://git.openjdk.org/jdk/pull/24124 From iklam at openjdk.org Thu Mar 27 22:13:04 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 27 Mar 2025 22:13:04 GMT Subject: RFR: 8352437: Support --add-exports with -XX:+AOTClassLinking [v2] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 04:13:30 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Fixed whitespaces >> - clean up >> - 8352437: -XX:+AOTClassLinking is not compatible with --add-export >> - added comments >> - added comments >> - Prototype: support --add-exports in CDS FMG > > test/hotspot/jtreg/runtime/cds/appcds/jigsaw/ExactOptionMatch.java line 91: > >> 89: >> 90: // (4) Dump = specified twice, Run = specified twice (but in different order) >> 91: // Should still be able to use FMG (values are sorted by CDS). > > How about add another test case where the values are specified in the same order? Fixed. 
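
[Editorial illustration, not part of the original message.] The test comment quoted above says the FMG should stay usable when the same `--add-exports` values are given in a different order, because the values are sorted. A minimal sketch of such an order-insensitive "exact match" check follows. The helper name `same_option_values` is hypothetical and this is not the code in modules.cpp; whether the real implementation also collapses duplicate values is not stated in the thread, so the sketch keeps duplicates.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not a HotSpot function): returns true when the values
// recorded at dump time and the values seen at run time are the same
// collection, ignoring the order in which they were given on the command line.
bool same_option_values(std::vector<std::string> dump_time,
                        std::vector<std::string> run_time) {
  // Both vectors are taken by value, so sorting does not disturb the caller.
  std::sort(dump_time.begin(), dump_time.end());
  std::sort(run_time.begin(), run_time.end());
  return dump_time == run_time;
}
```

With this shape, two runs that pass the same two values in swapped order compare equal, while a run that adds or drops a value does not.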
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24124#discussion_r2017659553 From iklam at openjdk.org Thu Mar 27 22:13:06 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 27 Mar 2025 22:13:06 GMT Subject: RFR: 8352437: Support --add-exports with -XX:+AOTClassLinking [v3] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 04:12:23 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: >> >> - @calvinccheung comments >> - Merge branch 'master' into 8352437-aot-class-linking-incompatible-with-add-exports >> - Fixed whitespaces >> - clean up >> - 8352437: -XX:+AOTClassLinking is not compatible with --add-export >> - added comments >> - added comments >> - Prototype: support --add-exports in CDS FMG > > test/lib/jdk/test/lib/cds/CDSModulePackager.java line 36: > >> 34: import jdk.test.lib.cds.CDSJarUtils.JarOptions; >> 35: >> 36: /* > > This file has no change other than the above blank line deletion. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24124#discussion_r2017659524 From vpaprotski at openjdk.org Thu Mar 27 22:40:28 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Thu, 27 Mar 2025 22:40:28 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v7] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 18:11:32 GMT, Anthony Scarpino wrote: >> Volodymyr Paprotski has updated the pull request incrementally with two additional commits since the last revision: >> >> - whitespace >> - prettify test > > Wait on integration. I need to check something Let me know if I can 'invoke' integrate @ascarpino ? 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23719#issuecomment-2759693944

From ascarpino at openjdk.org  Thu Mar 27 23:15:10 2025
From: ascarpino at openjdk.org (Anthony Scarpino)
Date: Thu, 27 Mar 2025 23:15:10 GMT
Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8]
In-Reply-To:
References:
Message-ID: <6dwvEmNBtqHw5Nn6SIRlNMkL_jgp9vKZo9H7DBxFgIQ=.62b6e849-3688-4cb2-8978-2c00327bd1a8@github.com>

On Thu, 27 Mar 2025 19:13:59 GMT, Volodymyr Paprotski wrote:

>> Add AVX2 Montgomery multiplication intrinsic. (About 60-80% gain)
>>
>> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain.
>>
>> Before (no AVX512)
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ± 17.879 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ± 15.807 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ± 4.190 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ± 2.484 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ± 2.285 ops/s
>>
>> After (with AVX2)
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ± 39.923 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ± 34.838 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ± 12.179 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ±
8.992 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ± 6.238 ops/s
>>
>>
>> Before (with AVX512):
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ± 27.260 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ± 26.707 o...
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix copyright stmt

Thanks. The test now passes.

-------------

Marked as reviewed by ascarpino (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23719#pullrequestreview-2723890584

From jwaters at openjdk.org  Fri Mar 28 05:13:16 2025
From: jwaters at openjdk.org (Julian Waters)
Date: Fri, 28 Mar 2025 05:13:16 GMT
Subject: RFR: 8345265: Minor improvements for LTO across all compilers [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 27 Mar 2025 14:32:35 GMT, Matthias Baesken wrote:

> > Wait, sorry to trouble you further, but what does nm --demangle --reverse-sort --print-size --size-sort libjvm.so on HotSpot compiled by gcc 14 with LTO active yield as the largest symbol in the binary? (It should be the symbol listed at the very top)
>
> This is my output; maybe I have to add that I used the 'normal' jdk head without patches, is that what I should do for a gcc14 build test?
> > ``` > nm --demangle --reverse-sort --print-size --size-sort images/jdk/lib/server/libjvm.so | more > 0000000000453ee0 000000000002320e t State::MachNodeGenerator(int) > 0000000000970f70 0000000000018eb9 t CompilerToVM::initialize_intrinsics(JVMCIEnv*) > 000000000140caa0 000000000000f018 b Matcher::mreg2regmask > 0000000000993c80 000000000000a40d t JNIJVMCI::initialize_ids(JNIEnv_*) > 0000000000ac16b0 0000000000009d16 t Matcher::Fixup_Save_On_Entry() > 000000000143db00 0000000000008000 b _ZL9_elements.lto_priv.0 > 0000000001446d20 0000000000008000 b _free_list > 000000000141e900 0000000000007d00 b DFSClosure::_reference_stack > 00000000013d6d40 0000000000007668 d _ZL9flagTable.lto_priv.0 > 00000000013edf60 0000000000006c30 d VMStructs::localHotSpotVMStructs > 00000000010463f0 0000000000006a06 t readConfiguration0(JNIEnv_*, JVMCIEnv*) [clone .isra.0] > 0000000000d51dc0 00000000000067a2 t StubGenerator::generate_libmPow() > 00000000010b12d0 0000000000006289 t G1ParScanThreadState::trim_queue_to_threshold(unsigned int) > 0000000000e24550 00000000000061e8 t ClassVerifier::verify_method(methodHandle const&, JavaThread*) > 0000000001076dd0 000000000000548d t State::DFA(int, Node const*) [clone .isra.0] > 0000000000e1ba00 000000000000519d t VMError::report(outputStream*, bool) > 00000000014521e0 0000000000005000 b TemplateInterpreter::_safept_table > 000000000142cb60 0000000000005000 b TemplateInterpreter::_normal_table > 0000000001431b60 0000000000005000 b TemplateInterpreter::_active_table > 0000000000653790 0000000000004e12 t CompileBroker::print_heapinfo(outputStream*, char const*, unsigned long) > 000000000075be40 0000000000004e0b t G1CollectedHeap::do_collection_pause_at_safepoint_helper() > 0000000000b90260 0000000000004a41 t Parse::do_one_bytecode() [clone .part.0] > 00000000010c1f20 00000000000049e6 t d_print_comp_inner > 0000000000c1e180 0000000000004594 t ServiceThread::service_thread_entry(JavaThread*, JavaThread*) > 00000000005ad7c0 0000000000004424 t 
C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) > 0000000000d260b0 000000000000440a t PhaseStringOpts::replace_string_concat(StringConcat*) > 0000000000e48920 00000000000042e5 t VM_Version::initialize() > 0000000000afbad0 0000000000004240 t Method::init_intrinsic_id(vmSymbolID) > 00000000010200d0 00000000000040ab t PSParallelCompact::invoke_no_policy(bool) [clone .isra.0] > 000000000051f230 0000000000004054 t Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) > 000000000065c060 0000000000004015 t Compile::Code_Gen() > 0000000000663060 0000000000003fec t CompileBroker::compiler_thread_loop() > 000000000070d9b0 0000000000003fd7 t ConnectionGraph::do_analysis(Compile*, PhaseIterGVN*) > 0000000000be8890 0000000000003f8e t PhaseChaitin::Split(unsigned int, ResourceArea*) > 0000000000818730 0000000000003f82 t PhaseChaitin::build_ifg_physical(ResourceArea*) > 0000000000fb1920 0000000000003f44 t SharedRuntime::generate_native_wrapper(MacroAssembler*, methodHandle const&, int, BasicType*, VMRegPair*, BasicType) [clone .constprop.0] > 00000000007e2ab0 0000000000003f41 t PhaseCFG::global_code_motion() > 00000000013e8580 0000000000003e10 d JVMCIVMStructs::localHotSpotVMStructs > 0000000000db3b00 0000000000003dff t TemplateInterpreterGenerator::generate_all() > 0000000000f436d0 0000000000003d92 t initialize_stubs(StubGenBlobId, int, int, char const*, char const*, char const*) [clone .constprop.0] > 0000000000d6b680 0000000000003d42 t StubGenerator::generate_libmTan() > 00000000004ff560 0000000000003d28 t BCEscapeAnalyzer::iterate_blocks(Arena*) > 0000000000a2bfb0 0000000000003cef t VM_RedefineClasses::load_new_class_versions() [clone .part.0] > 000000000073fde0 0000000000003ca2 t G1CollectedHeap::do_full_collection(bool, bool) > 0000000000e8e170 0000000000003bea t ZDriverMajor::run_thread() > 000000000103ec00 0000000000003b81 t JvmtiEnv::RetransformClasses(int, _jclass* const*) [clone .isra.0] > 0000000000d18c50 0000000000003b4d t 
StubGenerator::generate_md5_implCompress(StubGenStubId)
> 0000000000a97160 0000000000003b20 t PhaseIdealLoop::auto_vectorize(IdealLoopTree*, VSharedData&) [clone .part.0]
> 00000000013df000 0000000000003aa8 d ruleName
> 00000000005c7e90 0000000000003a9f t PhiNode::Ideal(PhaseGVN*, bool)
> 00000000006a1ef0 0000000000003a9f t State::_sub_Op_AddP(Node const*)
> 0000000000e86880 0000000000003a74 t ZGeneration::select_relocation_set(ZGenerationId, bool)
> --More--
> ```

Yes, that should be good enough, thank you for sharing it. I'm baffled by how tiny the methods are on Linux; in particular, G1ParScanThreadState::trim_queue_to_threshold(unsigned int) only being 25KB is astonishing to me. I have no clue why flatten causes so much inlining on Windows, to the point where it results in massive 5MB G1 methods, but then it's perfectly fine on Linux. I really wonder why the things I have to solve can never be easy sometimes

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2760210133

From rehn at openjdk.org  Fri Mar 28 06:56:21 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Fri, 28 Mar 2025 06:56:21 GMT
Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user [v2]
In-Reply-To:
References:
Message-ID: <1pa1FDH5Z2quR3fE7o4qfZKwRrz8nXHbMSirSyiqhTw=.9c37d2a9-5b93-40dd-8b5a-a5822030ef48@github.com>

On Thu, 27 Mar 2025 17:57:37 GMT, Hamlin Li wrote:

> I also find it annoying to see some tests fail intermittently.
>
> Not sure if I understand the goal of this PR; it seems it might not be the best solution to simply disable these tests when running with qemu. My concerns are: qemu is still one of the main methods to quickly verify functionality changes, but if we just disable the failed tests, and maybe in the future disable more and more tests, then qemu is no longer able to play the role it was supposed to play.

It's not an intermittent failure. The majority of them can't work, as they use pstack, open core files, use PerfData, etc.,
and expect it to be rv64. But core files and pstack are in the host arch, as we are running qemu-user. I can remove the tests which time out and only keep the tests which simply can't work in a qemu-user environment in this PR. Seems good?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24229#issuecomment-2760381837

From stefank at openjdk.org  Fri Mar 28 08:28:23 2025
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 28 Mar 2025 08:28:23 GMT
Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 27 Mar 2025 19:47:56 GMT, Gerard Ziemski wrote:

>> This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter's default value of mtNone, to force everyone to provide a mem tag, if known.
>>
>> I tried to fill in the tag when I was pretty certain that I had the right type.
>>
>> At least one more follow-up will be needed after this, to change the remaining mtNone to valid values.
>
> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision:
>
>   work

I went over the patch and added suggestions for places where I think you're using the wrong tag, or where I think it is obvious that there's a better tag than mtNone. I've also suggested removal of the now redundant 'executable' argument, which I want to see as little of as possible given that it is a wart on the memory reservation APIs (IMHO).
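
[Editorial illustration, not part of the original message.] The two points made above can be pictured with a small stand-alone sketch: once the tag parameter has no default, every call site has to name a tag, and as long as the trailing `executable` flag keeps a default, spelling out `false` adds nothing. The suggestions in this review imply that the real default is non-executable, but that is an inference from the thread. Everything below (`MemTag` as written, `ReservationLedger`, `reserve`, `reserved`) is a simplified stand-in invented for the example; HotSpot's actual `os::reserve_memory` and NMT bookkeeping are different.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Simplified stand-in for HotSpot's memory tags; the real enum has many more
// entries. kCount is only a helper to size the array below.
enum class MemTag { mtNone, mtInternal, mtGC, mtTest, kCount };

// Hypothetical ledger that records how many bytes were reserved per tag.
struct ReservationLedger {
  std::array<std::size_t, static_cast<std::size_t>(MemTag::kCount)> bytes{};

  // No default for `tag`: a call such as reserve(4096) does not compile, so
  // the caller is forced to decide. `executable` keeps its default, which
  // makes an explicit trailing `false` redundant.
  void reserve(std::size_t size, MemTag tag, bool executable = false) {
    (void)executable;  // executability is not modeled in this sketch
    bytes[static_cast<std::size_t>(tag)] += size;
  }

  std::size_t reserved(MemTag tag) const {
    return bytes[static_cast<std::size_t>(tag)];
  }
};
```

In this model, `ledger.reserve(size, MemTag::mtTest)` and `ledger.reserve(size, MemTag::mtTest, false)` account identically, which mirrors why the review asks for the shorter form, and reservations made from tests land under the test tag instead of accumulating under `mtNone`.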
src/hotspot/os/bsd/gc/z/zPhysicalMemoryBacking_bsd.cpp line 81: > 79: > 80: // Reserve address space for backing memory > 81: _base = (uintptr_t)os::reserve_memory(max_capacity, mtJavaHeap, false); Suggestion: _base = (uintptr_t)os::reserve_memory(max_capacity, mtJavaHeap); src/hotspot/os/windows/os_windows.cpp line 3261: > 3259: assert(extra_size >= size, "overflow, size is too large to allow alignment"); > 3260: > 3261: Suggestion: src/hotspot/os/windows/os_windows.cpp line 3267: > 3265: for (int attempt = 0; attempt < max_attempts && aligned_base == nullptr; attempt ++) { > 3266: char* extra_base = file_desc != -1 ? os::map_memory_to_file(extra_size, file_desc, mem_tag) : > 3267: os::reserve_memory(extra_size, mem_tag, false); Suggestion: os::reserve_memory(extra_size, mem_tag); src/hotspot/os/windows/os_windows.cpp line 3284: > 3282: // Which may fail, hence the loop. > 3283: aligned_base = file_desc != -1 ? os::attempt_map_memory_to_file_at(aligned_base, size, file_desc, mem_tag) : > 3284: os::attempt_reserve_memory_at(aligned_base, size, mem_tag, false); Suggestion: os::attempt_reserve_memory_at(aligned_base, size, mem_tag); src/hotspot/os/windows/perfMemory_windows.cpp line 57: > 55: > 56: // allocate an aligned chuck of memory > 57: char* mapAddress = os::reserve_memory(size, mtNone); To match with the other platforms: Suggestion: char* mapAddress = os::reserve_memory(size, mtInternal); src/hotspot/share/classfile/compactHashtable.cpp line 229: > 227: quit("Unable to open hashtable dump file", filename); > 228: } > 229: _base = os::map_memory(_fd, filename, 0, nullptr, _size, mtSymbol, true, false); This seems to be used to read Symbols *OR* String. This probably needs to be something else. I suggest to revert to mtNone and figure out the appropriate tag later. 
Suggestion: _base = os::map_memory(_fd, filename, 0, nullptr, _size, mtNone, true, false); src/hotspot/share/gc/parallel/parMarkBitMap.cpp line 52: > 50: rs_align, > 51: page_sz, > 52: mtNone); Suggestion: mtGC); src/hotspot/share/gc/shenandoah/shenandoahCardTable.cpp line 62: > 60: _write_byte_map_base = _byte_map_base; > 61: > 62: ReservedSpace read_space = MemoryReserver::reserve(_byte_map_size, rs_align, _page_size, mtNone); Suggestion: ReservedSpace read_space = MemoryReserver::reserve(_byte_map_size, rs_align, _page_size, mtGC); src/hotspot/share/memory/allocation.inline.hpp line 61: > 59: size_t size = size_for(length); > 60: > 61: char* addr = os::reserve_memory(size, mem_tag, !ExecMem); Suggestion: char* addr = os::reserve_memory(size, mem_tag); src/hotspot/share/memory/allocation.inline.hpp line 78: > 76: size_t size = size_for(length); > 77: > 78: char* addr = os::reserve_memory(size, mem_tag, !ExecMem); Suggestion: char* addr = os::reserve_memory(size, mem_tag); src/hotspot/share/memory/metaspace/testHelpers.cpp line 85: > 83: if (reserve_limit > 0) { > 84: // have reserve limit -> non-expandable context > 85: _rs = MemoryReserver::reserve(reserve_limit * BytesPerWord, Metaspace::reserve_alignment(), os::vm_page_size(), mtNone); Suggestion: _rs = MemoryReserver::reserve(reserve_limit * BytesPerWord, Metaspace::reserve_alignment(), os::vm_page_size(), mtTest); src/hotspot/share/memory/metaspace/virtualSpaceNode.cpp line 259: > 257: ReservedSpace rs = MemoryReserver::reserve(word_size * BytesPerWord, > 258: Settings::virtual_space_node_reserve_alignment_words() * BytesPerWord, > 259: os::vm_page_size(), mtNone); Suggestion: os::vm_page_size(), mtMetaspace); src/hotspot/share/prims/jni.cpp line 2403: > 2401: if (bad_address == nullptr) { > 2402: size_t size = os::vm_allocation_granularity(); > 2403: bad_address = os::reserve_memory(size, mtInternal, false); Suggestion: bad_address = os::reserve_memory(size, mtInternal); src/hotspot/share/prims/whitebox.cpp 
line 720: > 718: > 719: WB_ENTRY(jlong, WB_NMTReserveMemory(JNIEnv* env, jobject o, jlong size)) > 720: return (jlong)(uintptr_t)os::reserve_memory(size, mtTest, false); Suggestion: return (jlong)(uintptr_t)os::reserve_memory(size, mtTest); src/hotspot/share/prims/whitebox.cpp line 724: > 722: > 723: WB_ENTRY(jlong, WB_NMTAttemptReserveMemoryAt(JNIEnv* env, jobject o, jlong addr, jlong size)) > 724: return (jlong)(uintptr_t)os::attempt_reserve_memory_at((char*)(uintptr_t)addr, (size_t)size, mtTest, false); Suggestion: return (jlong)(uintptr_t)os::attempt_reserve_memory_at((char*)(uintptr_t)addr, (size_t)size, mtTest); src/hotspot/share/prims/whitebox.cpp line 1515: > 1513: static volatile char* p; > 1514: > 1515: p = os::reserve_memory(os::vm_allocation_granularity(), mtSymbol); Suggestion: p = os::reserve_memory(os::vm_allocation_granularity(), mtTest); src/hotspot/share/runtime/os.cpp line 2130: > 2128: log_trace(os, map)(ERRFMT, ERRFMTARGS); > 2129: log_debug(os, map)("successfully attached at " PTR_FORMAT, p2i(result)); > 2130: MemTracker::record_virtual_memory_reserve((address)result, bytes, CALLER_PC, mtNone); I think attempt_reserve_memory_between should provide the correct mem tag. src/hotspot/share/runtime/os.cpp line 2336: > 2334: if (result != nullptr) { > 2335: // The memory is committed > 2336: MemTracker::record_virtual_memory_reserve_and_commit((address)result, size, CALLER_PC, mtNone); reserve_memory_special should take a mem tag, but I guess you intend to do that as a follow-up RFE? 
src/hotspot/share/runtime/safepointMechanism.cpp line 60: > 58: const size_t page_size = os::vm_page_size(); > 59: const size_t allocation_size = 2 * page_size; > 60: char* polling_page = os::reserve_memory(allocation_size, mtSafepoint, !ExecMem); Suggestion: char* polling_page = os::reserve_memory(allocation_size, mtSafepoint); src/hotspot/share/utilities/debug.cpp line 715: > 713: #ifdef CAN_SHOW_REGISTERS_ON_ASSERT > 714: void initialize_assert_poison() { > 715: char* page = os::reserve_memory(os::vm_page_size(), mtInternal, !ExecMem); Suggestion: char* page = os::reserve_memory(os::vm_page_size(), mtInternal); test/hotspot/gtest/gc/g1/test_stressCommitUncommit.cpp line 86: > 84: os::vm_allocation_granularity(), > 85: os::vm_page_size(), > 86: mtNone); Suggestion: mtTest); test/hotspot/gtest/gc/g1/test_stressCommitUncommit.cpp line 113: > 111: os::vm_allocation_granularity(), > 112: os::vm_page_size(), > 113: mtNone); Suggestion: mtTest); test/hotspot/gtest/memory/test_virtualspace.cpp line 62: > 60: ASSERT_PRED2(is_size_aligned, size, os::vm_allocation_granularity()); > 61: > 62: ReservedSpace rs = MemoryReserver::reserve(size, mtNone); Why did you do this change? Suggestion: ReservedSpace rs = MemoryReserver::reserve(size, mtTest); test/hotspot/gtest/memory/test_virtualspace.cpp line 76: > 74: ASSERT_PRED2(is_size_aligned, size, alignment) << "Incorrect input parameters"; > 75: size_t page_size = UseLargePages ? os::large_page_size() : os::vm_page_size(); > 76: ReservedSpace rs = MemoryReserver::reserve(size, alignment, page_size, mtNone); Suggestion: ReservedSpace rs = MemoryReserver::reserve(size, alignment, page_size, mtTest); test/hotspot/gtest/memory/test_virtualspace.cpp line 104: > 102: size_t page_size = large ? 
os::large_page_size() : os::vm_page_size(); > 103: > 104: ReservedSpace rs = MemoryReserver::reserve(size, alignment, page_size, mtNone); Suggestion: ReservedSpace rs = MemoryReserver::reserve(size, alignment, page_size, mtTest); test/hotspot/gtest/memory/test_virtualspace.cpp line 215: > 213: case Default: > 214: case Reserve: > 215: return MemoryReserver::reserve(reserve_size_aligned, mtNone); Suggestion: return MemoryReserver::reserve(reserve_size_aligned, mtTest); test/hotspot/gtest/memory/test_virtualspace.cpp line 221: > 219: os::vm_allocation_granularity(), > 220: os::vm_page_size(), > 221: mtNone); Suggestion: mtTest); test/hotspot/gtest/memory/test_virtualspace.cpp line 300: > 298: size_t large_page_size = os::large_page_size(); > 299: > 300: ReservedSpace reserved = MemoryReserver::reserve(large_page_size, large_page_size, large_page_size, mtNone); Suggestion: ReservedSpace reserved = MemoryReserver::reserve(large_page_size, large_page_size, large_page_size, mtTest); test/hotspot/gtest/memory/test_virtualspace.cpp line 370: > 368: alignment, > 369: page_size, > 370: mtNone); Suggestion: mtTest); test/hotspot/gtest/memory/test_virtualspace.cpp line 388: > 386: ASSERT_TRUE(is_aligned(size, os::vm_allocation_granularity())) << "Must be at least AG aligned"; > 387: > 388: ReservedSpace rs = MemoryReserver::reserve(size, mtNone); Suggestion: ReservedSpace rs = MemoryReserver::reserve(size, mtTest); test/hotspot/gtest/memory/test_virtualspace.cpp line 416: > 414: alignment, > 415: page_size, > 416: mtNone); Suggestion: mtTest); test/hotspot/gtest/memory/test_virtualspace.cpp line 521: > 519: case Reserve: > 520: return MemoryReserver::reserve(reserve_size_aligned, > 521: mtNone); Suggestion: mtTest); test/hotspot/gtest/memory/test_virtualspace.cpp line 527: > 525: os::vm_allocation_granularity(), > 526: os::vm_page_size(), > 527: mtNone); Suggestion: mtTest); test/hotspot/gtest/memory/test_virtualspace.cpp line 585: > 583: large_page_size, > 584: 
large_page_size, > 585: mtNone); Suggestion: mtTest); test/hotspot/gtest/nmt/test_nmt_locationprinting.cpp line 116: > 114: > 115: static void test_for_mmap(size_t sz, ssize_t offset) { > 116: char* addr = os::reserve_memory(sz, mtTest, false); Suggestion: char* addr = os::reserve_memory(sz, mtTest); test/hotspot/gtest/runtime/test_committed_virtualmemory.cpp line 94: > 92: const size_t page_sz = os::vm_page_size(); > 93: const size_t size = num_pages * page_sz; > 94: char* base = os::reserve_memory(size, mtThreadStack, !ExecMem); Suggestion: char* base = os::reserve_memory(size, mtThreadStack); test/hotspot/gtest/runtime/test_committed_virtualmemory.cpp line 162: > 160: const size_t num_pages = 4; > 161: const size_t size = num_pages * page_sz; > 162: char* base = os::reserve_memory(size, mtTest, !ExecMem); Suggestion: char* base = os::reserve_memory(size, mtTest); test/hotspot/gtest/runtime/test_committed_virtualmemory.cpp line 208: > 206: const size_t size = num_pages * page_sz; > 207: > 208: char* base = os::reserve_memory(size, mtTest, !ExecMem); Suggestion: char* base = os::reserve_memory(size, mtTest); test/hotspot/gtest/runtime/test_os.cpp line 261: > 259: // two pages, first one protected. > 260: const size_t ps = os::vm_page_size(); > 261: char* two_pages = os::reserve_memory(ps * 2, mtTest, false); Suggestion: char* two_pages = os::reserve_memory(ps * 2, mtTest); test/hotspot/gtest/runtime/test_os.cpp line 533: > 531: size_t total_range_len = num_stripes * stripe_len; > 532: // Reserve a large contiguous area to get the address space... 
> 533: p = (address)os::reserve_memory(total_range_len, mtNone); Suggestion: p = (address)os::reserve_memory(total_range_len, mtTest); test/hotspot/gtest/runtime/test_os.cpp line 547: > 545: const bool executable = stripe % 2 == 0; > 546: #endif > 547: q = (address)os::attempt_reserve_memory_at((char*)q, stripe_len, mtNone, executable); Suggestion: q = (address)os::attempt_reserve_memory_at((char*)q, stripe_len, mtTest, executable); test/hotspot/gtest/runtime/test_os.cpp line 567: > 565: assert(is_aligned(stripe_len, os::vm_allocation_granularity()), "Sanity"); > 566: size_t total_range_len = num_stripes * stripe_len; > 567: address p = (address)os::reserve_memory(total_range_len, mtNone); Suggestion: address p = (address)os::reserve_memory(total_range_len, mtTest); test/hotspot/gtest/runtime/test_os.cpp line 634: > 632: > 633: // ...re-reserve the middle stripes. This should work unless release silently failed. > 634: address p2 = (address)os::attempt_reserve_memory_at((char*)p_middle_stripes, middle_stripe_len, mtNone); Suggestion: address p2 = (address)os::attempt_reserve_memory_at((char*)p_middle_stripes, middle_stripe_len, mtTest); test/hotspot/gtest/runtime/test_os.cpp line 657: > 655: TEST_VM(os, release_bad_ranges) { > 656: #endif > 657: char* p = os::reserve_memory(4 * M, mtNone); Suggestion: char* p = os::reserve_memory(4 * M, mtTest); test/hotspot/gtest/runtime/test_os.cpp line 692: > 690: // // make things even more difficult by trying to reserve at the border of the region > 691: address border = p + num_stripes * stripe_len; > 692: address p2 = (address)os::attempt_reserve_memory_at((char*)border, stripe_len, mtNone); Suggestion: address p2 = (address)os::attempt_reserve_memory_at((char*)border, stripe_len, mtTest); test/hotspot/gtest/runtime/test_os.cpp line 733: > 731: // Reserve a small range and fill it with a marker string, should show up > 732: // on implementations displaying range snippets > 733: char* p = os::reserve_memory(1 * M, mtInternal, 
false); Suggestion: char* p = os::reserve_memory(1 * M, mtInternal); test/hotspot/gtest/runtime/test_os.cpp line 757: > 755: // A simple allocation > 756: { > 757: address p = (address)os::reserve_memory(total_range_len, mtNone); Suggestion: address p = (address)os::reserve_memory(total_range_len, mtTest); test/hotspot/gtest/runtime/test_os.cpp line 1062: > 1060: > 1061: TEST_VM(os, reserve_at_wish_address_shall_not_replace_mappings_smallpages) { > 1062: char* p1 = os::reserve_memory(M, mtTest, false); Suggestion: char* p1 = os::reserve_memory(M, mtTest); test/hotspot/gtest/runtime/test_os.cpp line 1072: > 1070: if (UseLargePages && !os::can_commit_large_page_memory()) { // aka special > 1071: const size_t lpsz = os::large_page_size(); > 1072: char* p1 = os::reserve_memory_aligned(lpsz, lpsz, mtTest, false); Suggestion: char* p1 = os::reserve_memory_aligned(lpsz, lpsz, mtTest); test/hotspot/gtest/runtime/test_os.cpp line 1098: > 1096: const size_t size = pages * page_sz; > 1097: > 1098: char* base = os::reserve_memory(size, mtTest, false); Suggestion: char* base = os::reserve_memory(size, mtTest); test/hotspot/gtest/runtime/test_os_linux.cpp line 357: > 355: const bool useThp = UseTransparentHugePages; > 356: UseTransparentHugePages = true; > 357: char* const heap = os::reserve_memory(size, mtInternal, false); Suggestion: char* const heap = os::reserve_memory(size, mtInternal); ------------- Changes requested by stefank (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24282#pullrequestreview-2724588390 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018111000 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018112894 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018113093 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018113247 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018114051 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018121233 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018121891 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018122312 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018122564 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018122715 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018123591 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018124074 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018125155 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018125568 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018125876 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018126443 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018128479 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018129358 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018098166 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018098617 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018098956 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018099157 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018100108 PR Review Comment: 
https://git.openjdk.org/jdk/pull/24282#discussion_r2018100521 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018100717 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018100870 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018101049 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018101200 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018101373 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018101541 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018101692 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018101911 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018102064 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018102298 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018102633 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018103596 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018103912 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018104093 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018104376 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018104743 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018105030 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018105216 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018105410 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018105552 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018105686 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018105792 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018105928 PR Review Comment: 
https://git.openjdk.org/jdk/pull/24282#discussion_r2018106123 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018106324 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018106474 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2018107303 From duke at openjdk.org Fri Mar 28 09:09:59 2025 From: duke at openjdk.org (Manuel Hässig) Date: Fri, 28 Mar 2025 09:09:59 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References: Message-ID: > # Issue Summary > > When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates. > > # Change Summary > > Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. > > Concretely, this PR > - adds parse predicate nodes to the IR testing framework, > - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, > - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, > - adds a regression test.
> > > # Testing > > The changes passed the following testing: > - [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) > - tier1 through tier3 and Oracle internal testing Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: - idealKit::loop: always call add_parse_predicates It was constrained on UseParsePredicate, but this is incorrect, since all parse predicates are added in that function. - Improve description of UseLoopPredicate argument ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24248/files - new: https://git.openjdk.org/jdk/pull/24248/files/72ebfc8e..1561a0ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=04-05 Stats: 9 lines in 2 files changed: 0 ins; 2 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24248.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24248/head:pull/24248 PR: https://git.openjdk.org/jdk/pull/24248 From duke at openjdk.org Fri Mar 28 09:10:00 2025 From: duke at openjdk.org (Manuel Hässig) Date: Fri, 28 Mar 2025 09:10:00 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v5] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 14:04:15 GMT, Emanuel Peter wrote: > You could also clean up the `IdealKit::loop`, which checks `UseLoopPredicate` only to call add_parse_predicates, which adds all predicates... and so it constrains too many things now. Cleaned up in [1561a0e](https://github.com/openjdk/jdk/pull/24248/commits/1561a0eea3b2049e4e9e6468d0237f60e97cd2e8). I also reran testing and everything passed.
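For readers following the thread, the flag relationship discussed here can be sketched in a few lines of C++. This is only an illustrative model under stated assumptions, not the code from the PR: `LoopPredicateFlags`, `normalize_flags`, and `should_add_profiled_predicates` are hypothetical names invented for this sketch, while the real flags are declared in `c2_globals.hpp` and, per the PR description, are adjusted on the level of arguments.

```cpp
#include <cassert>

// Hypothetical stand-in for the two C2 flags discussed in this thread.
// The real flags are UseLoopPredicate and UseProfiledLoopPredicate.
struct LoopPredicateFlags {
  bool use_loop_predicate;
  bool use_profiled_loop_predicate;
};

// Assumed normalization step: profiled predicates are treated as a subset
// of loop predicates, so turning loop predicates off turns them off too.
inline LoopPredicateFlags normalize_flags(LoopPredicateFlags f) {
  if (!f.use_loop_predicate) {
    f.use_profiled_loop_predicate = false;
  }
  // Invariant after normalization: profiled implies loop predicates.
  assert(!f.use_profiled_loop_predicate || f.use_loop_predicate);
  return f;
}

// Assumed shape of a consistent check: always test the loop predicate flag
// first, then the profiled one, so every call site expresses the same intent.
inline bool should_add_profiled_predicates(const LoopPredicateFlags& f) {
  return f.use_loop_predicate && f.use_profiled_loop_predicate;
}
```

With this model, a configuration corresponding to `-XX:-UseLoopPredicate` never reports profiled predicates as enabled, whether or not the normalization step has run, which is the consistency property the PR description aims for.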
------------- PR Comment: https://git.openjdk.org/jdk/pull/24248#issuecomment-2760619412 From duke at openjdk.org Fri Mar 28 09:10:01 2025 From: duke at openjdk.org (Manuel Hässig) Date: Fri, 28 Mar 2025 09:10:01 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v5] In-Reply-To: <01JvdutO8geXQM1nMA6lw-SeC-bNIiApSPykfxnDZls=.3973b39b-7a36-44fa-8c13-91c02268c986@github.com> References: <01JvdutO8geXQM1nMA6lw-SeC-bNIiApSPykfxnDZls=.3973b39b-7a36-44fa-8c13-91c02268c986@github.com> Message-ID: On Thu, 27 Mar 2025 14:28:00 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/c2_globals.hpp line 790: >> >>> 788: "Move checks with an uncommon trap out of loops based on " \ >>> 789: "profiling data. " \ >>> 790: "Requires UseLoopPredicate to be turned on (default).") \ >> >> Can you also update the comment for `UseLoopPredicate`? It seems outdated / wrong. >> >> Now is: >> `Generate a predicate to select fast/slow loop versions` >> >> @chhagedorn do you have a good suggestion for what to put now? > > Good catch! It could be similar but without mentioning profiling data: > > Move checks with an uncommon trap out of loops.
Fixed in [ca10148](https://github.com/openjdk/jdk/pull/24248/commits/ca101483aac17b0ace223df0f8a62bfd0dfa2e1f) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2018190355 From stuefe at openjdk.org Fri Mar 28 09:48:27 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 28 Mar 2025 09:48:27 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: <3zkHWLVEELkQkeSU9M0YAOpb3olMDNyU1HAdWUJEm68=.a2d2f9ea-c635-4379-95d7-00ff358eb15f@github.com> <5wBQqxybptneJjhR5usfrqg3PJ7G2PB_sDjUkb4BObM=.fe04a403-64ad-4dc5-b793-b48da01acfd4@github.com> Message-ID: On Thu, 27 Mar 2025 13:51:51 GMT, Robert Toyonaga wrote: > > > > > > I don't understand why we don't treat that as a fatal error OR make sure that all call-sites handles that error, which they don't do today. > > > > > > > > > > > > > > > I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions. > > > > > > > > > > > > I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error? > > > > > > > > > [`VirtualSpace::shrink_by`](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/memory/virtualspace.cpp#L373) allows uncommit to fail without crashing. I'm not certain of the intention behind that. But it seems like it's because shrinking is an optimization and not always critical that it be done immediately. [[1](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/gc/serial/tenuredGeneration.cpp#L258)] > > > > > > The above example shows code that assumes that it is OK to fail uncommitting and continuing. I'm trying to figure it that assumption is true. 
So, what I meant was that I was looking for a concrete example of a failure mode of uncommit that would be an acceptable (safe) failure to continue executing from. That is, a valid failure that doesn't mess up the memory in an unpredictable/unknowable way. > > So release/uncommit (via mmap, munmap, VirtualFree) could fail due to: (1) Bad arguments, or (2) The OS encountered an issue out of control of the JVM. > > (1) JVM bug. Reasonable to fatally fail here. Or the caller could be intentionally passing arguments that may or may not be valid. I don't think there is any code like that currently. > > (2) The only errors that aren't due to bad arguments are ENOMEM and ones related to file descriptors (which are not applicable to uncommit). VirtualFree only fails due to bad arguments according to Windows docs. > > So if there's consensus that ENOMEM is not recoverable (or rare enough to not worry about), then it seems like it's OK to fatally fail in all scenarios. +1 Thanks for investigating the details of this (also nothing we couldn't change later if it bugs us). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2760736918 From rvansa at openjdk.org Fri Mar 28 11:08:43 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 28 Mar 2025 11:08:43 GMT Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization Message-ID: On the reproducer https://bugs.openjdk.org/secure/attachment/113985/CCC.java my local testing shows these numbers: ### JDK-17 $ hyperfine -w 5 -r 10 '/path/to/jdk-17/bin/java -cp /tmp/ CCC' Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp/ CCC Time (mean ± σ): 32.5 ms ± 0.9 ms [User: 27.5 ms, System: 10.6 ms] Range (min … max): 31.1 ms …
33.7 ms 10 runs ### JDK-25 before the change applied $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC Time (mean ± σ): 101.6 ms ± 1.5 ms [User: 96.8 ms, System: 14.6 ms] Range (min … max): 99.0 ms … 104.5 ms 10 runs ### JDK-25 with this patch $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC Time (mean ± σ): 75.8 ms ± 1.2 ms [User: 69.8 ms, System: 16.0 ms] Range (min … max): 73.8 ms … 78.2 ms 10 runs ------------- Commit messages: - 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization Changes: https://git.openjdk.org/jdk/pull/24290/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24290&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353175 Stats: 20 lines in 6 files changed: 2 ins; 3 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/24290.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24290/head:pull/24290 PR: https://git.openjdk.org/jdk/pull/24290 From rvansa at openjdk.org Fri Mar 28 12:02:58 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 28 Mar 2025 12:02:58 GMT Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v2] In-Reply-To: References: Message-ID: > On the reproducer https://bugs.openjdk.org/secure/attachment/113985/CCC.java my local testing shows these numbers: > > ### JDK-17 > > $ hyperfine -w 5 -r 10 '/path/to/jdk-17/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp/ CCC > Time (mean ± σ): 32.5 ms ± 0.9 ms [User: 27.5 ms, System: 10.6 ms] > Range (min … max): 31.1 ms …
33.7 ms 10 runs > > ### JDK-25 before the change applied > > $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC > Time (mean ± σ): 101.6 ms ± 1.5 ms [User: 96.8 ms, System: 14.6 ms] > Range (min … max): 99.0 ms … 104.5 ms 10 runs > > ### JDK-25 with this patch > > $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC > Time (mean ± σ): 75.8 ms ± 1.2 ms [User: 69.8 ms, System: 16.0 ms] > Range (min … max): 73.8 ms … 78.2 ms 10 runs Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Fix compilation error in assertion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24290/files - new: https://git.openjdk.org/jdk/pull/24290/files/c4c5944a..11c2cb69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24290&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24290&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24290.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24290/head:pull/24290 PR: https://git.openjdk.org/jdk/pull/24290 From dholmes at openjdk.org Fri Mar 28 12:25:08 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 28 Mar 2025 12:25:08 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 15:11:06 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to build this image e.g. >> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files.
> > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > address Windows issues I'm not convinced the startup hit is justified - some filesystems are relatively very slow. Maybe it should read on demand instead? Though that means more stuff that can't be done during crash handling from a signal context. I thought build time was an obvious solution as this is data that exists at build time. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2761205783 From duke at openjdk.org Fri Mar 28 13:03:48 2025 From: duke at openjdk.org (Zihao Lin) Date: Fri, 28 Mar 2025 13:03:48 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v4] In-Reply-To: References: Message-ID: > This patch remove slice parameter from LoadNode::make > > Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thank a lot! Zihao Lin has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: 8344116: C2: remove slice parameter from LoadNode::make ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24258/files - new: https://git.openjdk.org/jdk/pull/24258/files/08c1a382..f6b2fbec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=02-03 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From aboldtch at openjdk.org Fri Mar 28 13:06:17 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 28 Mar 2025 13:06:17 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v15] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 14:47:36 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The recently integrated red-black tree can be made more flexible by adding support of intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. >> >> Two key changes enable this feature: >> 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. >> 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. >> >> >> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. 
>> >> An example of how you could use the intrusive tree is found below: >> >> ```c++ >> struct MyIntrusiveStructure { >> Node node; // The tree node is part of an external structure >> int data; >> >> MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} >> Node* get_node() { return &node; } >> static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } >> }; >> >> Tree my_intrusive_tree; >> >> Cursor insert_cursor = my_intrusive_tree.cursor_find(0); >> Node insert_node = Node(0); >> >> // Custom allocation here is just malloc >> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); >> new (place) MyIntrusiveStructure(0, insert_node); >> >> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); >> >> Cursor find_cursor = my_intrusive_tree.cursor_find(0); >> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; >> >> >> >> Please let me know any feedback or concerns! > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > Allow non-debug verify_self + comparator readability Sorry for being slow in reviewing. The changes look good. I've also tried using this tree as a replacement for the tree we have been using in our ZGC project and its been working almost seamless. I have a couple comments, I'll leave it up to you if it is something we want to fix in this PR or as followups if at all. Thanks for all the work. I'll talk with you next week and then approve. src/hotspot/share/utilities/rbTree.inline.hpp line 502: > 500: template > 501: inline void RBTree::visit_range_in_order(const K& from, const K& to, F f) const { > 502: assert(COMPARATOR::cmp(from, to) <= 0, "from must be less or equal to to"); Seem unfortunate to loose these assert, would be nice to find these sort of errors early. 
Maybe we can have some verification functions on the tree which takes (const K& from, const K& to, const NodeType* end_node) which can dispatch to the correct COMPARATOR function. src/hotspot/share/utilities/rbTree.inline.hpp line 548: > 546: return; > 547: } > 548: This could be a future enhancement. But it would be nice that if the COMPARATOR (or the NodeType) supplied a `cmp(const NodeType* a, const NodeType* b)` we could use it to check the order invariants for the children and parent. src/hotspot/share/utilities/rbTree.inline.hpp line 600: > 598: template > 599: template > 600: inline void AbstractRBTree::visit_range_in_order(const K& from, const K& to, F f) const { Preexisting. This is an exclusive end. I would think inclusive end would be more natural. Otherwise you cannot iterate all the way to the end. (Currently can be worked around if the largest possible K is not in the tree, by using it as `to`). ------------- PR Review: https://git.openjdk.org/jdk/pull/23416#pullrequestreview-2724962236 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r2018308634 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r2018403030 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r2018303212 From azafari at openjdk.org Fri Mar 28 13:25:22 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 28 Mar 2025 13:25:22 GMT Subject: RFR: 8352140: UBSAN: fix the left shift of negative value in klass.hpp, array_layout_helper() In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 18:07:02 GMT, Kim Barrett wrote: >> I had to emphasize that the case shown in the example may happen at run-time where compiler has no chance to warn/avoid/address it. >> My concern is that developers should not rely on the compiler to check the validation of left-shift op. They should be aware of the `signed` <-> `unsigned` and `int` <-> `long` <-> `long long` conversions during the left-shift. 
>> To find invalid cases of left-shift, UBSAN instruments them with assertions to catch them at run-time. If the assertion is raised, good, we found the problem. However, if no assertion is raised for some left-shift ops, it doesn't mean that they are valid. > > Is there a way to tell ubsan that we care about detecting overflows, but we do not care about detecting > left shift of a negative value? Not that I can find, but maybe I missed something. `-fsanitize=shift-base` > looks like it would check for both overflow and (prior to C++20) negative base. We could disable > shift-base checking and do our own overflow assertion. (Which might want to be packaged up in a helper, > as discussed in https://github.com/openjdk/jdk/pull/24196.) For the case in this PR, the left-shift results in overflow, since the operand is either 0xFFFFFFFF or 0xFFFFFFFE. Should we have two versions, `left_shift()` and `left_shift_no_overflow()`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24184#discussion_r2018650745 From vpaprotski at openjdk.org Fri Mar 28 14:33:30 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Fri, 28 Mar 2025 14:33:30 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 19:13:59 GMT, Volodymyr Paprotski wrote: >> Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain) >> >> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain. >> >> Before (no AVX512) >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ± 17.879 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ±
15.807 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ± 4.190 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ± 2.484 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ± 2.285 ops/s >> >> After (with AVX2) >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ± 39.923 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ± 34.838 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ± 12.179 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ± 8.992 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ± 6.238 ops/s >> >> >> Before (with AVX512): >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ± 27.260 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ± 26.707 o... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright stmt Thanks Tony! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23719#issuecomment-2761529451 From duke at openjdk.org Fri Mar 28 14:33:30 2025 From: duke at openjdk.org (duke) Date: Fri, 28 Mar 2025 14:33:30 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 19:13:59 GMT, Volodymyr Paprotski wrote: >> Add AVX2 montgomery multiplication intrinsic.
(About 60-80% gain) >> >> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain. >> >> Before (no AVX512) >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ± 17.879 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ± 15.807 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ± 4.190 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ± 2.484 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ± 2.285 ops/s >> >> After (with AVX2) >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ± 39.923 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ± 34.838 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ± 12.179 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ± 8.992 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ± 6.238 ops/s >> >> >> Before (with AVX512): >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ± 27.260 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ± 26.707 o...
> > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright stmt @vpaprotsk Your change (at version 0a4230aa41f7cb5ddf5e2978f89ad4f0ec231e88) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23719#issuecomment-2761532342 From gziemski at openjdk.org Fri Mar 28 14:33:40 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 28 Mar 2025 14:33:40 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v3] In-Reply-To: References: Message-ID: > This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with default value of mtNone, to force everyone to provide mem tag, if known. > > I tried to fill in tag, when I was pretty certain that I had the right type. > > At least one more follow-up will be needed after this, to change the remaining mtNone to valid values. Gerard Ziemski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into JDK-8344883 - work - work ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24282/files - new: https://git.openjdk.org/jdk/pull/24282/files/582b1860..5a1a75e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=01-02 Stats: 2531 lines in 75 files changed: 1529 ins; 871 del; 131 mod Patch: https://git.openjdk.org/jdk/pull/24282.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24282/head:pull/24282 PR: https://git.openjdk.org/jdk/pull/24282 From mullan at openjdk.org Fri Mar 28 14:42:25 2025 From: mullan at openjdk.org (Sean Mullan) Date: Fri, 28 Mar 2025 14:42:25 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 19:13:59 GMT, Volodymyr Paprotski wrote: >> Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain) >> >> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain. >> >> Before (no AVX512) >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ± 17.879 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ± 15.807 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ± 4.190 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ± 2.484 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ±
2.285 ops/s >> >> After (with AVX2) >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ? 39.923 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ? 34.838 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ? 12.179 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ? 8.992 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ? 6.238 ops/s >> >> >> Before (with AVX512): >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ? 27.260 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ? 26.707 o... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright stmt I think it would also be useful to write a release note describing the approximate performance improvement gains for the crypto algorithms as displayed in your chart. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23719#issuecomment-2761566550 From gziemski at openjdk.org Fri Mar 28 14:58:22 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 28 Mar 2025 14:58:22 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 08:25:08 GMT, Stefan Karlsson wrote: > I went over the patch and added suggestions for places where I think you're using the wrong tag, or where I think it is obvious that there's a better tag than mtNone. 
I've also suggested removal of the now redundant 'executable' argument, which I want to see as little of as possible given that it is a wart on the memory reservation APIs (IMHO). Thank you for the feedback! I was wondering whether I should get rid of the exec, since it has the default parameter value, so I am glad that you have suggested it. It will be a definite improvement to get rid of it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24282#issuecomment-2761605696 From vpaprotski at openjdk.org Fri Mar 28 15:23:34 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Fri, 28 Mar 2025 15:23:34 GMT Subject: Integrated: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 21:49:42 GMT, Volodymyr Paprotski wrote: > Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain) > > Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain.
>
> Before (no AVX512)
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ± 17.879 ops/s
> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ± 15.807 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ± 4.190 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ± 2.484 ops/s
> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ± 2.285 ops/s
>
> After (with AVX2)
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ± 39.923 ops/s
> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ± 34.838 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ± 12.179 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ± 8.992 ops/s
> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ± 6.238 ops/s
>
>
> Before (with AVX512):
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ± 27.260 ops/s
> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ± 26.707 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 102...
This pull request has now been integrated. Changeset: a269bef0 Author: Volodymyr Paprotski URL: https://git.openjdk.org/jdk/commit/a269bef04cf3c9c8b731edcbf7618624f7571a2d Stats: 760 lines in 9 files changed: 641 ins; 16 del; 103 mod 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 Reviewed-by: ascarpino, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/23719 From mgronlun at openjdk.org Fri Mar 28 15:44:26 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Fri, 28 Mar 2025 15:44:26 GMT Subject: RFR: 8352251: Implement Cooperative JFR Sampling Message-ID: <2FEvWJYZrD5yRsmTCqrgR9Lit84szuFJxqwdpjghVog=.7deabc88-039f-423e-a4bd-e36399870273@github.com> Greetings, This is the implementation of JEP [JDK-8350338 Cooperative JFR Sampling](https://bugs.openjdk.org/browse/JDK-8350338). Implementations in this change set are provided and have been tested on the following platforms: - windows-x64 - windows-x64-debug - linux-x64 - linux-x64-debug - macosx-x64 - macosx-x64-debug - linux-aarch64 - linux-aarch64-debug - macosx-aarch64 - macosx-aarch64-debug Testing: tier1-6, jdk_jfr, stress testing. 
Platform porters note: Some platform-specific code needs to be provided, mainly in the interpreter. Take a look at the following files for changes: - src/hotspot/cpu/x86/frame_x86.cpp - src/hotspot/cpu/x86/interp_masm_x86.cpp - src/hotspot/cpu/x86/interp_masm_x86.hpp - src/hotspot/cpu/x86/javaFrameAnchor_x86.hpp - src/hotspot/cpu/x86/macroAssembler_x86.cpp - src/hotspot/cpu/x86/macroAssembler_x86.hpp - src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp - src/hotspot/cpu/x86/templateTable_x86.cpp - src/hotspot/os_cpu/linux_x86/javaThread_linux_x86.hpp Thanks Markus ------------- Commit messages: - 8352251 Changes: https://git.openjdk.org/jdk/pull/24296/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24296&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352251 Stats: 3260 lines in 77 files changed: 1960 ins; 958 del; 342 mod Patch: https://git.openjdk.org/jdk/pull/24296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24296/head:pull/24296 PR: https://git.openjdk.org/jdk/pull/24296 From syan at openjdk.org Fri Mar 28 16:48:35 2025 From: syan at openjdk.org (SendaoYan) Date: Fri, 28 Mar 2025 16:48:35 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 Message-ID: Hi all, Fix memory leak after JDK-8352184. 
------------- Commit messages: - 8353189: [ASAN] memory leak after 8352184 Changes: https://git.openjdk.org/jdk/pull/24299/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24299&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353189 Stats: 15 lines in 3 files changed: 10 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24299/head:pull/24299 PR: https://git.openjdk.org/jdk/pull/24299 From aturbanov at openjdk.org Fri Mar 28 18:23:49 2025 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Fri, 28 Mar 2025 18:23:49 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 19:13:59 GMT, Volodymyr Paprotski wrote: >> Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain) >> >> Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain.
>>
>> Before (no AVX512)
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ± 17.879 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ± 15.807 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ± 4.190 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ± 2.484 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ± 2.285 ops/s
>>
>> After (with AVX2)
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ± 39.923 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ± 34.838 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ± 12.179 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ± 8.992 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ± 6.238 ops/s
>>
>>
>> Before (with AVX512):
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ± 27.260 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ± 26.707 o...
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright stmt
src/java.base/share/classes/sun/security/util/math/intpoly/MontgomeryIntegerPolynomialP256.java line 164:
> 162: protected void mult(long[] a, long[] b, long[] r) {
> 163: multImpl(a, b, r);
> 164: reducePositive(r);
`reducePositive` now seems unused ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r2019135976 From vpaprotski at openjdk.org Fri Mar 28 20:13:24 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Fri, 28 Mar 2025 20:13:24 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8] In-Reply-To: References: Message-ID: <9TJyGXccPFnDI60b2Wg3ZIuQH2nd6LC-pFgEs6p8x1c=.6308a314-dd48-4cb3-9986-8e6eb754d4c2@github.com> On Fri, 28 Mar 2025 18:20:31 GMT, Andrey Turbanov wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix copyright stmt
>
> src/java.base/share/classes/sun/security/util/math/intpoly/MontgomeryIntegerPolynomialP256.java line 164:
>
>> 162: protected void mult(long[] a, long[] b, long[] r) {
>> 163: multImpl(a, b, r);
>> 164: reducePositive(r);
>
> `reducePositive` now seems unused
oh.. hmm.. I had a second PR (which I decided wasn't worth it) that was going to reuse this code.. Will create a second JBS issue and remove it ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r2019286778 From shade at openjdk.org Fri Mar 28 20:15:21 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 28 Mar 2025 20:15:21 GMT Subject: RFR: 8352251: Implement Cooperative JFR Sampling In-Reply-To: <2FEvWJYZrD5yRsmTCqrgR9Lit84szuFJxqwdpjghVog=.7deabc88-039f-423e-a4bd-e36399870273@github.com> References: <2FEvWJYZrD5yRsmTCqrgR9Lit84szuFJxqwdpjghVog=.7deabc88-039f-423e-a4bd-e36399870273@github.com> Message-ID: On Fri, 28 Mar 2025 15:38:59 GMT, Markus Grönlund wrote: > Greetings, > > This is the implementation of JEP [JDK-8350338 Cooperative JFR Sampling](https://bugs.openjdk.org/browse/JDK-8350338). > > Implementations in this change set are provided and have been tested on the following platforms: > > - windows-x64 > - windows-x64-debug > - linux-x64 > - linux-x64-debug > - macosx-x64 > - macosx-x64-debug > - linux-aarch64 > - linux-aarch64-debug > - macosx-aarch64 > - macosx-aarch64-debug > > Testing: tier1-6, jdk_jfr, stress testing. > > Platform porters note: > Some platform-specific code needs to be provided, mainly in the interpreter. 
Take a look at the following files for changes: > > - src/hotspot/cpu/x86/frame_x86.cpp > - src/hotspot/cpu/x86/interp_masm_x86.cpp > - src/hotspot/cpu/x86/interp_masm_x86.hpp > - src/hotspot/cpu/x86/javaFrameAnchor_x86.hpp > - src/hotspot/cpu/x86/macroAssembler_x86.cpp > - src/hotspot/cpu/x86/macroAssembler_x86.hpp > - src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp > - src/hotspot/cpu/x86/templateTable_x86.cpp > - src/hotspot/os_cpu/linux_x86/javaThread_linux_x86.hpp > > Thanks > Markus
A drive-by comment, since I am cleaning up some of the x86 code after the x86_32 removals:
src/hotspot/cpu/x86/interp_masm_x86.cpp line 791:
> 789:
> 790: // Thread argument
> 791: mov(c_rarg0, r15_thread);
`movptr`, I'd think?
src/hotspot/cpu/x86/interp_masm_x86.cpp line 796:
> 794: Label L_ljf, L_valid_rbp;
> 795: testptr(rbp, rbp);
> 796: jcc(Assembler::notZero, L_valid_rbp);
Little trivia about x86: "normal" jumps are fairly long (6 bytes, IIRC), but you can have a "short jump" (about 3 bytes, IIRC) if the offset is small. This is normally handled by the assembler itself, when it knows where the branch target is. That is known for _backward_ branches. For _forward_ branches like this, we unfortunately often need to claim the jump is short ahead of time. See `jccb` (note `b`). Basically, if you are sure the branch target is within 128 bytes (or, more likely, a few instructions) away, use `jccb`. Same thing with `jmp` -> `jmpb`.
src/hotspot/cpu/x86/interp_masm_x86.cpp line 1045:
> 1043: Label slow_path;
> 1044: Label fast_path;
> 1045: safepoint_poll(slow_path, rthread, current_fp, true /* at_return */, false /* in_nmethod */);
A little heads-up: I am going to propose a little cleanup soon to drop `rthread` from x86 `safepoint_poll` (we can trust it is `r15_thread` always). That would probably yield a minor merge conflict here. 
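The short-versus-near jump trade-off described in the `jcc` comment above can be sketched as a small model. This is illustrative Java, not HotSpot's assembler: the class and method names are hypothetical, and offsets are byte positions in an imagined code buffer. The sizes used are the architectural encodings of `Jcc rel8` (2 bytes) and `Jcc rel32` (6 bytes). A `rel8` displacement is a signed byte counted from the end of the jump instruction, which is why a forward branch must be known to land close by before the short form is safe.

```java
// Hypothetical model of choosing between a short and a near conditional jump.
// Not HotSpot code; names and structure are assumptions for illustration.
final class JumpEncodingSketch {
    static final int SHORT_JCC_SIZE = 2; // opcode byte + rel8
    static final int NEAR_JCC_SIZE = 6;  // two opcode bytes + rel32

    // The displacement is measured from the end of the (short) jump instruction.
    static boolean fitsShort(int jumpOffset, int targetOffset) {
        int disp = targetOffset - (jumpOffset + SHORT_JCC_SIZE);
        return disp >= Byte.MIN_VALUE && disp <= Byte.MAX_VALUE;
    }

    // Size an assembler could pick if it already knew the target offset.
    static int jccSize(int jumpOffset, int targetOffset) {
        return fitsShort(jumpOffset, targetOffset) ? SHORT_JCC_SIZE : NEAR_JCC_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(jccSize(0, 129));  // displacement +127: short form
        System.out.println(jccSize(0, 130));  // displacement +128: near form
        System.out.println(jccSize(200, 74)); // displacement -128: short form
    }
}
```

For a backward branch the target offset is already known, so the choice can be made automatically. For a forward branch the target is not yet bound when the jump is emitted, which is the situation where the author has to promise the short form up front.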
------------- PR Review: https://git.openjdk.org/jdk/pull/24296#pullrequestreview-2726620088 PR Review Comment: https://git.openjdk.org/jdk/pull/24296#discussion_r2019283641 PR Review Comment: https://git.openjdk.org/jdk/pull/24296#discussion_r2019283404 PR Review Comment: https://git.openjdk.org/jdk/pull/24296#discussion_r2019285766 From vpaprotski at openjdk.org Fri Mar 28 20:23:20 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Fri, 28 Mar 2025 20:23:20 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 14:39:23 GMT, Sean Mullan wrote: > I think it would also be useful to write a release note describing the approximate performance improvement gains for the crypto algorithms as displayed in your chart. Thanks. @seanjmullan I think I have only done that once, and I can't find the 'instructions'.. I think Jamil had helped me, but.. (https://bugs.openjdk.org/browse/JDK-8297970) "Create subtask with 'release-note' label?" ------------- PR Comment: https://git.openjdk.org/jdk/pull/23719#issuecomment-2762362675 From mgronlun at openjdk.org Fri Mar 28 21:17:24 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Fri, 28 Mar 2025 21:17:24 GMT Subject: RFR: 8352251: Implement Cooperative JFR Sampling [v2] In-Reply-To: <2FEvWJYZrD5yRsmTCqrgR9Lit84szuFJxqwdpjghVog=.7deabc88-039f-423e-a4bd-e36399870273@github.com> References: <2FEvWJYZrD5yRsmTCqrgR9Lit84szuFJxqwdpjghVog=.7deabc88-039f-423e-a4bd-e36399870273@github.com> Message-ID: > Greetings, > > This is the implementation of JEP [JDK-8350338 Cooperative JFR Sampling](https://bugs.openjdk.org/browse/JDK-8350338). 
> > Implementations in this change set are provided and have been tested on the following platforms: > > - windows-x64 > - windows-x64-debug > - linux-x64 > - linux-x64-debug > - macosx-x64 > - macosx-x64-debug > - linux-aarch64 > - linux-aarch64-debug > - macosx-aarch64 > - macosx-aarch64-debug > > Testing: tier1-6, jdk_jfr, stress testing. > > Platform porters note: > Some platform-specific code needs to be provided, mainly in the interpreter. Take a look at the following files for changes: > > - src/hotspot/cpu/x86/frame_x86.cpp > - src/hotspot/cpu/x86/interp_masm_x86.cpp > - src/hotspot/cpu/x86/interp_masm_x86.hpp > - src/hotspot/cpu/x86/javaFrameAnchor_x86.hpp > - src/hotspot/cpu/x86/macroAssembler_x86.cpp > - src/hotspot/cpu/x86/macroAssembler_x86.hpp > - src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp > - src/hotspot/cpu/x86/templateTable_x86.cpp > - src/hotspot/os_cpu/linux_x86/javaThread_linux_x86.hpp > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: refactoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24296/files - new: https://git.openjdk.org/jdk/pull/24296/files/cb6e5aab..7998b7c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24296&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24296&range=00-01 Stats: 218 lines in 8 files changed: 48 ins; 106 del; 64 mod Patch: https://git.openjdk.org/jdk/pull/24296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24296/head:pull/24296 PR: https://git.openjdk.org/jdk/pull/24296 From mgronlun at openjdk.org Fri Mar 28 21:24:15 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Fri, 28 Mar 2025 21:24:15 GMT Subject: RFR: 8352251: Implement Cooperative JFR Sampling [v3] In-Reply-To: <2FEvWJYZrD5yRsmTCqrgR9Lit84szuFJxqwdpjghVog=.7deabc88-039f-423e-a4bd-e36399870273@github.com> References: 
<2FEvWJYZrD5yRsmTCqrgR9Lit84szuFJxqwdpjghVog=.7deabc88-039f-423e-a4bd-e36399870273@github.com> Message-ID: > Greetings, > > This is the implementation of JEP [JDK-8350338 Cooperative JFR Sampling](https://bugs.openjdk.org/browse/JDK-8350338). > > Implementations in this change set are provided and have been tested on the following platforms: > > - windows-x64 > - windows-x64-debug > - linux-x64 > - linux-x64-debug > - macosx-x64 > - macosx-x64-debug > - linux-aarch64 > - linux-aarch64-debug > - macosx-aarch64 > - macosx-aarch64-debug > > Testing: tier1-6, jdk_jfr, stress testing. > > Platform porters note: > Some platform-specific code needs to be provided, mainly in the interpreter. Take a look at the following files for changes: > > - src/hotspot/cpu/x86/frame_x86.cpp > - src/hotspot/cpu/x86/interp_masm_x86.cpp > - src/hotspot/cpu/x86/interp_masm_x86.hpp > - src/hotspot/cpu/x86/javaFrameAnchor_x86.hpp > - src/hotspot/cpu/x86/macroAssembler_x86.cpp > - src/hotspot/cpu/x86/macroAssembler_x86.hpp > - src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp > - src/hotspot/cpu/x86/templateTable_x86.cpp > - src/hotspot/os_cpu/linux_x86/javaThread_linux_x86.hpp > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision: - align params - adjustments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24296/files - new: https://git.openjdk.org/jdk/pull/24296/files/7998b7c1..1015a8c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24296&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24296&range=01-02 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24296/head:pull/24296 PR: https://git.openjdk.org/jdk/pull/24296 From dnsimon at openjdk.org Fri Mar 28 22:24:40 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 28 Mar 2025 22:24:40 GMT Subject: RFR: 
8352645: Add tool support to check order of includes [v6] In-Reply-To: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: > This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). > > By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. > > The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. > > I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. > > When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. 
For example: > > java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: > > java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci > > at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) > at java.base/java.lang.Thread.run(Thread.java:1447) > Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: > > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Optim... 
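The ordering rule described in the PR text above (compare lowercased strings so that `_` sorts before letters, and let a `SortedSet` drop duplicates) can be sketched as follows. This is a minimal illustration, not the code of `SortIncludes.java`: the class name, the tie-breaker on the original string, and the include paths are assumptions made for the example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of the sort order; not taken from the actual tool.
final class IncludeOrderSketch {
    // Compare on the lowercased text so '_' (0x5F) sorts before every letter.
    // The tie-breaker keeps strings that differ only in case distinct.
    static final Comparator<String> INCLUDE_ORDER =
        Comparator.comparing((String s) -> s.toLowerCase(Locale.ROOT))
                  .thenComparing(Comparator.naturalOrder());

    // A sorted set both orders the lines and drops exact duplicates.
    static List<String> sortedUnique(List<String> includes) {
        SortedSet<String> set = new TreeSet<>(INCLUDE_ORDER);
        set.addAll(includes);
        return List.copyOf(set);
    }

    public static void main(String[] args) {
        List<String> block = List.of(
            "#include \"demo/fooBar.hpp\"",
            "#include \"demo/foo_bar.hpp\"",
            "#include \"demo/fooBar.hpp\"",   // duplicate, dropped by the set
            "#include \"demo/foo.hpp\"");
        sortedUnique(block).forEach(System.out::println);
    }
}
```

With plain `String` ordering, `fooBar.hpp` would come before `foo_bar.hpp` because uppercase `B` has a smaller code point than `_`. Lowercasing first reverses that, so the example prints `foo.hpp`, `foo_bar.hpp`, `fooBar.hpp`.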
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: convert Windows path to Unix path ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24247/files - new: https://git.openjdk.org/jdk/pull/24247/files/c93e6646..921e3251 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24247&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24247&range=04-05 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24247/head:pull/24247 PR: https://git.openjdk.org/jdk/pull/24247 From vlivanov at openjdk.org Fri Mar 28 22:55:21 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 28 Mar 2025 22:55:21 GMT Subject: RFR: 8353216: Improve VerifyMethodHandles for method handle linkers Message-ID: Add extra verification logic into `MethodHandle::invokeBasic/linkTo*` to ensure that holder classes are properly initialized. The patch covers x86 and aarch64 platforms. There are some differences in expectations between invocation modes. While `invokeStatic` assumes a clinit barrier (and `invokeBasic` just requires the holder class to be fully initialized), other invocation modes can only expect that class initialization has been initiated (due to class initialization failures and premature publication, instances of partially initialized classes can be observed). Testing: hs-tier1 - hs-tier4 ------------- Commit messages: - No need to care about x86-32 anymore - A branch too far... 
- Specialize for linker type - Limit to concrete methods - Verify method holder Changes: https://git.openjdk.org/jdk/pull/23950/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23950&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353216 Stats: 109 lines in 4 files changed: 96 ins; 2 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/23950.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23950/head:pull/23950 PR: https://git.openjdk.org/jdk/pull/23950 From kvn at openjdk.org Fri Mar 28 23:15:24 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 28 Mar 2025 23:15:24 GMT Subject: RFR: 8353216: Improve VerifyMethodHandles for method handle linkers In-Reply-To: References: Message-ID: <6y88-EtMdLu4UoAA8abMrW_d96Eb8XriAjPW41d8ocs=.91fbe34c-e768-4303-8012-79348047563e@github.com> On Fri, 7 Mar 2025 20:58:15 GMT, Vladimir Ivanov wrote: > Add extra verification logic into `MethodHandle::invokeBasic/linkTo*` to ensure that holder classes are properly initialized. > > The patch covers x86 and aarch64 platforms. > > There are some differences in expectations between invocation modes. > While `invokeStatic` assumes a clinit barrier (and `invokeBasic` just requires the holder class to be fully initialized), other invocation modes can only expect that class initialization has been initiated (due to class initialization failures and premature publication, instances of partially initialized classes can be observed). > > Testing: hs-tier1 - hs-tier4 src/hotspot/cpu/x86/methodHandles_x86.cpp line 131: > 129: Label L_ok; > 130: > 131: const Register method_holder = temp; `assert_different_registers` missing. Or you don't need it? src/hotspot/cpu/x86/methodHandles_x86.cpp line 139: > 137: // Require compiled LambdaForm class to be fully initialized. > 138: __ cmpb(Address(method_holder, InstanceKlass::init_state_offset()), InstanceKlass::fully_initialized); > 139: __ jccb(Assembler::equal, L_ok); Should this also be long jump? 
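The per-linker expectations stated in the PR description can be modelled as a small predicate. This is a sketch under stated assumptions, not the assembly emitted by the patch: the enum and method names are hypothetical, the state names are modelled on HotSpot's class initialization states, and "assumes a clinit barrier" is read here as "fully initialized, or being initialized by the current thread".

```java
// Hypothetical model of the holder-class checks; names are assumptions.
final class HolderInitSketch {
    enum InitState {
        ALLOCATED, LOADED, LINKED, BEING_INITIALIZED, FULLY_INITIALIZED, INITIALIZATION_ERROR
    }

    enum Linker { INVOKE_BASIC, LINK_TO_STATIC, OTHER_LINK_TO }

    static boolean holderOk(Linker linker, InitState state, boolean initializingThreadIsCurrent) {
        switch (linker) {
            case INVOKE_BASIC:
                // Compiled LambdaForm holder must be fully initialized.
                return state == InitState.FULLY_INITIALIZED;
            case LINK_TO_STATIC:
                // Assumed clinit-barrier semantics: initialized, or in progress on this thread.
                return state == InitState.FULLY_INITIALIZED
                    || (state == InitState.BEING_INITIALIZED && initializingThreadIsCurrent);
            default:
                // Other modes: initialization has at least been initiated,
                // which includes the in-progress and failed states.
                return state.compareTo(InitState.BEING_INITIALIZED) >= 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(holderOk(Linker.INVOKE_BASIC, InitState.BEING_INITIALIZED, true));
        System.out.println(holderOk(Linker.LINK_TO_STATIC, InitState.BEING_INITIALIZED, true));
        System.out.println(holderOk(Linker.OTHER_LINK_TO, InitState.INITIALIZATION_ERROR, false));
    }
}
```

The weaker check for the remaining linkers reflects the description's point that instances of partially initialized classes can be observed after initialization failures or premature publication.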
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23950#discussion_r2019548630 PR Review Comment: https://git.openjdk.org/jdk/pull/23950#discussion_r2019549790 From vlivanov at openjdk.org Sat Mar 29 01:18:33 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 29 Mar 2025 01:18:33 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 Message-ID: Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. Testing: hs-tier1 - hs-tier4, microbenchmarks ------------- Commit messages: - Build sleef on macos-aarch64 - Move libsleef sources to shared/native Changes: https://git.openjdk.org/jdk/pull/24306/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24306&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353217 Stats: 18 lines in 174 files changed: 14 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24306.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24306/head:pull/24306 PR: https://git.openjdk.org/jdk/pull/24306 From vlivanov at openjdk.org Sat Mar 29 01:18:33 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 29 Mar 2025 01:18:33 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 In-Reply-To: References: Message-ID: <4da5jkxaUrMVeNqLG1rD9wjSt96gC_Kp6-hNSfxJheE=.8a34d351-235f-43c6-a151-5e2eed622053@github.com> On Sat, 29 Mar 2025 00:58:59 GMT, Vladimir Ivanov wrote: > Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. 
> > It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. > > PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. > > Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. > > Testing: hs-tier1 - hs-tier4, microbenchmarks
Microbenchmark results on Apple M1 Pro:

Benchmark             | Throughput (us/op)             | Allocation rate (MB/sec)          |
                      | Before          After          | Before              After         |
======================|================================|===================================|
Float128Vector.ACOS   |  3.856 ± 0.013   1.941 ± 0.008 | 6076.461 ±  20.067  0.007 ± 0.001 |
Float128Vector.ASIN   |  3.813 ± 0.014   1.512 ± 0.017 | 6145.040 ±  22.824  0.007 ± 0.001 |
Float128Vector.ATAN   |  7.124 ± 0.040   2.220 ± 0.003 | 3289.059 ±  18.539  0.007 ± 0.001 |
Float128Vector.ATAN2  | 16.983 ± 1.031   3.412 ± 0.038 | 2075.808 ± 127.179  0.007 ± 0.001 |
Float128Vector.CBRT   |  6.431 ± 0.014   4.075 ± 0.011 | 3643.789 ±   7.933  0.007 ± 0.001 |
Float128Vector.COS    |  8.269 ± 0.094   5.614 ± 0.026 | 2833.915 ±  32.041  0.007 ± 0.001 |
Float128Vector.COSH   |  5.779 ± 0.020   3.072 ± 0.010 | 4054.800 ±  14.028  0.007 ± 0.001 |
Float128Vector.EXP    |  5.456 ± 0.006   0.936 ± 0.004 | 4294.853 ±   5.025  0.007 ± 0.001 |
Float128Vector.EXPM1  |  6.888 ± 0.059   2.972 ± 0.010 | 3402.363 ±  28.694  0.007 ± 0.001 |
Float128Vector.HYPOT  |  6.369 ± 0.013   2.213 ± 0.008 | 5519.051 ±  11.103  0.007 ± 0.001 |
Float128Vector.LOG    |  8.469 ± 0.574   1.729 ± 0.004 | 2775.039 ± 157.629  0.007 ± 0.001 |
Float128Vector.LOG10  | 15.235 ± 1.039   1.830 ± 0.006 | 1544.009 ± 107.436  0.007 ± 0.001 |
Float128Vector.LOG1P  |  8.823 ± 0.040   1.745 ± 0.014 | 2655.757 ±  11.964  0.007 ± 0.001 |
Float128Vector.POW    | 27.511 ± 0.918   7.467 ± 0.033 | 1278.693 ±  42.538  0.007 ± 0.001 |
Float128Vector.SIN    |  7.846 ± 0.063   5.822 ± 0.015 | 2986.480 ±  24.025  0.007 ± 0.001 |
Float128Vector.SINH   |  5.747 ± 0.033   3.206 ± 0.034 | 4077.645 ±  23.305  0.007 ± 0.001 |
Float128Vector.TAN    | 22.337 ± 0.533   6.114 ± 0.016 | 1049.469 ±  24.969  0.007 ± 0.001 |
Double128Vector.ACOS  |  5.789 ± 0.107   4.635 ± 0.013 | 8097.069 ± 146.593  0.007 ± 0.001 |
Double128Vector.ASIN  |  5.655 ± 0.011   3.858 ± 0.017 | 8287.521 ±  16.023  0.007 ± 0.001 |
Double128Vector.ATAN  | 10.082 ± 0.046   6.016 ± 0.016 | 4648.068 ±  21.401  0.007 ± 0.001 |
Double128Vector.ATAN2 | 17.286 ± 0.113   8.148 ± 0.015 | 4067.019 ±  26.586  0.007 ± 0.001 |
Double128Vector.CBRT  |  9.779 ± 0.048   8.861 ± 0.045 | 4792.419 ±  23.381  0.007 ± 0.001 |
Double128Vector.COS   |  9.071 ± 0.107   6.948 ± 0.027 | 5166.999 ±  59.377  0.007 ± 0.001 |
Double128Vector.COSH  |  8.234 ± 0.030   6.403 ± 0.025 | 5692.144 ±  20.625  0.007 ± 0.001 |
Double128Vector.EXP   |  7.506 ± 0.012   3.073 ± 0.013 | 6243.783 ±  10.382  0.007 ± 0.001 |
Double128Vector.EXPM1 |  9.122 ± 0.036   6.122 ± 0.036 | 5137.721 ±  20.350  0.007 ± 0.001 |
Double128Vector.HYPOT | 13.445 ± 0.248   4.596 ± 0.035 | 5229.977 ±  96.222  0.007 ± 0.001 |
Double128Vector.LOG   | 10.396 ± 0.042   4.629 ± 0.081 | 4507.928 ±  18.101  0.007 ± 0.001 |
Double128Vector.LOG10 | 13.923 ± 0.046   4.889 ± 0.021 | 3365.944 ±  11.078  0.007 ± 0.001 |
Double128Vector.LOG1P | 12.336 ± 0.045   5.010 ± 0.027 | 3799.204 ±  13.816  0.007 ± 0.001 |
Double128Vector.POW   | 28.852 ± 0.043  15.270 ± 0.081 | 2436.503 ±   3.647  0.007 ± 0.001 |
Double128Vector.SIN   |  8.821 ± 0.018   6.309 ± 0.037 | 5313.077 ±  11.056  0.007 ± 0.001 |
Double128Vector.SINH  |  8.289 ± 0.037   6.566 ± 0.029 | 5654.264 ±  25.538  0.007 ± 0.001 |
Double128Vector.TAN   | 25.535 ± 0.636   9.788 ± 0.036 | 1836.177 ±  44.430  0.007 ± 0.001 |

------------- PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2762959907 From vlivanov at openjdk.org Sat Mar 29 01:22:11 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 29 Mar 2025 01:22:11 GMT Subject: RFR: 8353216: Improve VerifyMethodHandles for method handle linkers [v2] In-Reply-To: References: Message-ID: > Add extra verification logic into `MethodHandle::invokeBasic/linkTo*` to ensure that holder classes are properly initialized. > > The patch covers x86 and aarch64 platforms. > > There are some differences in expectations between invocation modes. > While `invokeStatic` assumes a clinit barrier (and `invokeBasic` just requires the holder class to be fully initialized), other invocation modes can only expect that class initialization has been initiated (due to class initialization failures and premature publication, instances of partially initialized classes can be observed). > > Testing: hs-tier1 - hs-tier4 Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: - assert_different_registers on x86 - jcc->jccb ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23950/files - new: https://git.openjdk.org/jdk/pull/23950/files/677451ca..ef7fa5cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23950&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23950&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23950.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23950/head:pull/23950 PR: https://git.openjdk.org/jdk/pull/23950 From vlivanov at openjdk.org Sat Mar 29 01:29:28 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 29 Mar 2025 01:29:28 GMT Subject: RFR: 8353216: Improve VerifyMethodHandles for method handle linkers [v2] In-Reply-To: <6y88-EtMdLu4UoAA8abMrW_d96Eb8XriAjPW41d8ocs=.91fbe34c-e768-4303-8012-79348047563e@github.com> 
References: <6y88-EtMdLu4UoAA8abMrW_d96Eb8XriAjPW41d8ocs=.91fbe34c-e768-4303-8012-79348047563e@github.com> Message-ID: On Fri, 28 Mar 2025 23:11:14 GMT, Vladimir Kozlov wrote: >> Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: >> >> - assert_different_registers on x86 >> - jcc->jccb > > src/hotspot/cpu/x86/methodHandles_x86.cpp line 131: > >> 129: Label L_ok; >> 130: >> 131: const Register method_holder = temp; > > `assert_different_registers` missing. Or you don't need it? Added. > src/hotspot/cpu/x86/methodHandles_x86.cpp line 139: > >> 137: // Require compiled LambdaForm class to be fully initialized. >> 138: __ cmpb(Address(method_holder, InstanceKlass::init_state_offset()), InstanceKlass::fully_initialized); >> 139: __ jccb(Assembler::equal, L_ok); > > Should this also be long jump? It jumps over the code generated by `MacroAssembler::stop`, so a short jump instruction should be fine here. I did the opposite: turned the jump for other linkers from `jcc` to `jccb`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23950#discussion_r2019647257 PR Review Comment: https://git.openjdk.org/jdk/pull/23950#discussion_r2019647780 From fjiang at openjdk.org Sat Mar 29 02:57:21 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Sat, 29 Mar 2025 02:57:21 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v8] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Thu, 27 Mar 2025 11:22:48 GMT, Robbin Ehn wrote: >> Hi please consider. 
>>
>> | RVWMO | Patched |
>> | ---------- | ---------- |
>> | fence iorw,iorw | fence iorw,ow |
>> | sw t4,120(t2) | sw t4,120(t2) |
>> | fence ow,ir | unnecessary_membar_volatile_rvwmo |
>> | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile |
>> | fence iorw,ow | fence iorw,ow |
>> | sw t5,124(t2) | sw t5,124(t2) |
>>
>> | TSO | Patched |
>> | ---------- | ---------- |
>> | lw a4,120(t2) | lw a6,120(t2) |
>> | sw a0,124(t2) | sw t6,124(t2) |
>> | fence iorw,iorw | unnecessary_membar_volatile_tso |
>> | sw t4,120(t2) | sw t4,120(t2) |
>> | fence ow,ir | unnecessary_membar_volatile_tso |
>> | sw t6,128(t2) | sw t5,128(t2) |
>> | sw t5,124(t2) // Non-volatile | sw a1,124(t2) // Non-volatile |
>> | fence iorw,iorw | unnecessary_membar_volatile_tso |
>> | ... | ... |
>> | sw a3,120(t2) | sw a0,120(t2) |
>> | fence ow,ir | fence ow,ir |
>> | lw a7,124(t2) | lw a5,124(t2) |
>>
>> The specific RVWMO sequence volatile store + store + volatile store is around 30% faster on VF2.
>>
>> The patch does:
>> - Separate ztso and rvwmo in the ad file by using UseZtso predicate.
>> - Match all that requires the same membar.
>> - Make fence/fencei protected as they shouldn't be used directly.
>> - Increased cost of membars to VOLATILE_REF_COST.
>> - Added a real_empty pipe.
>> - Change to pipe_slow on TSO (as x86).
>>
>> Note that C2-rv64 is now superior to gcc/clang regarding fencing:
>> https://godbolt.org/z/6E3YTP15j
>>
>> Testing jcstress, tier1 and manually reading the generated assembly.
>> Doing additional testing, but RFR it now as it may need some consideration.
>>
>> /Robbin
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into tso-merge > - Merge branch 'master' into tso-merge > - format comment > - Merge branch 'master' into tso-merge > - Review comments > - Merge branch 'master' into tso-merge > - Review comments > - Fixed ws > - Revert NC > - Fixed comment > - ... and 1 more: https://git.openjdk.org/jdk/compare/36c9029d...c2688a6a Looks good! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/24035#pullrequestreview-2727224899 From ccheung at openjdk.org Sat Mar 29 03:00:47 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Sat, 29 Mar 2025 03:00:47 GMT Subject: RFR: 8353129: CDS ArchiveRelocation tests fail after JDK-8325132 Message-ID: Two archive relocation tests failed when `-XX:ArchiveRelocationMode=0` is specified via the jtreg `-javaoption`. A fix is to add a `WhiteBox.getArchiveRelocationMode()` method so that the tests can check if the `ArchiveRelocationMode` is set to 0 before checking the expected output. Passed tiers 1 - 4 testing. 
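The check described in the message above could look roughly like the sketch below. It is only an illustration under stated assumptions: `RelocationModeSketch`, `relocationMode` and `shouldCheckRelocationOutput` are hypothetical names, and the mode is parsed from a list of VM arguments instead of being obtained from `WhiteBox.getArchiveRelocationMode()`, which needs a jtreg test environment. The default of 1 when the flag is absent is an assumption based on the discussion elsewhere in this thread.

```java
import java.util.List;

public class RelocationModeSketch {
    private static final String FLAG = "-XX:ArchiveRelocationMode=";

    // Hypothetical stand-in for WhiteBox.getArchiveRelocationMode().
    // Scans the given VM arguments; in this helper the last occurrence wins.
    // Assumes the mode is 1 when the flag is not present.
    static int relocationMode(List<String> vmArgs) {
        int mode = 1;
        for (String arg : vmArgs) {
            if (arg.startsWith(FLAG)) {
                mode = Integer.parseInt(arg.substring(FLAG.length()));
            }
        }
        return mode;
    }

    // A test would only look for relocation-related output
    // when relocation has not been switched off with mode 0.
    static boolean shouldCheckRelocationOutput(List<String> vmArgs) {
        return relocationMode(vmArgs) != 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldCheckRelocationOutput(List.of("-XX:ArchiveRelocationMode=0")));
        System.out.println(shouldCheckRelocationOutput(List.of("-Xmx64m")));
    }
}
```

With a guard of this kind, a test run started with `-XX:ArchiveRelocationMode=0` skips the relocation-specific expectations instead of failing on them.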
-------------

Commit messages:
 - whitespace error
 - 8353129: CDS ArchiveRelocation tests fail after JDK-8325132

Changes: https://git.openjdk.org/jdk/pull/24308/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24308&range=00
 Issue: https://bugs.openjdk.org/browse/JDK-8353129
 Stats: 27 lines in 4 files changed: 21 ins; 1 del; 5 mod
 Patch: https://git.openjdk.org/jdk/pull/24308.diff
 Fetch: git fetch https://git.openjdk.org/jdk.git pull/24308/head:pull/24308

PR: https://git.openjdk.org/jdk/pull/24308

From duke at openjdk.org Sat Mar 29 07:19:21 2025
From: duke at openjdk.org (Zihao Lin)
Date: Sat, 29 Mar 2025 07:19:21 GMT
Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v5]
In-Reply-To:
References:
Message-ID:

> This patch removes the slice parameter from LoadNode::make.
>
> Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805
>
> Hi team, I am new, I'd appreciate any guidance. Thanks a lot!

Zihao Lin has updated the pull request incrementally with two additional commits since the last revision:

 - Fix build
 - Fix test failed

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/24258/files
 - new: https://git.openjdk.org/jdk/pull/24258/files/f6b2fbec..a1924c35

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=03-04

 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
 Patch: https://git.openjdk.org/jdk/pull/24258.diff
 Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258

PR: https://git.openjdk.org/jdk/pull/24258

From rehn at openjdk.org Sat Mar 29 15:03:18 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Sat, 29 Mar 2025 15:03:18 GMT
Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v8]
In-Reply-To:
References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com>
Message-ID:

On Sat, 29 Mar 2025 02:54:53 GMT, Feilong Jiang wrote:

>
Looks good! Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24035#issuecomment-2763441815 From kvn at openjdk.org Sat Mar 29 17:46:10 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 29 Mar 2025 17:46:10 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 In-Reply-To: References: Message-ID: <9z8Yl3PRcBG47ggt1gTR_soWGo8lPUHt-A8yFOjgYjI=.9827b5a9-1ce3-4e24-b795-19814af3e30f@github.com> On Sat, 29 Mar 2025 00:58:59 GMT, Vladimir Ivanov wrote: > Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. > > It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. > > PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. > > Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. > > Testing: hs-tier1 - hs-tier4, microbenchmarks Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24306#pullrequestreview-2727546251 From jwaters at openjdk.org Sat Mar 29 17:46:10 2025 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 29 Mar 2025 17:46:10 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 00:58:59 GMT, Vladimir Ivanov wrote: > Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. > > It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. > > PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. > > Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. 
> > Testing: hs-tier1 - hs-tier4, microbenchmarks make/modules/jdk.incubator.vector/Lib.gmk line 83: > 81: SRC := libsleef/lib, \ > 82: EXTRA_SRC := libsleef/generated, \ > 83: DISABLED_WARNINGS_gcc := unused-function sign-compare tautological-compare ignored-qualifiers, \ DISABLED_WARNINGS_gcc is technically not needed, gcc is not a supported compiler on macOS, at least, not yet... If you feel that gcc support for macOS is a worthy addition to have to make the lives of future compiler porters, you can leave it in there. It's otherwise up to you whether you want to remove it or not (I personally like the idea of being able to compile for macOS with gcc in the future however) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24306#discussion_r2019888623 From jwaters at openjdk.org Sat Mar 29 17:49:06 2025 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 29 Mar 2025 17:49:06 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 00:58:59 GMT, Vladimir Ivanov wrote: > Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. > > It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. > > PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. > > Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. > > Testing: hs-tier1 - hs-tier4, microbenchmarks Is leaving the sources of sleef in share/native the right thing to do? That would implicitly mean to any developers that it's shared code for all currently supported operating systems: Windows, macOS, Linux and AIX, which may be rather confusing if it's only meant to be used on specific platforms. 
But I can also see why this was done, since duplication of sleef code for each platform would be pretty brutal, so there isn't an easy solution to this. I guess Windows/ARM64 could use it in the future, is that something that is intended? Perhaps that could be an excuse for leaving it under the share directory.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2763887362

From kbarrett at openjdk.org Sat Mar 29 21:53:33 2025
From: kbarrett at openjdk.org (Kim Barrett)
Date: Sat, 29 Mar 2025 21:53:33 GMT
Subject: RFR: 8352565: Add native method implementation of Reference.get()
Message-ID:

Please review this change which adds a native method providing the implementation of Reference::get.

Reference::get is an intrinsic candidate, so this native method implementation is only used when the intrinsic is not. Currently there is intrinsic support by the interpreter, C1, C2, and Graal, which are always used. With this change we can later remove all the per-platform interpreter intrinsic implementations, and might also remove the C1 intrinsic implementation.

Testing:
(1) mach5 tier1-6 normal (so using all the existing intrinsics).
(2) mach5 tier1-6 with interpreter and C1 Reference::get intrinsics disabled.
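Whichever implementation ends up serving a call (an intrinsic or the native method fallback), the observable behaviour of `Reference.get()` should stay the same. As a minimal sketch of that contract, using only the public `java.lang.ref` API (the class and method names below are made up for illustration and are not part of the change under review):

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

public class ReferenceGetSketch {
    // get() must return the referent while it is still strongly reachable.
    static boolean returnsReferentWhileReachable() {
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent);
        boolean same = ref.get() == referent;
        // Keep the referent strongly reachable at least up to this point.
        Reference.reachabilityFence(referent);
        return same;
    }

    // get() must return null once the reference has been cleared.
    static boolean returnsNullAfterClear() {
        WeakReference<Object> ref = new WeakReference<>(new Object());
        ref.clear();
        return ref.get() == null;
    }

    public static void main(String[] args) {
        System.out.println(returnsReferentWhileReachable());
        System.out.println(returnsNullAfterClear());
    }
}
```

A check like this does not tell which implementation was used; it only illustrates what both paths have to agree on.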
------------- Commit messages: - test native method - native Reference.get helper Changes: https://git.openjdk.org/jdk/pull/24315/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24315&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352565 Stats: 203 lines in 5 files changed: 199 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24315/head:pull/24315 PR: https://git.openjdk.org/jdk/pull/24315 From kbarrett at openjdk.org Sun Mar 30 01:17:13 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 30 Mar 2025 01:17:13 GMT Subject: RFR: 8353129: CDS ArchiveRelocation tests fail after JDK-8325132 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 02:47:53 GMT, Calvin Cheung wrote: > Two archive relocation tests failed when `-XX:ArchiveRelocationMode=0` is specified via the jtreg `-javaoption`. > A fix is to add a `WhiteBox.getArchiveRelocationMode()` method so that the tests can check if the `ArchiveRelocationMode` is set to 0 before checking the expected output. > > Passed tiers 1 - 4 testing. src/hotspot/share/prims/whitebox.cpp line 2139: > 2137: #else > 2138: ShouldNotReachHere(); > 2139: return (jint)-1; Unnecessary dead code. Or maybe this shouldn't be using `ShouldNotReachHere()`. Is the intent to crash in a debug build but return an error code in a product build? If so, `ShouldNotReachHere()` doesn't provide that behavior, as it affects product builds too. test/hotspot/jtreg/runtime/cds/appcds/dynamicArchive/DynamicArchiveRelocationTest.java line 48: > 46: static int relocationMode = -1; > 47: public static void main(String... args) throws Exception { > 48: WhiteBox wb = WhiteBox.getWhiteBox(); It seems this test already had WhiteBox enabled, but wasn't actually using it before this change. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24308#discussion_r2020040853 PR Review Comment: https://git.openjdk.org/jdk/pull/24308#discussion_r2020041231 From stuefe at openjdk.org Sun Mar 30 09:11:20 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 30 Mar 2025 09:11:20 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v3] In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: <2dWME1xR1HwYIEJM4L1EwGqaezCKYc6DQeinhsK5lmQ=.dd439a51-c4d9-466c-ae66-a1b9a9574cd5@github.com> On Thu, 27 Mar 2025 13:42:14 GMT, Matthias Baesken wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use >> - skip test if we have no COH archive >> - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use >> - aix fix >> - test and aix exclusion >> - Fix windows when ArchiveRelocationMode=0 or 2 >> - original > > src/hotspot/share/cds/archiveBuilder.cpp line 329: > >> 327: if (CDSConfig::is_dumping_static_archive()) { >> 328: _current_dump_region = &_pz_region; >> 329: _current_dump_region->init(&_shared_rs, &_shared_vs); > > Second line in 'if' and 'else' seems to be identical ? fixed > src/hotspot/share/cds/archiveUtils.cpp line 85: > >> 83: >> 84: // The number of bits used by the rw/ro ptrmaps. We might have lots of zero >> 85: // bits at the bottom and top of rrw/ro ptrmaps, but these zeros will be > > What means rrw here ? 
typo; fixed

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23912#discussion_r2020104442
PR Review Comment: https://git.openjdk.org/jdk/pull/23912#discussion_r2020104299

From stuefe at openjdk.org Sun Mar 30 09:16:16 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sun, 30 Mar 2025 09:16:16 GMT
Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v4]
In-Reply-To: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com>
References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com>
Message-ID:

> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174.
>
> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B).
> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A).
>
> The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 .
>
> The AIX error, in comparison, is easy.
On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. > > Tests: > - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths > - SAP reports all tests green (they had reported errors with the previous version) > - Oracle Tests ongoing > - GHAs green Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - Merge branch 'master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use - feedback Matthias - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use - skip test if we have no COH archive - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use - aix fix - test and aix exclusion - Fix windows when ArchiveRelocationMode=0 or 2 - original ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23912/files - new: https://git.openjdk.org/jdk/pull/23912/files/f7dd4f5d..f7ef5586 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23912&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23912&range=02-03 Stats: 72001 lines in 2014 files changed: 22243 ins; 40780 del; 8978 mod Patch: https://git.openjdk.org/jdk/pull/23912.diff Fetch: git fetch https://git.openjdk.org/jdk.git 
pull/23912/head:pull/23912 PR: https://git.openjdk.org/jdk/pull/23912 From stuefe at openjdk.org Sun Mar 30 09:16:16 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 30 Mar 2025 09:16:16 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: On Fri, 21 Mar 2025 12:25:36 GMT, Matthias Baesken wrote: >>> I haven't seen test errors with this new version. @JoKern65, @MBaesken: Are you aware of any problems? >> >> No, I'm not aware of any problems. > >> I haven't seen test errors with this new version. @JoKern65, @MBaesken: Are you aware of any problems? > > I am not aware of issues related to this change. Thank you @MBaesken ! Worked in your feedback. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23912#issuecomment-2764464921 From mbaesken at openjdk.org Sun Mar 30 15:47:11 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Sun, 30 Mar 2025 15:47:11 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v4] In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: <83Zl6SdZ356HyKnBd6JXz3ZHKZbpHqaJ_eA6bAb4y8M=.08b0d7c0-de39-4ff6-b920-c4f1206bc1e4@github.com> On Sun, 30 Mar 2025 09:16:16 GMT, Thomas Stuefe wrote: >> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. >> >> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. 
The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). >> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). >> >> The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . >> >> The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. >> >> Tests: >> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths >> - SAP reports all tests green (they had reported errors with the previous version) >> - Oracle Tests ongoing >> - GHAs green > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: > > - Merge branch 'master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use > - feedback Matthias > - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use > - skip test if we have no COH archive > - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use > - aix fix > - test and aix exclusion > - Fix windows when ArchiveRelocationMode=0 or 2 > - original Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23912#pullrequestreview-2727935625 From stuefe at openjdk.org Sun Mar 30 16:45:39 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 30 Mar 2025 16:45:39 GMT Subject: Integrated: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use In-Reply-To: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: On Wed, 5 Mar 2025 06:34:14 GMT, Thomas Stuefe wrote: > Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174. > > JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). 
> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A). > > The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 . > > The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint. > > Tests: > - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths > - SAP reports all tests green (they had reported errors with the previous version) > - Oracle Tests ongoing > - GHAs green This pull request has now been integrated. 
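The AIX fallback described in the quoted text fills the page with the marker byte 'P' (0x50), which is why a word loaded from it shows up as 0x5050505050505050. A small sketch of that arithmetic follows, assuming a plain heap buffer as a stand-in for the page; `MarkerPageSketch` and `firstWordOfMarkedRegion` are hypothetical names, not HotSpot code:

```java
import java.nio.ByteBuffer;

public class MarkerPageSketch {
    // The marker byte mentioned in the thread: 'P' is 0x50 in ASCII.
    static final byte MARKER = (byte) 'P';

    // Fills a buffer of the given size (at least 8 bytes) with the marker
    // and returns the first 8 bytes interpreted as one 64-bit word.
    static long firstWordOfMarkedRegion(int sizeInBytes) {
        ByteBuffer region = ByteBuffer.allocate(sizeInBytes);
        while (region.hasRemaining()) {
            region.put(MARKER);
        }
        // All bytes are equal, so byte order does not affect the result.
        return region.getLong(0);
    }

    public static void main(String[] args) {
        System.out.println("0x" + Long.toHexString(firstWordOfMarkedRegion(4096)));
    }
}
```

Any 64-bit value read from such a region has every byte equal to 0x50, which makes the pattern easy to recognise in a register dump.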
Changeset: 59629f88 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/59629f88e6fad9c1ff91be4cfea83f78f0ea503c Stats: 462 lines in 16 files changed: 368 ins; 29 del; 65 mod 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use Reviewed-by: mbaesken, iklam ------------- PR: https://git.openjdk.org/jdk/pull/23912 From stuefe at openjdk.org Sun Mar 30 16:45:39 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 30 Mar 2025 16:45:39 GMT Subject: RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v3] In-Reply-To: References: <5iGpdnJ_UmV2ZO-JJJfs_EEyrTQxnn3EhxWw1cMcNTg=.2991bb2d-6d16-4252-8660-ff259d05ea7c@github.com> Message-ID: On Fri, 14 Mar 2025 06:43:54 GMT, Ioi Lam wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use >> - skip test if we have no COH archive >> - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use >> - aix fix >> - test and aix exclusion >> - Fix windows when ArchiveRelocationMode=0 or 2 >> - original > > LGTM Thanks @iklam and @MBaesken ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23912#issuecomment-2764641150 From dholmes at openjdk.org Mon Mar 31 00:46:09 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 31 Mar 2025 00:46:09 GMT Subject: RFR: 8353129: CDS ArchiveRelocation tests fail after JDK-8325132 In-Reply-To: References: Message-ID: On Sun, 30 Mar 2025 01:05:48 GMT, Kim Barrett wrote: >> Two archive relocation tests failed when `-XX:ArchiveRelocationMode=0` is specified via the jtreg `-javaoption`. 
>> A fix is to add a `WhiteBox.getArchiveRelocationMode()` method so that the tests can check if the `ArchiveRelocationMode` is set to 0 before checking the expected output. >> >> Passed tiers 1 - 4 testing. > > src/hotspot/share/prims/whitebox.cpp line 2139: > >> 2137: #else >> 2138: ShouldNotReachHere(); >> 2139: return (jint)-1; > > Unnecessary dead code. Or maybe this shouldn't be using `ShouldNotReachHere()`. Is the intent > to crash in a debug build but return an error code in a product build? If so, `ShouldNotReachHere()` > doesn't provide that behavior, as it affects product builds too. I think the intent is to crash if someone runs a test that uses CDS and this API in a build without CDS available. You literally should not reach here as the test should have been skipped at a higher level - that is true for debug and product test runs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24308#discussion_r2020303022 From dholmes at openjdk.org Mon Mar 31 01:00:21 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 31 Mar 2025 01:00:21 GMT Subject: RFR: 8353129: CDS ArchiveRelocation tests fail after JDK-8325132 In-Reply-To: References: Message-ID: <7R24pvsFCXdrj84C24wvdfS1BGrQlvS3jys8r9kD744=.491edc21-645c-4cb7-846a-f5e1e93ca7f5@github.com> On Sat, 29 Mar 2025 02:47:53 GMT, Calvin Cheung wrote: > Two archive relocation tests failed when `-XX:ArchiveRelocationMode=0` is specified via the jtreg `-javaoption`. > A fix is to add a `WhiteBox.getArchiveRelocationMode()` method so that the tests can check if the `ArchiveRelocationMode` is set to 0 before checking the expected output. > > Passed tiers 1 - 4 testing. I think these tests are very confusing! test/hotspot/jtreg/runtime/cds/appcds/ArchiveRelocationTest.java States > @comment the test uses -XX:ArchiveRelocationMode=1 to force relocation. but that is not what it does. 
It either sets -`XX:ArchiveRelocationMode=0` in the exec'd VM or it relies on the default being 1 - which is not the case if it was set directly via JTREG. So it seems to me the right, and simple, fix here is to always pass the expected `-XX:ArchiveRelocationMode` value to the exec'd VM and ignore/override whatever comes in via the command-line. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24308#issuecomment-2764860687 From pminborg at openjdk.org Mon Mar 31 06:35:06 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 06:35:06 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v11] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Remove empty instances ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/8b4113f6..f90557b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=09-10 Stats: 79 lines in 4 files changed: 3 ins; 73 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From pminborg at openjdk.org Mon Mar 31 06:35:07 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 06:35:07 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v8] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 23:28:46 GMT, Johannes Graham wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Revamp toString() methods > > src/java.base/share/classes/jdk/internal/lang/stable/StableValueFactories.java line 41: > >> 39: public static Function function(Set inputs, >> 40: Function original) { >> 
41: if (inputs.isEmpty()) { > > If it is worth optimizing the isEmpty scenario, it might be preferable to let each xxxFunction.of return an appropriate instance, to keep the number of varying subclasses to a minimum. I've removed the specialized empty class. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2020469763 From pminborg at openjdk.org Mon Mar 31 08:02:05 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 08:02:05 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v12] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 257 commits: - Remove link - Merge branch 'master' into implement-jep502 - Improve exception checking - Remove empty instances - Remove snippet for orElseSet - Add partial equality test - Update src/java.base/share/classes/java/lang/StableValue.java Co-authored-by: Paul Sandoz - Revamp toString() methods - Fix comments on doc issues - Create separate reentry prevention method and add tests - ... 
and 247 more: https://git.openjdk.org/jdk/compare/59629f88...7fe2c363 ------------- Changes: https://git.openjdk.org/jdk/pull/23972/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=11 Stats: 4000 lines in 30 files changed: 3969 ins; 18 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From pminborg at openjdk.org Mon Mar 31 08:05:25 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 08:05:25 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 00:44:39 GMT, Chen Liang wrote: >> Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: >> >> - Merge branch 'master' into implement-jep502 >> - Clean up exception messages and fix comments >> - Rename field >> - Rename method and fix comment >> - Rework reenterant logic >> - Use acquire semantics for reading rather than volatile semantics >> - Add missing null check >> - Simplify handling of sentinel, wrap, and unwrap >> - Fix JavaDoc issues >> - Fix members in StableEnumFunction >> - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f > > src/java.base/share/classes/jdk/internal/lang/stable/StableValueFactories.java line 71: > >> 69: } >> 70: >> 71: public static Map> map(Set keys) { > > I recommend choosing a different name from `map(Set, Function)` for navigation simplicity. Can you expand on the comment? Why would another name simplify navigation? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2020570880 From pminborg at openjdk.org Mon Mar 31 08:30:24 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 08:30:24 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 01:58:55 GMT, Luca Kellermann wrote: > > I see that, probably due to prior `java.util` contracts, a stable list or map cannot present a `toString` with unset component values. A stable list or map uses a "canned" `toString` method that calls `get`, which must force all component values to be evaluated before the `toString` can be printed. > > I also noticed this issue of `toString` eagerly setting all elements of stable collections and agree that it probably shouldn't do this. Note that all views of these collections (obtained via `List.subList`, `List.reversed`, `Map.entrySet`, `Map.values`, etc.) would also need their own `toString` implementation. > > > Just as `WeakHashMap` bends the `Map` API (regarding `equals`), I think `StableValue` composites should bend the `List` and `Map` APIs, regarding `toString`. Sometimes the contracts have to be bent for the whole design to fit together. > > Neither `List`, `Set`, nor `Map` mention any requirements for `toString` in their interface specification. Only `AbstractCollection` and `AbstractMap` have a default implementation of `toString`. So I don't think any contract would be bent here. I think requiring the `toString()` method of `StableList::subList` and `StableList::reversed` is a bit too much, at least in the first incarnation. However, I have noted this as a candidate in future implementations. 
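To make the `toString` point above concrete, here is a minimal sketch of a lazily populated list whose own `toString` does not force evaluation, while a view obtained via `subList` still does. `LazyList`, the `UNSET` sentinel, and the `.unset` placeholder text are hypothetical names chosen for illustration; this is not the `StableList` implementation from the PR, and the sketch is not thread-safe.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.function.IntFunction;

public class LazyListToStringSketch {

    /** Hypothetical stand-in for a stable list: each element is computed on first get(). */
    public static final class LazyList<E> extends AbstractList<E> {
        // Sentinel marking slots that have not been computed yet (null is a legal value).
        private static final Object UNSET = new Object();
        private final Object[] elements;
        private final IntFunction<? extends E> mapper;

        public LazyList(int size, IntFunction<? extends E> mapper) {
            this.elements = new Object[size];
            Arrays.fill(elements, UNSET);
            this.mapper = mapper;
        }

        @Override
        public int size() {
            return elements.length;
        }

        @Override
        @SuppressWarnings("unchecked")
        public E get(int index) {
            Object e = elements[index];
            if (e == UNSET) {
                // First access: compute and cache (single-threaded sketch only).
                e = mapper.apply(index);
                elements[index] = e;
            }
            return (E) e;
        }

        /** Reads the backing array directly, so printing never triggers the mapper. */
        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder("[");
            for (int i = 0; i < elements.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                Object e = elements[i];
                sb.append(e == UNSET ? ".unset" : String.valueOf(e));
            }
            return sb.append(']').toString();
        }
    }

    public static void main(String[] args) {
        LazyList<Integer> list = new LazyList<>(3, i -> i * 10);
        System.out.println(list);               // [.unset, .unset, .unset]
        list.get(1);
        System.out.println(list);               // [.unset, 10, .unset]
        // The view inherits AbstractCollection.toString, which iterates via get()
        // and therefore computes element 0 as a side effect.
        System.out.println(list.subList(0, 2)); // [0, 10]
    }
}
```

The last line shows why views would each need their own `toString` override to stay lazy: the inherited implementation iterates through `get`.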
------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2765490546 From shade at openjdk.org Mon Mar 31 09:01:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 31 Mar 2025 09:01:07 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 00:58:59 GMT, Vladimir Ivanov wrote: > Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. > > It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. > > PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. > > Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. > > Testing: hs-tier1 - hs-tier4, microbenchmarks make/modules/jdk.incubator.vector/Lib.gmk line 85: > 83: DISABLED_WARNINGS_gcc := unused-function sign-compare tautological-compare ignored-qualifiers, \ > 84: DISABLED_WARNINGS_clang := unused-function sign-compare tautological-compare ignored-qualifiers, \ > 85: CFLAGS := $(NEON_CFLAGS), \ Is this supposed to match configs for linux-aarch64? I see we add `NEON_CFLAGS` here, and do _not_ add `vector_math_sve.c_CFLAGS` here. I would have thought those two are applicable to macos-aarch64 as well? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24306#discussion_r2020647457 From pminborg at openjdk.org Mon Mar 31 09:09:19 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 09:09:19 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v13] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. 
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Improve StableMapEntrySet::toString ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/7fe2c363..2d5bc500 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=11-12 Stats: 65 lines in 3 files changed: 65 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From pminborg at openjdk.org Mon Mar 31 09:27:28 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 09:27:28 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: <-UbQwk0aw0-qVSP6_YAXpXfORucLfNgWu_bOwfB7hI8=.6e38e9b8-ac80-4bfd-bf54-67beaf4fc9b1@github.com> On Sun, 16 Mar 2025 19:06:36 GMT, Luca Kellermann wrote: >> Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: >> >> - Merge branch 'master' into implement-jep502 >> - Clean up exception messages and fix comments >> - Rename field >> - Rename method and fix comment >> - Rework reenterant logic >> - Use acquire semantics for reading rather than volatile semantics >> - Add missing null check >> - Simplify handling of sentinel, wrap, and unwrap >> - Fix JavaDoc issues >> - Fix members in StableEnumFunction >> - ... 
and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f > > src/java.base/share/classes/java/util/ImmutableCollections.java line 1488: > >> 1486: final K k = (K) key; >> 1487: return stable.orElseSet(new Supplier() { >> 1488: @Override public V get() { return mapper.apply(k); }}); > > This can return `null` (`StableMap` does allow `null` values), so the `getOrDefault` implementation in `AbstractImmutableMap` does not properly work for `StableMap`: > > var map = StableValue.map(Set.of(1), _ -> null); > // should print "null", but prints "default value" > System.out.println(map.getOrDefault(1, "default value")); Thanks for identifying this issue. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2020689451 From pminborg at openjdk.org Mon Mar 31 09:31:45 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 09:31:45 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v14] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. 
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Fix issue with StableMap and null values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/2d5bc500..5bdb5584 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=12-13 Stats: 20 lines in 2 files changed: 19 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From pminborg at openjdk.org Mon Mar 31 09:45:57 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 09:45:57 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v15] In-Reply-To: References: Message-ID: <6x3GigD57a6K4np6wzGcsvjGY1cQy88ip1ZBONGqNj0=.7d76e132-4a3c-4fe5-b883-dbccccdfc3c5@github.com> > Implement JEP 502. > > The PR passes tier1-tier3 tests. 
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Add test and comments about null keys ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/5bdb5584..8c0ea1ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=13-14 Stats: 10 lines in 2 files changed: 8 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From pminborg at openjdk.org Mon Mar 31 09:46:00 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 09:46:00 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Sun, 16 Mar 2025 18:43:26 GMT, Luca Kellermann wrote: >> Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 246 commits: >> >> - Merge branch 'master' into implement-jep502 >> - Clean up exception messages and fix comments >> - Rename field >> - Rename method and fix comment >> - Rework reenterant logic >> - Use acquire semantics for reading rather than volatile semantics >> - Add missing null check >> - Simplify handling of sentinel, wrap, and unwrap >> - Fix JavaDoc issues >> - Fix members in StableEnumFunction >> - ... and 236 more: https://git.openjdk.org/jdk/compare/4e51a8c9...d6e1573f > > src/java.base/share/classes/jdk/internal/lang/stable/StableValueFactories.java line 77: > >> 75: int i = 0; >> 76: for (K key : keys) { >> 77: entries[i++] = Map.entry(key, StableValueImpl.newInstance()); > > `Map.entry` causes `null` keys to throw a `NullPointerException`, meaning there can't be stable functions/maps with a `null` input/key. They can however have `null` values. Is that intended? Yes. 
The keys need to be non-null. I have added info about this in the docs now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2020715235 From shade at openjdk.org Mon Mar 31 09:49:08 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 31 Mar 2025 09:49:08 GMT Subject: RFR: 8351157: Clean up x86 GC barriers after 32-bit x86 removal [v2] In-Reply-To: <6aXRsWRRGrrJdkmNcZHPw8JBD5piGr6UrmjOdnHjlMY=.3dde2c28-bdfc-4eb1-8d1d-7a4c85d3234f@github.com> References: <6aXRsWRRGrrJdkmNcZHPw8JBD5piGr6UrmjOdnHjlMY=.3dde2c28-bdfc-4eb1-8d1d-7a4c85d3234f@github.com> Message-ID: On Thu, 27 Mar 2025 12:31:21 GMT, Aleksey Shipilev wrote: >> Assembler GC barriers have quite a bit of coding to support 32-bit x86. As 32-bit x86 is removed, we can clean up those parts. >> >> We can eliminate `!LP64` blocks quite easily. We can also prune passing around `thread` argument, and just trust that `r15_thread` is always available. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge branch 'master' into JDK-8351157-x86-gc-barriers > - Also do tlab_allocate > - Rely on R15 to be a thread register > - Work @tschatzl, @earthling-amzn -- want to look at G1 and Shenandoah parts, respectively? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24253#issuecomment-2765712586 From pminborg at openjdk.org Mon Mar 31 09:54:01 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 09:54:01 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v16] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. 
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Fix issue with wrapped exception ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/8c0ea1ab..94b835f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=14-15 Stats: 13 lines in 2 files changed: 7 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From bkilambi at openjdk.org Mon Mar 31 09:54:14 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 31 Mar 2025 09:54:14 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations [v2] In-Reply-To: <8QDbenZGakijqUrwAcaVogoJBEiNpzYhN3sDrrteSDk=.d8539631-ab03-45ff-a762-0b6e14c63f89@github.com> References: <8QDbenZGakijqUrwAcaVogoJBEiNpzYhN3sDrrteSDk=.d8539631-ab03-45ff-a762-0b6e14c63f89@github.com> Message-ID: <5_o8l6NUDH-laA-OZT9wvJ5-AR9vs2tUwXf0jVzB9T4=.0ec06331-95ca-45a2-bd1f-14cea2150b81@github.com> On Tue, 25 Feb 2025 19:45:31 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend for scalar FP16 operations namely - add, subtract, multiply, divide, fma, sqrt, min and max. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Hello, I would not be able to respond to comments until the next couple months or so due to some urgent tasks at work. Until then, I'd move this PR to draft status so that it would not be closed due to lack of activity. Thank you for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23748#issuecomment-2765729618 From pminborg at openjdk.org Mon Mar 31 09:57:34 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 09:57:34 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v17] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Fix issue in StableIntFunction related to wrapped exceptions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/94b835f2..fe021b5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=15-16 Stats: 6 lines in 1 file changed: 3 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From ihse at openjdk.org Mon Mar 31 10:05:19 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 31 Mar 2025 10:05:19 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v6] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Fri, 28 Mar 2025 22:24:40 GMT, Doug Simon wrote: >> This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> >> By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. 
`"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. >> >> The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. >> >> I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. >> >> When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. 
For example: >> >> java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: >> >> java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci >> >> at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) >> at java.base/java.lang.Thread.run(Thread.java:1447) >> Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: >> >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp >> /Users/dnsimo... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > convert Windows path to Unix path Hm... I know the source code is bundled with the test image, but I'm not 100% sure if it just includes `src`, or if the entire top-level source is included. I'll need to check that, including what is the best way to get a proper reference to the top-level directory from a test. 
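As an aside on the sort order described in the quoted PR text (lowercased comparison so that `_` sorts before letters, with duplicates dropped by a `SortedSet`), the following is a rough sketch of that idea only. It is not the code of `SortIncludes.java`; the class name, method name, and the include paths used as sample data are assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

public class IncludeOrderSketch {

    /**
     * Sorts include paths by their lowercased form and drops exact duplicates.
     * In ASCII, '_' (95) lies between 'Z' (90) and 'a' (97), so comparing
     * lowercased strings places '_' before every letter.
     */
    public static List<String> sortIncludes(List<String> includes) {
        Comparator<String> byLowercase =
            Comparator.comparing((String s) -> s.toLowerCase(Locale.ROOT))
                      // Tie-break so that only exact duplicates collapse in the set.
                      .thenComparing(Comparator.naturalOrder());
        SortedSet<String> sorted = new TreeSet<>(byLowercase);
        sorted.addAll(includes);
        return List.copyOf(sorted);
    }

    public static void main(String[] args) {
        // Sample paths for illustration only.
        List<String> includes = List.of(
            "gc/g1/g1Allocator.hpp",
            "gc/g1/g1_globals.hpp",
            "compiler/compilerDirectives.hpp",
            "compiler/compilerDirectives.hpp");
        // Prints compiler/compilerDirectives.hpp once, then g1_globals.hpp
        // before g1Allocator.hpp.
        sortIncludes(includes).forEach(System.out::println);
    }
}
```

With a plain case-sensitive comparison the two `g1` entries would come out in the opposite order, because `A` (65) sorts before `_` (95).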
------------- PR Comment: https://git.openjdk.org/jdk/pull/24247#issuecomment-2765754142 From shade at openjdk.org Mon Mar 31 10:33:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 31 Mar 2025 10:33:30 GMT Subject: Integrated: 8352415: x86: Tighten up template interpreter method entry code In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 13:44:40 GMT, Aleksey Shipilev wrote: > Interpreter performance is still important for faster startup, since it would carry the application until compilers kick in. After looking at Leyden scenarios in Xint mode, I believe incremental improvements are possible in template interpreter to make it faster. > > One of those improvements is tightening up method entry code. Profiling shows the hottest path in the whole ordeal for non-native methods is resolving the Java mirror to store the GC root for currently executing Method*. It involves 4-5 chained memory accesses, which incur significant latency. > > We can massage the code to reuse some memory accesses and also spread them out to allow more latency-hiding hardware mechanisms to kick in. > > Additional testing: > - [x] Ad-hoc `-Xint` benchmarks > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. 
Changeset: 22f630cb Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/22f630cb20b4e846f63cf5799cd2c50437d4dcad Stats: 27 lines in 1 file changed: 11 ins; 1 del; 15 mod 8352415: x86: Tighten up template interpreter method entry code Reviewed-by: adinn, jsjolen ------------- PR: https://git.openjdk.org/jdk/pull/24114 From shade at openjdk.org Mon Mar 31 10:33:29 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 31 Mar 2025 10:33:29 GMT Subject: RFR: 8352415: x86: Tighten up template interpreter method entry code In-Reply-To: References: Message-ID: <4Y6_aAO-fLaBKzKBDoPZmbJDTkkm9qpehlNSeB_85sg=.f6491ec9-6ede-4331-bc36-92bba054357a@github.com> On Wed, 19 Mar 2025 13:44:40 GMT, Aleksey Shipilev wrote: > Interpreter performance is still important for faster startup, since it would carry the application until compilers kick in. After looking at Leyden scenarios in Xint mode, I believe incremental improvements are possible in template interpreter to make it faster. > > One of those improvements is tightening up method entry code. Profiling shows the hottest path in the whole ordeal for non-native methods is resolving the Java mirror to store the GC root for currently executing Method*. It involves 4-5 chained memory accesses, which incur significant latency. > > We can massage the code to reuse some memory accesses and also spread them out to allow more latency-hiding hardware mechanisms to kick in. > > Additional testing: > - [x] Ad-hoc `-Xint` benchmarks > - [x] Linux x86_64 server fastdebug, `all` There we go! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24114#issuecomment-2765812243 From rehn at openjdk.org Mon Mar 31 10:45:54 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 31 Mar 2025 10:45:54 GMT Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user [v3] In-Reply-To: References: Message-ID: <5sujqD7L_cmLUyDwYb4PhgOlEeiFwlkAV7RJoVMFTrM=.223437cd-bbb2-4ef3-a6fe-b13ce402e14b@github.com> > Hi, for you to consider. 
> > These tests constantly fail in qemu-user. > Either they require the host to be the same arch or they are very very slow in emulation. > E.g. "ptrace(PTRACE_ATTACH, ..) failed for 405157: Function not implemented'" for SA tests. > This is the initial set of tests, there are many more, but I need to do some more verification for those. > > From bug: >> qemu-user/rv64 sets uarch to "qemu" in /proc/cpuinfo (qemu-system does not do that). >> We add this uarch to CPU feature string. >> This means we can use jtreg 'require' with cpu string to filter out tests in qemu-user. > > Relevant qemu code: > https://github.com/qemu/qemu/blob/170825d14d88a1ce7fae98d5a928480f2f329b22/linux-user/riscv/target_proc.h#L29 > > Relevant hotspot code: > https://github.com/openjdk/jdk/blob/fa0b18bfde38ee2ffbab33a9eaac547fe8aa3c7c/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp#L250 > > Tested that the require only filters out tests in qemu+riscv64. > > Thanks! > > /Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Merge branch 'master' into qemu-user-issues - Revert - Merge branch 'master' into qemu-user-issues - Merge branch 'master' into qemu-user-issues - more - more - native or very long ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24229/files - new: https://git.openjdk.org/jdk/pull/24229/files/965424ac..73968ab8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24229&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24229&range=01-02 Stats: 9932 lines in 248 files changed: 5027 ins; 4362 del; 543 mod Patch: https://git.openjdk.org/jdk/pull/24229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24229/head:pull/24229 PR: https://git.openjdk.org/jdk/pull/24229 From asmehra at openjdk.org Mon Mar 31 10:57:17 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 31 Mar 2025 10:57:17 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 17:45:26 GMT, Thomas Fitzsimmons wrote: >> This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. 
>> >> I tested it with: >> >> >> java -Xlog:os+container=trace -version >> >> on: >> >> `Red Hat Enterprise Linux 8 (cgroups v1 only)`: >> _No change in behaviour_ >> >> `Fedora 41 (cgroups v2)`: >> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ >> >> --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 >> +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 >> @@ -1,7 +1,12 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> +[debug][os,container] v2 controller cpuset is enabled and relevant >> +[debug][os,container] v2 controller cpu is enabled and required >> +[debug][os,container] v2 controller io is enabled but not relevant >> +[debug][os,container] v2 controller memory is enabled and required >> +[debug][os,container] v2 controller hugetlb is enabled but not relevant >> +[debug][os,container] v2 controller pids is enabled and relevant >> +[debug][os,container] v2 controller rdma is enabled but not relevant >> +[debug][os,container] v2 controller misc is enabled but not relevant >> [debug][os,container] Detected cgroups v2 unified hierarchy >> [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope >> [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max >> >> >> `Fedora 41 (custom kernel with cgroups v1 disabled)`: >> _Fixes `cgroups v2` detection:_ >> >> --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 >> +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 >> @@ -1,7 +1,63 @@ >> 
[trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> -[debug][os,container] controller memory is not enabled >> - ] >> -[debug][os,container] One or more required controllers disabled at kernel level. >> +[... > > Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 > - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent > - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing > > Remove from cgroups v1 branch incorrect log messages about cpuset > controller being optional. Add test case for cgroups v1, cpuset > disabled. > - Improve !cgroups_v2_enabled branch comment > - Debug-log optional and disabled cgroups v2 controllers > > Do not log enabled controllers that are not relevant to the JDK. > - Move index declaration to scope in which it is used > - Remove empty string check during cgroup.controllers parsing > - Define ISSPACE_CHARS macro, use it in strsep call > - Pass fgets result to strsep > - Replace is_cgroupsV2 with cgroups_v2_enabled > > Also fix the testCgroupv1SystemdOnly and testCgroupv1NoMounts test > cases such that their /proc/cgroups and /proc/self/cgroup contents > correspond. This prevents assertion failures these tests were > producing when is_cgroupsV2 was replaced with cgroups_v2_enabled. > - ... and 3 more: https://git.openjdk.org/jdk/compare/b7bb1ee2...b6926e15 src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 81: > 79: // file system magic. If it does not then heuristics are required to determine > 80: // if cgroups v1 is usable or not. 
> 81: if (statfs(sys_fs_cgroup, &fsstat) != -1) { I feel this logic should be moved to `determine_type` as it is responsible for determining the version of the cgroup subsystem. test/hotspot/jtreg/containers/cgroup/CgroupSubsystemFactory.java line 459: > 457: public void testCgroupv1SystemdOnly(WhiteBox wb) { > 458: String procCgroups = cgroupv1CgInfoZeroHierarchy.toString(); > 459: String procSelfCgroup = cgroupV2SelfCgroup.toString(); I don't get why is this change required? The test name `testCgroupv1SystemdOnly` suggests it is testing cgroup v1 only but then it passes cgroup v2 proc file. Same for `testCgroupv1NoMounts`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r2020822189 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r2020822311 From duke at openjdk.org Mon Mar 31 11:14:14 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 31 Mar 2025 11:14:14 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v11] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 02:38:37 GMT, Jatin Bhateja wrote: >> Ferenc Rakoczi has updated the pull request incrementally with two additional commits since the last revision: >> >> - Further readability improvements. >> - Added asserts for array sizes > > src/hotspot/cpu/x86/vm_version_x86.cpp line 1252: > >> 1250: // Currently we only have them for AVX512 >> 1251: #ifdef _LP64 >> 1252: if (supports_evex() && supports_avx512bw()) { > > supports_evex check looks redundant. These are checks for two different feature bits: CPU_AVX512F and CPU_AVX512BW. Are you saying that the latter implies the former in every implementation of the spec? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2020853815 From pminborg at openjdk.org Mon Mar 31 12:00:37 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 12:00:37 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v18] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. Per Minborg has updated the pull request incrementally with two additional commits since the last revision: - Remove VM optimizations for StableValue fields - Rename factory method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/fe021b5c..09122d41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=16-17 Stats: 121 lines in 8 files changed: 18 ins; 93 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From pminborg at openjdk.org Mon Mar 31 12:06:11 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 12:06:11 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v19] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. 
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Revert changes in s.m.Unsafe ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/09122d41..6fd56533 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=17-18 Stats: 13 lines in 1 file changed: 0 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From ihse at openjdk.org Mon Mar 31 12:10:14 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 31 Mar 2025 12:10:14 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 17:41:29 GMT, Julian Waters wrote: >> Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. >> >> It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. >> >> PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. >> >> Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. >> >> Testing: hs-tier1 - hs-tier4, microbenchmarks > > make/modules/jdk.incubator.vector/Lib.gmk line 83: > >> 81: SRC := libsleef/lib, \ >> 82: EXTRA_SRC := libsleef/generated, \ >> 83: DISABLED_WARNINGS_gcc := unused-function sign-compare tautological-compare ignored-qualifiers, \ > > DISABLED_WARNINGS_gcc is technically not needed, gcc is not a supported compiler on macOS, at least, not yet... > > If you feel that gcc support for macOS is a worthy addition to have to make the lives of future compiler porters, you can leave it in there. 
It's otherwise up to you whether you want to remove it or not (I personally like the idea of being able to compile for macOS with gcc in the future however) No, we should not have dead code "just in case". If someone were to support gcc on macos (seems implausible if that could ever be possible), then this is the least of their troubles. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24306#discussion_r2020917777 From ihse at openjdk.org Mon Mar 31 12:17:07 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 31 Mar 2025 12:17:07 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 00:58:59 GMT, Vladimir Ivanov wrote: > Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. > > It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. > > PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. > > Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. > > Testing: hs-tier1 - hs-tier4, microbenchmarks The build code seems like it should be the same as for linux/aarch64. In fact, at this point, the code duplication should be removed into a single setup call. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2766037908 From ihse at openjdk.org Mon Mar 31 12:17:08 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 31 Mar 2025 12:17:08 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 17:46:43 GMT, Julian Waters wrote: > Is leaving the sources of sleef in share/native the right thing to do? No, it should move to the least common directory for all platforms where it is needed. 
In this case, it should move to `unix` instead of `share`. But code that were to be used in e.g. windows and macos but not linux, should be put in `share`, even if it is not used on *all* platforms. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2766044354 From ihse at openjdk.org Mon Mar 31 12:17:09 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 31 Mar 2025 12:17:09 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 08:58:20 GMT, Aleksey Shipilev wrote: >> Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. >> >> It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. >> >> PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. >> >> Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. >> >> Testing: hs-tier1 - hs-tier4, microbenchmarks > > make/modules/jdk.incubator.vector/Lib.gmk line 85: > >> 83: DISABLED_WARNINGS_gcc := unused-function sign-compare tautological-compare ignored-qualifiers, \ >> 84: DISABLED_WARNINGS_clang := unused-function sign-compare tautological-compare ignored-qualifiers, \ >> 85: CFLAGS := $(NEON_CFLAGS), \ > > Is this supposed to match configs for linux-aarch64? I see we add `NEON_CFLAGS` here, and do _not_ add `vector_math_sve.c_CFLAGS` here. I would have thought those two are applicable to macos-aarch64 as well? This seems to be resurrected from some very old code. We don't have any `NEON_CFLAGS` anymore. This makes me wonder: @iwanowww what kind of testing have you done to ensure this works correctly? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24306#discussion_r2020922588 From pminborg at openjdk.org Mon Mar 31 12:18:07 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 12:18:07 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v20] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Remove StableValueFactories ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/6fd56533..f9521793 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=18-19 Stats: 156 lines in 12 files changed: 37 ins; 86 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From stefank at openjdk.org Mon Mar 31 12:22:18 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 31 Mar 2025 12:22:18 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: <3zkHWLVEELkQkeSU9M0YAOpb3olMDNyU1HAdWUJEm68=.a2d2f9ea-c635-4379-95d7-00ff358eb15f@github.com> <5wBQqxybptneJjhR5usfrqg3PJ7G2PB_sDjUkb4BObM=.fe04a403-64ad-4dc5-b793-b48da01acfd4@github.com> Message-ID: On Fri, 28 Mar 2025 09:44:00 GMT, Thomas Stuefe wrote: > > > > > > > I don't understand why we don't treat that as a fatal error OR make sure that all call-sites handles that error, which they don't do today. > > > > > > > > > > > > > > > > > > I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. 
Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions. > > > > > > > > > > > > > > > I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error? > > > > > > > > > > > > [`VirtualSpace::shrink_by`](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/memory/virtualspace.cpp#L373) allows uncommit to fail without crashing. I'm not certain of the intention behind that. But it seems like it's because shrinking is an optimization and not always critical that it be done immediately. [[1](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/gc/serial/tenuredGeneration.cpp#L258)] > > > > > > > > > The above example shows code that assumes that it is OK to fail uncommitting and continue. I'm trying to figure out if that assumption is true. So, what I meant was that I was looking for a concrete example of a failure mode of uncommit that would be an acceptable (safe) failure to continue executing from. That is, a valid failure that doesn't mess up the memory in an unpredictable/unknowable way. > > > > > > So release/uncommit (via mmap, munmap, VirtualFree) could fail due to: (1) Bad arguments, or (2) The OS encountered an issue outside the control of the JVM. > > (1) JVM bug. Reasonable to fatally fail here. Or the caller could be intentionally passing arguments that may or may not be valid. I don't think there is any code like that currently. > > (2) The only errors that aren't due to bad arguments are ENOMEM and ones related to file descriptors (which are not applicable to uncommit). VirtualFree only fails due to bad arguments according to the Windows docs. > > So if there's consensus that ENOMEM is not recoverable (or rare enough to not worry about), then it seems like it's OK to fatally fail in all scenarios.
> > +1 > > Thanks for investigating the details of this (also nothing we couldn't change later if it bugs us). +1 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2766054807 From ihse at openjdk.org Mon Mar 31 12:31:07 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 31 Mar 2025 12:31:07 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 00:58:59 GMT, Vladimir Ivanov wrote: > Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. > > It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. > > PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. > > Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. > > Testing: hs-tier1 - hs-tier4, microbenchmarks Instead of trying to guide you how to fix this, I made the unification of all libsleef stanzas myself. It is available here: https://github.com/openjdk/jdk/commit/04feadda561b2f7a6afff440ab5b4e188361c048 That commit assumes that `vector_math_sve.c` should have `$(SVE_CFLAGS)` on mac as well as on linux. If that is not correct, then it needs to be adjusted. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2766072866 From dlunden at openjdk.org Mon Mar 31 12:31:47 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 31 Mar 2025 12:31:47 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal Message-ID: After the changes for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393), we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. 
This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in `PhaseIterGVN::optimize`. In particular, when we are close to the `MaxNodeLimit` (80 000 by default), it can happen that we go from below `MaxNodeLimit - NodeLimitFudgeFactor * 2` (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the `PhaseIterGVN::optimize` loop does not trigger as expected and we instead crash at an assert in node creation as we surpass `MaxNodeLimit` nodes. ### Changeset Changes: - Do not immediately transform new Phi nodes after splitting Phis through MergeMems. The Phi nodes are put on the IGVN worklist and are transformed later on in any case. - Add an assert in the `PhaseIterGVN::optimize` loop that ensures we never increase the live node count with more than `NodeLimitFudgeFactor * 2` in a single loop iteration. This assert allows us to catch the issue earlier and much more frequently during IGVN. - Add a new regression test `TestSplitPhiThroughMergeMem.java`. The new assert above triggers the issue in a large number of existing tests already, but I added this new test as well for good measure. ### Testing - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14124983489) - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. - Performance testing - DaCapo 23, Renaissance, SPECjbb 2005, and SPECjvm 2008 on Windows x64, Linux x64, macOS x64, and macOS aarch64. No statistically significant improvements nor regressions. - Compilation time benchmarking for DaCapo 23. No statistically significant improvements nor regressions. 
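For readers following along, here is a minimal sketch of why the headroom check at the top of the loop needs the per-iteration growth bound that the new assert enforces. It is not HotSpot code: `optimize`, `Step`, `Outcome` and the two constants are hypothetical names, the constants only mirror the default values quoted above (80 000 and 2 000), and the condition that HotSpot would express as an assert is returned as an enum value here so that it can be observed without aborting.

```cpp
#include <deque>
#include <functional>

// Hypothetical stand-ins for MaxNodeLimit and NodeLimitFudgeFactor (default values).
constexpr long long kMaxNodeLimit = 80000;
constexpr long long kNodeLimitFudgeFactor = 2000;

enum class Outcome { Converged, Bailout, GrowthViolation };

// One worklist item; it returns the change in live node count (may be negative).
using Step = std::function<long long()>;

// Simplified model of a worklist loop with a node budget.
Outcome optimize(std::deque<Step> worklist, long long live_nodes) {
  while (!worklist.empty()) {
    // Bailout check at the top of the loop: keep 2 * fudge nodes of headroom.
    if (live_nodes > kMaxNodeLimit - 2 * kNodeLimitFudgeFactor) {
      return Outcome::Bailout;
    }
    const long long before = live_nodes;
    Step step = worklist.front();
    worklist.pop_front();
    live_nodes += step();
    // The headroom only protects us if a single iteration never grows the
    // graph by more than 2 * fudge nodes. The real change asserts this.
    if (live_nodes > before + 2 * kNodeLimitFudgeFactor) {
      return Outcome::GrowthViolation;
    }
  }
  return Outcome::Converged;
}
```

With these numbers, a run that starts at 75 990 live nodes passes the bailout check, and a single step adding 4 500 nodes then lands above 80 000, which is the situation described in the PR text.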
### Additional issue investigation For the particular failure reported as part of this issue, the additional Phi idealizations after [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) cause a dramatic local increase in the number of nodes during IGVN compared to before. Therefore, it is justified to further investigate if this increase in live nodes is, in general, an issue in itself. In the below, I consider and refer to three versions of the JVM: - Before the fix for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) ("baseline") - After the fix for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) ("target-old") - After the fix in this PR ("target") First, consider the number of live nodes and the IGVN worklist size during the particular IGVN run that resulted in this issue. Orange is the number of live nodes and blue the worklist size, and I increased `MaxNodeLimit` to ensure IGVN runs to completion. ![igvn-worklist-nodes-count-plot](https://github.com/user-attachments/assets/cb56a85b-5612-4cb5-b49f-cf3a6a07a1f7) Note that, for "baseline", both the live nodes and worklist size decrease more or less monotonically and the IGVN run finishes quickly. For both "target-old" and "target", the additional Phi idealizations result in dramatic local increases of live nodes, and the IGVN runs have an order of magnitude more iterations compared to "baseline". However, after the IGVN runs, the number of nodes is still reasonable. To investigate if the local live node increases are an issue in general, I ran the DaCapo 23 benchmark suite and collected three statistics for each IGVN run: - max live nodes, - max live node increment (in a single IGVN step), and - max IGVN worklist size. Below are the results, shown with violin plots. The endpoints are the maximum values and the midpoints are the arithmetic means. The shaded areas approximate the distribution of the statistics (over all IGVN runs). 
![dacapo-global](https://github.com/user-attachments/assets/a4c6570e-aed7-450a-85bc-90dcaf38a2ea) In particular, note the sharp difference between "target-old" and the other two versions for "Max live node increment". This changeset addresses this difference (i.e., the difference between "target-old" and "target"). Overall, the distributions and extrema for "target" and "baseline" are comparable. **This suggests that the dramatic live node increase seen in this issue is an edge case and is not a problem in general.** Additionally, there were 0 IGVN node count bailouts across all versions for DaCapo. I also ran Renaissance, SPECjvm, and SPECjbb to check if there were any differences in bailout counts (total, not only IGVN node count bailouts): - "baseline": 252 bailouts - "target-old": 255 bailouts - "target": 253 bailouts It could be worth investigating the 1-bailout difference between "baseline" and "target", but the difference could also likely be attributed to noise. ------------- Commit messages: - First version Changes: https://git.openjdk.org/jdk/pull/24325/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24325&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351833 Stats: 93 lines in 4 files changed: 87 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24325/head:pull/24325 PR: https://git.openjdk.org/jdk/pull/24325 From coleenp at openjdk.org Mon Mar 31 12:48:19 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 31 Mar 2025 12:48:19 GMT Subject: RFR: 8352415: x86: Tighten up template interpreter method entry code In-Reply-To: References: Message-ID: <_L36ry3SVQpPCNQ6qjEUtbcdiSu1w-6zKZ-TYCGSkZQ=.939ca626-996e-4099-a1c2-eb1f1795396f@github.com> On Wed, 19 Mar 2025 13:44:40 GMT, Aleksey Shipilev wrote: > Interpreter performance is still important for faster startup, since it would carry the application until compilers kick in.
After looking at Leyden scenarios in Xint mode, I believe incremental improvements are possible in template interpreter to make it faster. > > One of those improvements is tightening up method entry code. Profiling shows the hottest path in the whole ordeal for non-native methods is resolving the Java mirror to store the GC root for currently executing Method*. It involves 4-5 chained memory accesses, which incurs significant latency. > > We can massage the code to reuse some memory accesses and also spread them out to allow more latency-hiding hardware mechanisms to kick in. > > Additional testing: > - [x] Ad-hoc `-Xint` benchmarks > - [x] Linux x86_64 server fastdebug, `all` Sorry I was away and didn't see this. For the record, it looks good. I liked the load_mirror macro assembler function but it did redundant work. ------------- PR Review: https://git.openjdk.org/jdk/pull/24114#pullrequestreview-2729165704 From jwaters at openjdk.org Mon Mar 31 13:12:16 2025 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 31 Mar 2025 13:12:16 GMT Subject: RFR: 8345265: Minor improvements for LTO across all compilers [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 05:10:33 GMT, Julian Waters wrote: > > Wait, sorry to trouble you further, but what does nm --demangle --reverse-sort --print-size --size-sort libjvm.so on HotSpot compiled by gcc 14 with LTO active yield as the largest symbol in the binary? (It should be the symbol listed at the very top) > > This is my output; maybe I have to add that I used the 'normal' jdk head without patches, is that what I should do for a gcc14 build test? 
> > ``` > nm --demangle --reverse-sort --print-size --size-sort images/jdk/lib/server/libjvm.so | more > 0000000000453ee0 000000000002320e t State::MachNodeGenerator(int) > 0000000000970f70 0000000000018eb9 t CompilerToVM::initialize_intrinsics(JVMCIEnv*) > 000000000140caa0 000000000000f018 b Matcher::mreg2regmask > 0000000000993c80 000000000000a40d t JNIJVMCI::initialize_ids(JNIEnv_*) > 0000000000ac16b0 0000000000009d16 t Matcher::Fixup_Save_On_Entry() > 000000000143db00 0000000000008000 b _ZL9_elements.lto_priv.0 > 0000000001446d20 0000000000008000 b _free_list > 000000000141e900 0000000000007d00 b DFSClosure::_reference_stack > 00000000013d6d40 0000000000007668 d _ZL9flagTable.lto_priv.0 > 00000000013edf60 0000000000006c30 d VMStructs::localHotSpotVMStructs > 00000000010463f0 0000000000006a06 t readConfiguration0(JNIEnv_*, JVMCIEnv*) [clone .isra.0] > 0000000000d51dc0 00000000000067a2 t StubGenerator::generate_libmPow() > 00000000010b12d0 0000000000006289 t G1ParScanThreadState::trim_queue_to_threshold(unsigned int) > 0000000000e24550 00000000000061e8 t ClassVerifier::verify_method(methodHandle const&, JavaThread*) > 0000000001076dd0 000000000000548d t State::DFA(int, Node const*) [clone .isra.0] > 0000000000e1ba00 000000000000519d t VMError::report(outputStream*, bool) > 00000000014521e0 0000000000005000 b TemplateInterpreter::_safept_table > 000000000142cb60 0000000000005000 b TemplateInterpreter::_normal_table > 0000000001431b60 0000000000005000 b TemplateInterpreter::_active_table > 0000000000653790 0000000000004e12 t CompileBroker::print_heapinfo(outputStream*, char const*, unsigned long) > 000000000075be40 0000000000004e0b t G1CollectedHeap::do_collection_pause_at_safepoint_helper() > 0000000000b90260 0000000000004a41 t Parse::do_one_bytecode() [clone .part.0] > 00000000010c1f20 00000000000049e6 t d_print_comp_inner > 0000000000c1e180 0000000000004594 t ServiceThread::service_thread_entry(JavaThread*, JavaThread*) > 00000000005ad7c0 0000000000004424 t 
C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) > 0000000000d260b0 000000000000440a t PhaseStringOpts::replace_string_concat(StringConcat*) > 0000000000e48920 00000000000042e5 t VM_Version::initialize() > 0000000000afbad0 0000000000004240 t Method::init_intrinsic_id(vmSymbolID) > 00000000010200d0 00000000000040ab t PSParallelCompact::invoke_no_policy(bool) [clone .isra.0] > 000000000051f230 0000000000004054 t Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) > 000000000065c060 0000000000004015 t Compile::Code_Gen() > 0000000000663060 0000000000003fec t CompileBroker::compiler_thread_loop() > 000000000070d9b0 0000000000003fd7 t ConnectionGraph::do_analysis(Compile*, PhaseIterGVN*) > 0000000000be8890 0000000000003f8e t PhaseChaitin::Split(unsigned int, ResourceArea*) > 0000000000818730 0000000000003f82 t PhaseChaitin::build_ifg_physical(ResourceArea*) > 0000000000fb1920 0000000000003f44 t SharedRuntime::generate_native_wrapper(MacroAssembler*, methodHandle const&, int, BasicType*, VMRegPair*, BasicType) [clone .constprop.0] > 00000000007e2ab0 0000000000003f41 t PhaseCFG::global_code_motion() > 00000000013e8580 0000000000003e10 d JVMCIVMStructs::localHotSpotVMStructs > 0000000000db3b00 0000000000003dff t TemplateInterpreterGenerator::generate_all() > 0000000000f436d0 0000000000003d92 t initialize_stubs(StubGenBlobId, int, int, char const*, char const*, char const*) [clone .constprop.0] > 0000000000d6b680 0000000000003d42 t StubGenerator::generate_libmTan() > 00000000004ff560 0000000000003d28 t BCEscapeAnalyzer::iterate_blocks(Arena*) > 0000000000a2bfb0 0000000000003cef t VM_RedefineClasses::load_new_class_versions() [clone .part.0] > 000000000073fde0 0000000000003ca2 t G1CollectedHeap::do_full_collection(bool, bool) > 0000000000e8e170 0000000000003bea t ZDriverMajor::run_thread() > 000000000103ec00 0000000000003b81 t JvmtiEnv::RetransformClasses(int, _jclass* const*) [clone .isra.0] > 0000000000d18c50 0000000000003b4d t 
StubGenerator::generate_md5_implCompress(StubGenStubId) > 0000000000a97160 0000000000003b20 t PhaseIdealLoop::auto_vectorize(IdealLoopTree*, VSharedData&) [clone .part.0] > 00000000013df000 0000000000003aa8 d ruleName > 00000000005c7e90 0000000000003a9f t PhiNode::Ideal(PhaseGVN*, bool) > 00000000006a1ef0 0000000000003a9f t State::_sub_Op_AddP(Node const*) > 0000000000e86880 0000000000003a74 t ZGeneration::select_relocation_set(ZGenerationId, bool) > --More-- > ``` I've spoken to the gcc maintainers, according to them problems with -fuse-linker-plugin can be a potential cause of the fiasco with the inlining here. Sorry to trouble you further, but what happens if you replace both instances of -fuse-linker-plugin with -fno-use-linker-plugin on Linux in JvmFeatures.gmk? Does any change occur in the nm output or is the resulting JVM massively bloated? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2766182048 From pminborg at openjdk.org Mon Mar 31 13:32:56 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 13:32:56 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v21] In-Reply-To: References: Message-ID: <5BpPdjiai-EEyaut13cxuVnuow-AQUMnB7mBHd7on5Q=.70a81283-6779-45b4-9fc9-e6b911f42b2a@github.com> > Implement JEP 502. > > The PR passes tier1-tier3 tests. 
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Finish and clean up benchmarks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/f9521793..7fb8cb41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=19-20 Stats: 222 lines in 6 files changed: 193 ins; 16 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From pminborg at openjdk.org Mon Mar 31 13:51:29 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 13:51:29 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v21] In-Reply-To: <5BpPdjiai-EEyaut13cxuVnuow-AQUMnB7mBHd7on5Q=.70a81283-6779-45b4-9fc9-e6b911f42b2a@github.com> References: <5BpPdjiai-EEyaut13cxuVnuow-AQUMnB7mBHd7on5Q=.70a81283-6779-45b4-9fc9-e6b911f42b2a@github.com> Message-ID: On Mon, 31 Mar 2025 13:32:56 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Finish and clean up benchmarks Here are the latest benchmarks run on an M1 (macOS):

Benchmark                                           Mode  Cnt  Score   Error  Units
StableFunctionBenchmark.function                    avgt   10  4.228 ± 0.172  ns/op
StableFunctionBenchmark.map                         avgt   10  4.323 ± 0.289  ns/op
StableFunctionBenchmark.staticIntFunction           avgt   10  1.724 ± 0.121  ns/op
StableFunctionBenchmark.staticSMap                  avgt   10  1.710 ± 0.045  ns/op
StableFunctionSingleBenchmark.function              avgt   10  4.329 ± 0.184  ns/op
StableFunctionSingleBenchmark.map                   avgt   10  4.291 ± 0.142  ns/op
StableFunctionSingleBenchmark.staticIntFunction     avgt   10  0.704 ± 0.022  ns/op
StableFunctionSingleBenchmark.staticSMap            avgt   10  0.708 ± 0.027  ns/op
StableIntFunctionBenchmark.intFunction              avgt   10  1.558 ± 0.063  ns/op
StableIntFunctionBenchmark.list                     avgt   10  1.579 ± 0.141  ns/op
StableIntFunctionBenchmark.staticIntFunction        avgt   10  1.044 ± 0.031  ns/op
StableIntFunctionBenchmark.staticList               avgt   10  2.280 ± 2.013  ns/op
StableIntFunctionSingleBenchmark.intFunction        avgt   10  2.333 ± 0.033  ns/op
StableIntFunctionSingleBenchmark.list               avgt   10  2.335 ± 0.046  ns/op
StableIntFunctionSingleBenchmark.staticIntFunction  avgt   10  0.670 ± 0.022  ns/op
StableIntFunctionSingleBenchmark.staticList         avgt   10  0.679 ± 0.021  ns/op
StableSupplierBenchmark.stable                      avgt   10  1.377 ± 0.042  ns/op
StableSupplierBenchmark.staticStable                avgt   10  0.362 ± 0.077  ns/op
StableSupplierBenchmark.staticSupplier              avgt   10  0.338 ± 0.016  ns/op
StableSupplierBenchmark.supplier                    avgt   10  1.609 ± 0.042  ns/op
StableValueBenchmark.atomic                         avgt   10  1.357 ± 0.046  ns/op
StableValueBenchmark.dcl                            avgt   10  1.369 ± 0.058  ns/op
StableValueBenchmark.refSupplier                    avgt   10  0.442 ± 0.007  ns/op
StableValueBenchmark.stable                         avgt   10  1.522 ± 0.267  ns/op
StableValueBenchmark.stableNull                     avgt   10  1.237 ± 0.117  ns/op
StableValueBenchmark.staticAtomic                   avgt   10  1.220 ± 0.058  ns/op
StableValueBenchmark.staticDcl                      avgt   10  0.357 ± 0.022  ns/op
StableValueBenchmark.staticHolder                   avgt   10  1.452 ± 0.205  ns/op
StableValueBenchmark.staticRecordHolder             avgt   10  0.367 ± 0.028  ns/op
StableValueBenchmark.staticStable                   avgt   10  0.365 ± 0.026  ns/op

Finished running test 'micro:java.lang.stable' ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2766294452 From ihse at openjdk.org Mon Mar 31 13:51:49 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 31 Mar 2025 13:51:49 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 15:11:06 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to built this image e.g.
>> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > address Windows issues How problematic would it be to read it on demand? Is it just that there is a risk that it won't work, or could it cause the crash dumping process to fail completely? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2766305468 From mcimadamore at openjdk.org Mon Mar 31 14:07:26 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 31 Mar 2025 14:07:26 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v8] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 16:23:20 GMT, Per Minborg wrote: > I have rewritten all the `toString()` methods. A `StableList::toString` now produces something much more similar to a regular `List::toString`. The only difference is that the `StableList::toString` shows ".unset" for the elements that are not yet evaluated. In other words, `StableList::toString` no longer evaluates all the elements, but rather does a "high impedance" scan over them and if evaluated, invokes `toString` on the element, otherwise just shows ".unset" for that element. > > The same goes for `StableMap` and all the stable functions (which now share the same code path as the stable collections). > > `StableValue` itself does not add extra square brackets around its content. This seems fine -- I noticed that `toString` doesn't appear anywhere in the `Collection` API -- so there doesn't seem to be any contract for `List::toString` to behave in the way a list created with `List::of` does. 
I suppose I'm still not super convinced as to whether the fact that the list is backed by stable holders should be reflected in the `toString` or not -- after all, `toString` can be thought of as a method that depends on the element values, which would trigger materialization for said values. In other words, I'm not sure I get the precise use case where having such a method would be useful. E.g. if I'm debugging, do I care whether I'm triggering evaluation earlier than normal? Is the worry that, by performing such an early evaluation, the resulting value for the element might be different from the one that would be triggered during normal execution (e.g. because of some stateful behavior in the function that computes the elements of the stable function)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2766348361 From duke at openjdk.org Mon Mar 31 14:12:29 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Mon, 31 Mar 2025 14:12:29 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 16:20:42 GMT, Robert Toyonaga wrote: > ### Summary: > This PR makes memory operations atomic with NMT accounting. > > ### The problem: > In memory related functions like `os::commit_memory` and `os::reserve_memory` the OS memory operations are currently done before acquiring the NMT mutex. And the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races. > > 1.1 Thread_1 releases range_A. > 1.2 Thread_1 tells NMT "range_A has been released". > > 2.1 Thread_2 reserves (the now free) range_A. > 2.2 Thread_2 tells NMT "range_A is reserved". > > Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2).
The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS.
>
> ### Solution:
> Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself.
>
> ### Other notes:
> I also simplified this pattern found in many places:
>
> ```
> if (MemTracker::enabled()) {
>   MemTracker::NmtVirtualMemoryLocker nvml;
>   result = pd_some_operation(addr, bytes);
>   if (result != nullptr) {
>     MemTracker::record_some_operation(addr, bytes);
>   }
> } else {
>   result = pd_unmap_memory(addr, bytes);
> }
> ```
> To:
>
> ```
> MemTracker::NmtVirtualMemoryLocker nvml;
> result = pd_unmap_memory(addr, bytes);
> MemTracker::record_some_operation(addr, bytes);
> ```
> This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`.
>
> I considered moving the locking and NMT accounting down into platform specific code: Ex. lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section. However, I found that the OS-specific "pd_" functions are already short and to-the-point, so doing this wasn't reducing the lock scope very much. Instead it just makes the code more messy by having to maintain the locking and NMT accounting in each platform specific implementation.
>
> In many places I've done minor refactoring by relocating call...
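To make the race in the quoted description concrete, the following toy model replays the (1.1) (2.1) (2.2) (1.2) interleaving step by step and then shows the widened lock scope the PR summary describes. This is only an illustrative sketch: `FakeOs`, `FakeTracker`, `LockedMemory` and `replay_unlocked_interleaving` are made-up names, address ranges are plain integers, and a `std::mutex` stands in for `NmtVirtualMemoryLocker`. It makes no claim about how HotSpot implements any of this.

```cpp
#include <mutex>
#include <set>

// Toy stand-in for the OS view of which ranges are mapped (hypothetical).
struct FakeOs {
  std::set<int> mapped;
  bool reserve(int range) { return mapped.insert(range).second; }
  bool release(int range) { return mapped.erase(range) == 1; }
};

// Toy stand-in for the tracker's bookkeeping (hypothetical).
struct FakeTracker {
  std::set<int> reserved;
  void record_reserve(int range) { reserved.insert(range); }
  void record_release(int range) { reserved.erase(range); }
};

// Replays the problematic ordering sequentially, with no lock around the
// (OS operation, accounting) pairs. Returns whether both views still agree.
bool replay_unlocked_interleaving() {
  FakeOs os;
  FakeTracker nmt;
  os.reserve(1);
  nmt.record_reserve(1);   // Thread_1 owns range 1 to begin with
  os.release(1);           // (1.1) Thread_1 releases the range
  os.reserve(1);           // (2.1) Thread_2 reserves the now free range
  nmt.record_reserve(1);   // (2.2) Thread_2 tells the tracker
  nmt.record_release(1);   // (1.2) Thread_1 tells the tracker, too late
  return os.mapped == nmt.reserved;
}

// Widened lock scope: the OS operation and its accounting form one critical
// section, so the two steps of one thread cannot be split by another thread.
class LockedMemory {
 public:
  bool reserve(int range) {
    std::lock_guard<std::mutex> guard(lock_);
    const bool ok = os_.reserve(range);
    if (ok) nmt_.record_reserve(range);
    return ok;
  }
  bool release(int range) {
    std::lock_guard<std::mutex> guard(lock_);
    const bool ok = os_.release(range);
    if (ok) nmt_.record_release(range);
    return ok;
  }
  bool in_sync() {
    std::lock_guard<std::mutex> guard(lock_);
    return os_.mapped == nmt_.reserved;
  }

 private:
  std::mutex lock_;
  FakeOs os_;
  FakeTracker nmt_;
};
```

In the replay, the OS ends up with the range mapped while the tracker believes it is free. With `LockedMemory`, the pair (1.1) (1.2) can no longer be interleaved with (2.1) (2.2).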
OK should I update this PR to do the following things: - Add comments explaining the asymmetrical locking and warning against patterns that lead to races - swapping the order of `NmtVirtualMemoryLocker` and release/uncommit - Fail fatally if release/uncommit does not complete. Or does it make more sense to do that in a different issue/PR? Also, do we want to keep the new tests and the refactorings (see below)? if (MemTracker::enabled()) { MemTracker::NmtVirtualMemoryLocker nvml; result = pd_some_operation(addr, bytes); if (result != nullptr) { MemTracker::record_some_operation(addr, bytes); } } else { result = pd_unmap_memory(addr, bytes); } To: MemTracker::NmtVirtualMemoryLocker nvml; result = pd_unmap_memory(addr, bytes); MemTracker::record_some_operation(addr, bytes); ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2766365411 From mcimadamore at openjdk.org Mon Mar 31 14:25:40 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 31 Mar 2025 14:25:40 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v21] In-Reply-To: <5BpPdjiai-EEyaut13cxuVnuow-AQUMnB7mBHd7on5Q=.70a81283-6779-45b4-9fc9-e6b911f42b2a@github.com> References: <5BpPdjiai-EEyaut13cxuVnuow-AQUMnB7mBHd7on5Q=.70a81283-6779-45b4-9fc9-e6b911f42b2a@github.com> Message-ID: On Mon, 31 Mar 2025 13:32:56 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Finish and clean up benchmarks src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 161: > 159: > 160: static String renderWrapped(Object t) { > 161: return (t == null) ? UNSET_LABEL : Objects.toString(unwrap(t)); If I read correctly, this implementation is similar to what described here: https://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/ (see section `Using C++11 Acquire and Release Fences`). 
We don't need the "relaxed" loads in Java because a reference load in Java can never tear. Correct? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2021151169 From duke at openjdk.org Mon Mar 31 14:28:20 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 31 Mar 2025 14:28:20 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v11] In-Reply-To: <_TOBoO4cMQpw4sgzIpNpQZ2w5wDgezKQZLe314DQ7zo=.813b81bf-ecc0-4f75-a0d6-fbb13dde594e@github.com> References: <_TOBoO4cMQpw4sgzIpNpQZ2w5wDgezKQZLe314DQ7zo=.813b81bf-ecc0-4f75-a0d6-fbb13dde594e@github.com> Message-ID: On Mon, 24 Mar 2025 15:16:20 GMT, Volodymyr Paprotski wrote: >> Ferenc Rakoczi has updated the pull request incrementally with two additional commits since the last revision: >> >> - Further readability improvements. >> - Added asserts for array sizes > > I still need to have a look at the sha3 changes, but I think I am done with the most complex part of the review. This was a really interesting bit of code to review! @vpaprotsk , thanks a lot for the very thorough review! > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 270: > >> 268: } >> 269: >> 270: static void loadPerm(int destinationRegs[], Register perms, > > `replXmm`? i.e. this function is replicating (any) Xmm register, not just perm?.. Since I am only using it for permutation describers, I thought this way it is easier to follow what is happening. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 327: > >> 325: // >> 326: // >> 327: static address generate_dilithiumAlmostNtt_avx512(StubGenerator *stubgen, > > Similar comments as to `generate_dilithiumAlmostInverseNtt_avx512` > > - similar comment about the 'pair-wise' operation, updating `[j]` and `[j+l]` at a time.. > - somehow had less trouble following the flow through registers here, perhaps I am getting used to it. 
FYI, ended renaming some as: > > // xmm16_27 = Temp1 > // xmm0_3 = Coeffs1 > // xmm4_7 = Coeffs2 > // xmm8_11 = Coeffs3 > // xmm12_15 = Coeffs4 = Temp2 > // xmm16_27 = Scratch For me, it was easier to follow what goes where using the xmm... names (with the symbolic names you always have to remember which one overlaps with another and how much). > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 421: > >> 419: for (int i = 0; i < 8; i += 2) { >> 420: __ evpermi2d(xmm(i / 2 + 12), xmm(i), xmm(i + 1), Assembler::AVX_512bit); >> 421: } > > Wish there was a more 'abstract' way to arrange this, so its obvious from the shape of the code what registers are input/outputs (i.e. and use the register arrays). Even though its just 'elementary index operations' `i/2 + 16` is still 'clever'. Couldnt think of anything myself though (same elsewhere in this function for the table permutes). Well, this is how it is when we have three inputs, one of which also plays as output... At least the output is always the first one (so that one gets clobbered). This is why you have to replicate the permutation describer when you need both permutands later. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 509: > >> 507: // coeffs (int[256]) = c_rarg0 >> 508: // zetas (int[256]) = c_rarg1 >> 509: static address generate_dilithiumAlmostInverseNtt_avx512(StubGenerator *stubgen, > > Done with this function; Perhaps the 'permute table' is a common vector-algorithm pattern, but this is really clever! > > Some general comments first, rest inline. > > - The array names for registers helped a lot. And so did the new helper functions! > - The java version of this code is quite intimidating to vectorize.. 3D loop, with geometric iteration variables.. and the literature is even more intimidating (discrete convolutions which I havent touched in two decades, ffts, ntts, etc.) Here is my attempt at a comment to 'un-scare' the next reader, though feel free to reword however you like. 
> > The core of the (Java) loop is this 'pair-wise' operation: > int a = coeffs[j]; > int b = coeffs[j + offset]; > coeffs[j] = (a + b); > coeffs[j + offset] = montMul(a - b, -MONT_ZETAS_FOR_NTT[m]); > > There are 8 'levels' (0-7); ('levels' are equivalent to (unrolling) the outer (Java) loop) > At each level, the 'pair-wise-offset' doubles (2^l: 1, 2, 4, 8, 16, 32, 64, 128). > > To vectorize this Java code, observe that at each level, REGARDLESS the offset, half the operations are the SUM, and the other half is the > montgomery MULTIPLICATION (of the pair-difference with a constant). At each level, one 'just' has to shuffle > the coefficients, so that SUMs and MULTIPLICATIONs line up accordingly. > > Otherwise, this pattern is 'lightly similar' to a discrete convolution (compute integral/summation of two functions at every offset) > > - I still would prefer (more) symbolic register names.. I wouldn't hold my approval over it so won't object if nobody else does, but register numbers are harder to 'see' through the flow. I ended up search/replacing/'annotating' to make it easier on myself to follow the flow of data: > > // xmm8_11 = Perms1 > // xmm12_15 = Perms2 > // xmm16_27 = Scratch > // xmm0_3 = CoeffsPlus > // xmm4_7 = CoeffsMul > // xmm24_27 = CoeffsMinus (overlaps with Scratch) > > (I made a similar comment, but I think it is now hidden after the last refactor) > - would prefer to see the helper functions to get ALL the registers passed explicitly (i.e. currently `montMulPerm`, `montQInvModR`, `dilithium_q`, `xmm29`, are implicit.). As a general rule, I've tried to set up all the registers up at the 'entry' function (`generate_dilithium*` in this case) and ... I added some more comments, but I kept the xmm... names for the registers, just like with the ntt function. 
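A minimal sketch of the 'pair-wise' level structure described in the quoted comment, for readers who want to see it outside the assembler. It is a simplification under stated assumptions, not the code in `ML_DSA.java`: plain modular multiplication stands in for `montMul`, a single `zeta` is passed per call instead of indexing `MONT_ZETAS_FOR_NTT[m]` per block, and the sum is reduced modulo q for readability. The names `ButterflySketch` and `level` are hypothetical.

```java
// Hypothetical illustration of one "level" of the pair-wise operation.
// Not the JDK implementation: no Montgomery arithmetic, one zeta per call.
final class ButterflySketch {
    // ML-DSA modulus q = 2^23 - 2^13 + 1
    static final int Q = 8380417;

    // Within each block of 2 * offset coefficients, element j is paired with
    // element j + offset. Half of the results are sums, the other half are
    // differences multiplied by a constant, regardless of the offset.
    static void level(int[] coeffs, int offset, int zeta) {
        for (int start = 0; start < coeffs.length; start += 2 * offset) {
            for (int j = start; j < start + offset; j++) {
                int a = coeffs[j];
                int b = coeffs[j + offset];
                coeffs[j] = Math.floorMod(a + b, Q);
                coeffs[j + offset] =
                        (int) Math.floorMod((long) (a - b) * zeta, (long) Q);
            }
        }
    }
}
```

Calling `level` with offsets 1, 2, 4, ... 128 on a 256-element array walks through the eight levels mentioned above; only the pairing distance changes from level to level, which is why the vectorized version can get by with a shuffle between levels.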
> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 554: > >> 552: for (int i = 0; i < 8; i += 2) { >> 553: __ evpermi2d(xmm(i / 2 + 8), xmm(i), xmm(i + 1), Assembler::AVX_512bit); >> 554: __ evpermi2d(xmm(i / 2 + 12), xmm(i), xmm(i + 1), Assembler::AVX_512bit); > > Took a bit to unscramble the flow, so a comment needed? Purpose 'fairly obvious' once I got the general shape of the level/algorithm (as per my top-level comment) but something like "shuffle xmm0-7 into xmm8-15"? I hope the comment that I added at the beginning of the function sheds some light on the purpose of these permutations. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 656: > >> 654: for (int i = 0; i < 8; i++) { >> 655: __ evpsubd(xmm(i), k0, xmm(i + 8), xmm(i), false, Assembler::AVX_512bit); >> 656: } > > Fairly clean as is, but could also be two sub_add calls, I think (you have to swap order of add/sub in the helper, to be able to clobber `xmm(i)`.. or swap register usage downstream, so perhaps not.. but would be cleaner) > > sub_add(CoeffsPlus, Scratch, Perms1, CoeffsPlus, _masm); > sub_add(CoeffsMul, &Scratch[4], Perms2, CoeffsMul, _masm); > > > If nothing else, would had prefered to see the use of the register array variables I would rather leave this alone, too. I was considering the same, but decided that this is fairly easy to follow, it would be more complicated to either add a new helper function or follow where there are overlaps in the symbolically named register sets. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 871: > >> 869: __ evpaddd(xmm5, k0, xmm1, barrettAddend, false, Assembler::AVX_512bit); >> 870: __ evpaddd(xmm6, k0, xmm2, barrettAddend, false, Assembler::AVX_512bit); >> 871: __ evpaddd(xmm7, k0, xmm3, barrettAddend, false, Assembler::AVX_512bit); > > Fairly 'straightforward' transcription of the java code.. no comments from me. > > At first glance using `xmm0_3`, `xmm4_7`, etc. 
might had been a good idea, but you only save one line per 4x group. (Unless you have one big loop, but I suspect that give you worse performance? Is that something you tried already? Might be worth it otherwise..) I have considered this but decided to leave it alone (for the reason that you mentioned). > src/java.base/share/classes/sun/security/provider/ML_DSA.java line 1418: > >> 1416: int twoGamma2, int multiplier) { >> 1417: assert (input.length == ML_DSA_N) && (lowPart.length == ML_DSA_N) >> 1418: && (highPart.length == ML_DSA_N); > > I wrote this test to test java-to-intrinsic correspondence. Might be good to include it (and add the other 4 intrinsics). This is very similar to all my other *Fuzz* tests I've been adding for my own intrinsics (and you made this test FAR easier to write by breaking out the java implementation; need to 'copy' that pattern myself) > > import java.util.Arrays; > import java.util.Random; > > import java.lang.invoke.MethodHandle; > import java.lang.invoke.MethodHandles; > import java.lang.reflect.Field; > import java.lang.reflect.Method; > import java.lang.reflect.Constructor; > > public class ML_DSA_Intrinsic_Test { > > public static void main(String[] args) throws Exception { > MethodHandles.Lookup lookup = MethodHandles.lookup(); > Class kClazz = Class.forName("sun.security.provider.ML_DSA"); > Constructor constructor = kClazz.getDeclaredConstructor( > int.class); > constructor.setAccessible(true); > > Method m = kClazz.getDeclaredMethod("mlDsaNttMultiply", > int[].class, int[].class, int[].class); > m.setAccessible(true); > MethodHandle mult = lookup.unreflect(m); > > m = kClazz.getDeclaredMethod("implDilithiumNttMultJava", > int[].class, int[].class, int[].class); > m.setAccessible(true); > MethodHandle multJava = lookup.unreflect(m); > > Random rnd = new Random(); > long seed = rnd.nextLong(); > rnd.setSeed(seed); > //Note: it might be useful to increase this number during development of new intrinsics > final int repeat = 
1000000; > int[] coeffs1 = new int[ML_DSA_N]; > int[] coeffs2 = new int[ML_DSA_N]; > int[] prod1 = new int[ML_DSA_N]; > int[] prod2 = new int[ML_DSA_N]; > try { > for (int i = 0; i < repeat; i++) { > run(prod1, prod2, coeffs1, coeffs2, mult, multJava, rnd, seed, i); > } > System.out.println("Fuzz Success"); > } catch (Throwable e) { > System.out.println("Fuzz Failed: " + e); > } > } > > private static final int ML_DSA_N = 256; > public static void run(int[] prod1, int[] prod2, int[] coeffs1, int[] coeffs2, > MethodH... We will consider it for a follow-up PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23860#issuecomment-2766414076 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021150966 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021151152 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021151361 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021151680 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021152095 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021152962 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021154571 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021156249 From duke at openjdk.org Mon Mar 31 14:28:22 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 31 Mar 2025 14:28:22 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v11] In-Reply-To: References: Message-ID: On Sun, 23 Mar 2025 00:21:18 GMT, Volodymyr Paprotski wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 119: >> >>> 117: static address dilithiumAvx512PermsAddr() { >>> 118: return (address) dilithiumAvx512Perms; >>> 119: } >> >> Hear me out.. ... >> enums!! 
>> >> enum nttPermOffset { >> montMulPermsIdx = 0, >> nttL4PermsIdx = 64, >> nttL5PermsIdx = 192, >> nttL6PermsIdx = 320, >> nttL7PermsIdx = 448, >> nttInvL0PermsIdx = 704, >> nttInvL1PermsIdx = 832, >> nttInvL2PermsIdx = 960, >> nttInvL3PermsIdx = 1088, >> nttInvL4PermsIdx = 1216, >> }; >> static address dilithiumAvx512PermsAddr(nttPermOffset offset) { >> return (address) dilithiumAvx512Perms + offset; >> } > > belay that comment.. now that I looked at `generate_dilithiumAlmostInverseNtt_avx512`, I see why thats not the 'entire picture'.. I leave it as it is now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021149925 From duke at openjdk.org Mon Mar 31 14:28:24 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 31 Mar 2025 14:28:24 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v10] In-Reply-To: <2yP2P1VNWgQu6cWvn0_a_7LdidS71C6PWKcqGKTOHnc=.49f8ac0f-df23-4f1e-adb9-e03a3f2295b2@github.com> References: <2N5Evij0f6qZi_pG3tqoz11aQbSnLG0YszqHR9ROfKI=.d44b16c6-d334-42c4-8de8-92eb41229248@github.com> <2yP2P1VNWgQu6cWvn0_a_7LdidS71C6PWKcqGKTOHnc=.49f8ac0f-df23-4f1e-adb9-e03a3f2295b2@github.com> Message-ID: On Sat, 22 Mar 2025 16:36:08 GMT, Volodymyr Paprotski wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix windows build > > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 121: > >> 119: static void montmulEven(int outputReg, int inputReg1, int inputReg2, >> 120: int scratchReg1, int scratchReg2, >> 121: int parCnt, MacroAssembler *_masm) { > > nitpick.. this could be made to look more like `montMul64()` by also taking in an array of registers. I eliminated this function. 
> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 160: > >> 158: for (int i = 0; i < 4; i++) { >> 159: __ vpmuldq(xmm(scratchRegs[i]), xmm(inputRegs1[i]), xmm(inputRegs2[i]), >> 160: Assembler::AVX_512bit); > > using an array of registers, instead of array of ints would read somewhat more compact and fewer 'indirections' . i.e. > > static void montMul64(XMMRegister outputRegs*, XMMRegister inputRegs1*, XMMRegister inputRegs2*, > ... > __ vpmuldq(scratchRegs[i], inputRegs1[i], inputRegs2[i], Assembler::AVX_512bit); I think from the names it is easy enough to see that we are really passing register names here and it is also easy to check that the indexes of the registers in the named arrays are really what the names of those arrays suggest, so I would like to leave this alone. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 645: > >> 643: // poly1 (int[256]) = c_rarg1 >> 644: // poly2 (int[256]) = c_rarg2 >> 645: static address generate_dilithiumNttMult_avx512(StubGenerator *stubgen, > > This would be 'nice to have', something 'lost' with the refactor.. > > As I was reviewing this (original) function, I was thinking, "there is nothing here _that_ specific to AVX512, mostly columnar&independent operations... This function could be made 'vector-length-independent'..." > - double the loop length: > > int iter = vector_len==Assembler::AVX_512bit?4:8; > __ movl(len, 4); -> __ movl(len, iter); > > - halve the register arrays.. 
(or keep them the same but shuffle them to make SURE the first half are in xmm0-xmm15 range) > > XMMRegister POLY1[] = {xmm0, xmm1, xmm12, xmm13}; > XMMRegister POLY2[] = {xmm4, xmm5, xmm16, xmm17}; > XMMRegister SCRATCH1[] = {xmm2, xmm3, xmm14, xmm15}; <<< here > XMMRegister SCRATCH2[] = {xmm6, xmm7, xmm18, xmm19}; <<< and here > XMMRegister SCRATCH3[] = {xmm8, xmm9, xmm10, xmm11}; > > - couple of other int constants (like the memory 'step' and such) > - for assembler calls, like `evmovdqul` and `evpsubd`, need a few small new MacroAssembler helpers to instead generate VEX encoded versions (plenty of instructions already do that). > - I think only the perm instruction was unique to evex (didnt really think of an alternative for AVX2.. but can be abstracted away with another helper) > > Anyway; not suggesting its something you do here.. but it would be convenient to leave breadcrumbs/hooks for a future update so one of us can revisit this code and add AVX2 support. e.g. `parCnt` variable was very convenient before for exactly this, now its gone... it probably could be derived in each function from vector_len but..; Its now cleaner, but also harder to 'upgrade'? > > Why AVX2? many of the newer (Atom/Ecore-based/EnableX86ECoreOpts) processors do not have AVX512 support, so its something I've been prioritizing recently > > The alternative would be to write a completely separate AVX2 implementation, but that would be a shame, not to 'just' reuse this code. > ? > "For fun", I had even gone and parametrized the mult function with the `vector_len` to see how it would look (almost identical... to the original version): > > static void montmulEven2(XMMRegister* outputReg, XMMRegister* inputReg1, XMMRegister* inputReg2, XMMRegister* scratchReg1, > XMMRegister* scratchReg2, XMMRegister montQInvModR, XMMRegister dilithium_q, int parCnt, int vector_len, ... I'd like to leave this for another PR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021150150 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021150516 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021153931 From duke at openjdk.org Mon Mar 31 14:28:21 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 31 Mar 2025 14:28:21 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v7] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 19:22:41 GMT, Volodymyr Paprotski wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Made the intrinsics test separate from the pure java test. > > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 45: > >> 43: // Constants >> 44: // >> 45: ATTRIBUTE_ALIGNED(64) static const uint32_t dilithiumAvx512Consts[] = { > > This is really nitpicking.. but could have loaded constants inline with `movl` without requiring an ExternalAddress()? > > Nice to have constants together, only complaint is we have 'magic offsets' in ASM to reach in for a particular one.. > > This one isn't too bad, offset of 32bits is easy to inspect visually (`dilithiumAvx512ConstsAddr()` could take a parameter perhaps) I added symbolic names for the indexes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021149647 From duke at openjdk.org Mon Mar 31 14:28:25 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 31 Mar 2025 14:28:25 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v10] In-Reply-To: <36fyT0z29o9GYLeQhpYkIT4d2By-8z7TEU8TGtT2uHI=.50647fa4-32ca-41ef-8287-075a70254143@github.com> References: <2N5Evij0f6qZi_pG3tqoz11aQbSnLG0YszqHR9ROfKI=.d44b16c6-d334-42c4-8de8-92eb41229248@github.com> <2yP2P1VNWgQu6cWvn0_a_7LdidS71C6PWKcqGKTOHnc=.49f8ac0f-df23-4f1e-adb9-e03a3f2295b2@github.com> <36fyT0z29o9GYLeQhpYkIT4d2By-8z7TEU8TGtT2uHI=.50647fa4-32ca-41ef-8287-075a70254143@github.com> Message-ID: On Sun, 23 Mar 2025 00:26:20 GMT, Volodymyr Paprotski wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 216: >> >>> 214: // Zmm8-Zmm23 used as scratch registers >>> 215: // result goes to Zmm0-Zmm7 >>> 216: static void montMulByConst128(MacroAssembler *_masm) { >> >> wish the inputs and output register arrays were explicit.. easier to follow that way > > Looking at this function some more.. I think you could remove this function and replace it with two calls to `montMul64`? > > montMul64(xmm0_3, xmm0_3, xmm29_29, Scratch*, _masm); > montMul64(xmm4_7, xmm4_7, xmm29_29, Scratch*, _masm); > ``` > Scratch would have to be defined.. I accepted this suggestion, it really saved quite a few lines of code, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021150687 From duke at openjdk.org Mon Mar 31 14:28:26 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 31 Mar 2025 14:28:26 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v7] In-Reply-To: References: Message-ID: On Sat, 22 Mar 2025 16:11:02 GMT, Volodymyr Paprotski wrote: >> These functions will not be used anywhere else and in ML_DSA.java all of the arrays passed to intrinsics are of the correct size. 
> > Works for me; just thought I would point it out, so its a 'premeditated' decision. Well, I ended up putting some asserts in the java code, just in case... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021153417 From duke at openjdk.org Mon Mar 31 14:28:27 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 31 Mar 2025 14:28:27 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v5] In-Reply-To: References: <3bphXKLpIpxAZP-FEOeob6AaHbv0BAoEceJka64vMW8=.3e4f74e0-9479-4926-b365-b08d8d702692@github.com> Message-ID: On Thu, 6 Mar 2025 19:26:14 GMT, Volodymyr Paprotski wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Accepted review comments. > > src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 409: > >> 407: __ evmovdquq(xmm29, Address(permsAndRots, 768), Assembler::AVX_512bit); >> 408: __ evmovdquq(xmm30, Address(permsAndRots, 832), Assembler::AVX_512bit); >> 409: __ evmovdquq(xmm31, Address(permsAndRots, 896), Assembler::AVX_512bit); > > Matter of taste, but I liked the compactness of montmulEven; i.e. > > for (i=0; i<15; i++) > __ evmovdquq(xmm(17+i), Address(permsAndRots, 64*i), Assembler::AVX_512bit); Changed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021155416 From mdoerr at openjdk.org Mon Mar 31 14:29:37 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 31 Mar 2025 14:29:37 GMT Subject: RFR: 8353274: [PPC64] Bug related to -XX:+UseCompactObjectHeaders -XX:-UseSIGTRAP in JDK-8305895 Message-ID: <6GVJHW26mRaRkydXx2pCmSsjdaKsm4ttjKeQ2vnvHG4=.bbf7fff9-fa27-41ef-a4ac-42daba4b890f@github.com> `MacroAssembler::ic_check` compares the `Klass*` in the compact format (no decode). However, a right shift is needed in case of `UseCompactObjectHeaders` (see `load_narrow_klass_compact`). This was missing in the slower version which doesn't use SIGTRAP. 
------------- Commit messages: - 8353274: [PPC64] Bug related to -XX:+UseCompactObjectHeaders -XX:-UseSIGTRAP in JDK-8305895 Changes: https://git.openjdk.org/jdk/pull/24331/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24331&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353274 Stats: 23 lines in 2 files changed: 8 ins; 12 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24331.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24331/head:pull/24331 PR: https://git.openjdk.org/jdk/pull/24331 From duke at openjdk.org Mon Mar 31 14:40:56 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 31 Mar 2025 14:40:56 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v12] In-Reply-To: References: Message-ID: > By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Reacting to comments by Volodymyr. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23860/files - new: https://git.openjdk.org/jdk/pull/23860/files/56656894..7a9f6645 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=10-11 Stats: 145 lines in 2 files changed: 24 ins; 91 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/23860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23860/head:pull/23860 PR: https://git.openjdk.org/jdk/pull/23860 From pminborg at openjdk.org Mon Mar 31 14:47:33 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 14:47:33 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v21] In-Reply-To: References: <5BpPdjiai-EEyaut13cxuVnuow-AQUMnB7mBHd7on5Q=.70a81283-6779-45b4-9fc9-e6b911f42b2a@github.com> Message-ID: On Mon, 31 Mar 2025 14:22:16 GMT, Maurizio Cimadamore wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Finish and clean up benchmarks > > src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 161: > >> 159: >> 160: static String renderWrapped(Object t) { >> 161: return (t == null) ? UNSET_LABEL : Objects.toString(unwrap(t)); > > If I read correctly, this implementation is similar to what described here: > > https://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/ > > (see section `Using C++11 Acquire and Release Fences`). > > We don't need the "relaxed" loads in Java because a reference load in Java can never tear. Correct? As you rightfully say, references never tear in Java. The Acquire/Release fences are there to protect us from observing partially initialized objects. Without these barriers, there is no guarantee that all the stores performed by a first thread (in particular, those performed in a constructor) 
are visible to load operations in a second thread, even if the reference itself is visible (due to potential reordering). The monitor held by the first thread is not enough, since the second thread doesn't acquire the monitor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2021194320 From cnorrbin at openjdk.org Mon Mar 31 14:47:31 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Mon, 31 Mar 2025 14:47:31 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v16] In-Reply-To: References: Message-ID: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support of intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. 
> > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns! Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: axel feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23416/files - new: https://git.openjdk.org/jdk/pull/23416/files/ac277b42..7bd2b66b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=14-15 Stats: 89 lines in 3 files changed: 65 ins; 1 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From cnorrbin at openjdk.org Mon Mar 31 14:47:33 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Mon, 31 Mar 2025 14:47:33 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v16] In-Reply-To: References: Message-ID: <7OG9tZf0pEd2aTanLy4hGuXrcaJ-lFmg6hDFTt0VQug=.3f2e893a-daab-47bc-bcba-96630353ca0a@github.com> On Fri, 28 
Mar 2025 10:11:31 GMT, Axel Boldt-Christmas wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> axel feedback > > src/hotspot/share/utilities/rbTree.inline.hpp line 502: > >> 500: template >> 501: inline void RBTree::visit_range_in_order(const K& from, const K& to, F f) const { >> 502: assert(COMPARATOR::cmp(from, to) <= 0, "from must be less or equal to to"); > > Seems unfortunate to lose these asserts; it would be nice to find these sorts of errors early. > > Maybe we can have some verification functions on the tree which take (const K& from, const K& to, const NodeType* end_node) which can dispatch to the correct COMPARATOR function. I added asserts to check that `from <= start` and `start <= to`. As long as we find a node we can indirectly test this, since `from <= start <= to` => `from <= to`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r2021193441 From cnorrbin at openjdk.org Mon Mar 31 14:47:35 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Mon, 31 Mar 2025 14:47:35 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v15] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 10:07:51 GMT, Axel Boldt-Christmas wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow non-debug verify_self + comparator readability > > src/hotspot/share/utilities/rbTree.inline.hpp line 600: > >> 598: template >> 599: template >> 600: inline void AbstractRBTree::visit_range_in_order(const K& from, const K& to, F f) const { > > Preexisting. > > This is an exclusive end. I would think inclusive end would be more natural. Otherwise you cannot iterate all the way to the end. (Currently can be worked around if the largest possible K is not in the tree, by using it as `to`). 
Changed to be inclusive instead, thank you for reviewing :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r2021190034 From cnorrbin at openjdk.org Mon Mar 31 14:51:14 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Mon, 31 Mar 2025 14:51:14 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v15] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 10:50:54 GMT, Axel Boldt-Christmas wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow non-debug verify_self + comparator readability > > src/hotspot/share/utilities/rbTree.inline.hpp line 548: > >> 546: return; >> 547: } >> 548: > > This could be a future enhancement. But it would be nice that if the COMPARATOR (or the NodeType) supplied a `cmp(const NodeType* a, const NodeType* b)` we could use it to check the order invariants for the children and parent. The `cmp(const NodeType* a, const NodeType* b)` function is identical to the one you needed to give as a template to `verify_self` if you're using intrusive trees. I made that function be a part of the `COMPARATOR` template instead. Now, you supply a second `cmp` in `COMPARATOR` used for verification only, and that is used in both `verify_self` and `replace_at_cursor`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r2021201495 From pminborg at openjdk.org Mon Mar 31 14:55:41 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 14:55:41 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v8] In-Reply-To: References: Message-ID: <50-VD_zJPa_-gDwcKnGE5SOP144RX-CQL57vSc2En4U=.65395327-2e8e-4d2c-a676-dc063fb86475@github.com> On Mon, 31 Mar 2025 14:03:49 GMT, Maurizio Cimadamore wrote: > > I have rewritten all the `toString()` methods. A `StableList::toString` now produces something much more similar to a regular `List::toString`. 
The only difference is that the `StableList::toString` shows ".unset" for the elements that are not yet evaluated. In other words, `StableList::toString` no longer evaluates all the elements, but rather does a "high impedance" scan over them and if evaluated, invokes `toString` on the element, otherwise just shows ".unset" for that element. > > The same goes for `StableMap` and all the stable functions (which now share the same code path as the stable collections). > > `StableValue` itself does not add extra square brackets around its content. > > This seems fine -- I noticed that `toString` doesn't appear anywhere in the `Collection` API -- so there doesn't seem to be any contract for `List::toString` to behave in the way a list created with `List::of` does. I suppose I'm still not super convinced as to whether the fact that the list is backed by stable holders should be reflected in the `toString` or not -- after all, `toString` can be thought of as a method that depends on the element values, which would trigger materialization for said values. In other words, I'm not sure I get the precise use case where having such a method would be useful. > > E.g. if I'm debugging, do I care whether I'm triggering evaluation earlier than normal? Is the worry that, by performing such an early evaluation, the resulting value for the element might be different from the one that would be triggered during normal execution (e.g. because of some stateful behavior in the function that computes the elements of the stable function) ? As you mentioned, from a formal perspective, we are free to choose if `toString()` should evaluate the elements or not. The feedback I've got is that folks might use a debugger and then by merely observing a stable list (for example by automatic rendering in the source code made by IntelliJ and similar IDEs) the list would be fully evaluated. This makes the list behave differently if run normally compared to being run in a debug session. 
So, I think the current "high impedance" strategy is motivated. The downside is a bit more complex code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2766493534 From mcimadamore at openjdk.org Mon Mar 31 15:06:30 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 31 Mar 2025 15:06:30 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v8] In-Reply-To: <50-VD_zJPa_-gDwcKnGE5SOP144RX-CQL57vSc2En4U=.65395327-2e8e-4d2c-a676-dc063fb86475@github.com> References: <50-VD_zJPa_-gDwcKnGE5SOP144RX-CQL57vSc2En4U=.65395327-2e8e-4d2c-a676-dc063fb86475@github.com> Message-ID: On Mon, 31 Mar 2025 14:52:28 GMT, Per Minborg wrote: > This makes the list behave differently if run normally compared to being run in a debug session. So, I think the current "high impedance" strategy is motivated. The downside is a bit more complex code. If this is the concern, then having sublists and such work this way as well is important too? E.g. I don't think it's easy to draw a line: any other list or collection generated by stable lists and maps should also behave in this way... ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2766527095 From mcimadamore at openjdk.org Mon Mar 31 15:18:26 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 31 Mar 2025 15:18:26 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v21] In-Reply-To: References: <5BpPdjiai-EEyaut13cxuVnuow-AQUMnB7mBHd7on5Q=.70a81283-6779-45b4-9fc9-e6b911f42b2a@github.com> Message-ID: On Mon, 31 Mar 2025 13:47:27 GMT, Per Minborg wrote: > Here are the latest benchmarks run on an M1 (macOS): > > ``` > Benchmark Mode Cnt Score Error Units > StableFunctionBenchmark.function avgt 10 4.228 ± 0.172 ns/op > StableFunctionBenchmark.map avgt 10 4.323 ± 0.289 ns/op > StableFunctionBenchmark.staticIntFunction avgt 10 1.724 ± 0.121 ns/op > StableFunctionBenchmark.staticSMap avgt 10 1.710 ±
0.045 ns/op > StableFunctionSingleBenchmark.function avgt 10 4.329 ± 0.184 ns/op > StableFunctionSingleBenchmark.map avgt 10 4.291 ± 0.142 ns/op > StableFunctionSingleBenchmark.staticIntFunction avgt 10 0.704 ± 0.022 ns/op > StableFunctionSingleBenchmark.staticSMap avgt 10 0.708 ± 0.027 ns/op > StableIntFunctionBenchmark.intFunction avgt 10 1.558 ± 0.063 ns/op > StableIntFunctionBenchmark.list avgt 10 1.579 ± 0.141 ns/op > StableIntFunctionBenchmark.staticIntFunction avgt 10 1.044 ± 0.031 ns/op > StableIntFunctionBenchmark.staticList avgt 10 2.280 ± 2.013 ns/op > StableIntFunctionSingleBenchmark.intFunction avgt 10 2.333 ± 0.033 ns/op > StableIntFunctionSingleBenchmark.list avgt 10 2.335 ± 0.046 ns/op > StableIntFunctionSingleBenchmark.staticIntFunction avgt 10 0.670 ± 0.022 ns/op > StableIntFunctionSingleBenchmark.staticList avgt 10 0.679 ± 0.021 ns/op > StableSupplierBenchmark.stable avgt 10 1.377 ± 0.042 ns/op > StableSupplierBenchmark.staticStable avgt 10 0.362 ± 0.077 ns/op > StableSupplierBenchmark.staticSupplier avgt 10 0.338 ± 0.016 ns/op > StableSupplierBenchmark.supplier avgt 10 1.609 ± 0.042 ns/op > StableValueBenchmark.atomic avgt 10 1.357 ± 0.046 ns/op > StableValueBenchmark.dcl avgt 10 1.369 ± 0.058 ns/op > StableValueBenchmark.refSupplier avgt 10 0.442 ± 0.007 ns/op > StableValueBenchmark.stable avgt 10 1.522 ± 0.267 ns/op > StableValueBenchmark.stableNull avgt 10 1.237 ± 0.117 ns/op > StableValueBenchmark.staticAtomic avgt 10 1.220 ± 0.058 ns/op > StableValueBenchmark.staticDcl avgt 10 0.357 ± 0.022 ns/op > StableValueBenchmark.staticHolder avgt 10 1.452 ± 0.205 ns/op > StableValueBenchmark.staticRecordHolder avgt 10 0.367 ± 0.028 ns/op > StableValueBenchmark.staticStable avgt 10 0.365 ± 0.026 ns/op > Finished running test 'micro:java.lang.stable' > ``` This seems an outlier: StableIntFunctionBenchmark.staticList avgt 10 2.280 ±
2.013 ns/op (I also note the high error) I believe it could be useful to have one more benchmark showing a `StableValue` holding a `MethodHandle` and doing a `get()` + `invokeExact`. I believe that should report more dramatic distinctions when compared to atomic/dcl? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2766560206 From pminborg at openjdk.org Mon Mar 31 15:28:30 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 15:28:30 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v8] In-Reply-To: References: <50-VD_zJPa_-gDwcKnGE5SOP144RX-CQL57vSc2En4U=.65395327-2e8e-4d2c-a676-dc063fb86475@github.com> Message-ID: On Mon, 31 Mar 2025 15:03:42 GMT, Maurizio Cimadamore wrote: > > This makes the list behave differently if run normally compared to being run in a debug session. So, I think the current "high impedance" strategy is motivated. The downside is a bit more complex code. > > If this is the concern, then having sublists and such work this way as well is important too? E.g. I don't think it's easy to draw a line: any other list or collection generated by stable lists and maps should also behave in this way... I agree that eventually, these constructs should also have a "soft" `toString()` method. But I think they represent less than 10% of the usage compared to a normal stable list. My idea was to add that later, but if we believe this is important from day zero, I could do it now.
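The "high impedance" `toString()` discussed in this thread can be sketched roughly as follows. This is a simplified, single-threaded illustration and not the `StableList` implementation from the PR: `LazyListSketch`, `LazyList`, and the use of a plain `Object[]` are assumptions made for the example, and null elements are simply rejected here rather than wrapped.

```java
import java.util.Objects;
import java.util.StringJoiner;
import java.util.function.IntFunction;

public class LazyListSketch {
    static final String UNSET_LABEL = ".unset";

    // Hypothetical lazily computed list; NOT thread-safe, for illustration only.
    static final class LazyList<T> {
        private final Object[] slots;                  // null means "not yet computed"
        private final IntFunction<? extends T> mapper; // computes an element on first access

        LazyList(int size, IntFunction<? extends T> mapper) {
            this.slots = new Object[size];
            this.mapper = Objects.requireNonNull(mapper);
        }

        @SuppressWarnings("unchecked")
        T get(int index) {
            Object value = slots[index];
            if (value == null) {
                // First access: compute and remember the element.
                value = Objects.requireNonNull(mapper.apply(index));
                slots[index] = value;
            }
            return (T) value;
        }

        // Renders already computed elements and shows a label for the rest,
        // without ever invoking the mapper.
        @Override
        public String toString() {
            StringJoiner joiner = new StringJoiner(", ", "[", "]");
            for (Object slot : slots) {
                joiner.add(slot == null ? UNSET_LABEL : slot.toString());
            }
            return joiner.toString();
        }
    }

    public static void main(String[] args) {
        LazyList<String> list = new LazyList<>(3, i -> "v" + i);
        System.out.println(list); // prints [.unset, .unset, .unset]
        list.get(1);
        System.out.println(list); // prints [.unset, v1, .unset]
    }
}
```

Because `toString()` only reads the slots, a debugger that renders the list would not trigger evaluation, which is the behavior difference the discussion is concerned with.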
------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2766591455 From pminborg at openjdk.org Mon Mar 31 15:39:30 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 15:39:30 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v21] In-Reply-To: References: <5BpPdjiai-EEyaut13cxuVnuow-AQUMnB7mBHd7on5Q=.70a81283-6779-45b4-9fc9-e6b911f42b2a@github.com> Message-ID: On Mon, 31 Mar 2025 15:15:31 GMT, Maurizio Cimadamore wrote: > > Here are the latest benchmarks run on an M1 (macOS): > > ``` > > Benchmark Mode Cnt Score Error Units > > StableFunctionBenchmark.function avgt 10 4.228 ± 0.172 ns/op > > StableFunctionBenchmark.map avgt 10 4.323 ± 0.289 ns/op > > StableFunctionBenchmark.staticIntFunction avgt 10 1.724 ± 0.121 ns/op > > StableFunctionBenchmark.staticSMap avgt 10 1.710 ± 0.045 ns/op > > StableFunctionSingleBenchmark.function avgt 10 4.329 ± 0.184 ns/op > > StableFunctionSingleBenchmark.map avgt 10 4.291 ± 0.142 ns/op > > StableFunctionSingleBenchmark.staticIntFunction avgt 10 0.704 ± 0.022 ns/op > > StableFunctionSingleBenchmark.staticSMap avgt 10 0.708 ± 0.027 ns/op > > StableIntFunctionBenchmark.intFunction avgt 10 1.558 ± 0.063 ns/op > > StableIntFunctionBenchmark.list avgt 10 1.579 ± 0.141 ns/op > > StableIntFunctionBenchmark.staticIntFunction avgt 10 1.044 ± 0.031 ns/op > > StableIntFunctionBenchmark.staticList avgt 10 2.280 ± 2.013 ns/op > > StableIntFunctionSingleBenchmark.intFunction avgt 10 2.333 ± 0.033 ns/op > > StableIntFunctionSingleBenchmark.list avgt 10 2.335 ± 0.046 ns/op > > StableIntFunctionSingleBenchmark.staticIntFunction avgt 10 0.670 ± 0.022 ns/op > > StableIntFunctionSingleBenchmark.staticList avgt 10 0.679 ± 0.021 ns/op > > StableSupplierBenchmark.stable avgt 10 1.377 ± 0.042 ns/op > > StableSupplierBenchmark.staticStable avgt 10 0.362 ± 0.077 ns/op > > StableSupplierBenchmark.staticSupplier avgt 10 0.338 ± 0.016 ns/op > > StableSupplierBenchmark.supplier avgt 10 1.609 ±
0.042 ns/op > > StableValueBenchmark.atomic avgt 10 1.357 ± 0.046 ns/op > > StableValueBenchmark.dcl avgt 10 1.369 ± 0.058 ns/op > > StableValueBenchmark.refSupplier avgt 10 0.442 ± 0.007 ns/op > > StableValueBenchmark.stable avgt 10 1.522 ± 0.267 ns/op > > StableValueBenchmark.stableNull avgt 10 1.237 ± 0.117 ns/op > > StableValueBenchmark.staticAtomic avgt 10 1.220 ± 0.058 ns/op > > StableValueBenchmark.staticDcl avgt 10 0.357 ± 0.022 ns/op > > StableValueBenchmark.staticHolder avgt 10 1.452 ± 0.205 ns/op > > StableValueBenchmark.staticRecordHolder avgt 10 0.367 ± 0.028 ns/op > > StableValueBenchmark.staticStable avgt 10 0.365 ± 0.026 ns/op > > Finished running test 'micro:java.lang.stable' > > ``` > > This seems an outlier: > > ``` > StableIntFunctionBenchmark.staticList avgt 10 2.280 ± 2.013 ns/op > ``` > > (I also note the high error) > > I believe it could be useful to have one more benchmark showing a `StableValue` holding a `MethodHandle` and do a `get()` + `invokeExact`. I believe that should report more dramatic distinctions when compared to atomic/dcl? Thanks for "hawk-eying" this discrepancy. There seemed to be some flux when I ran the benchmarks (I've used a laptop). Running the benchmarks in a more controlled environment revealed there was no difference. Also, rerunning the particular benchmark now shows: StableIntFunctionBenchmark.intFunction avgt 10 2.317 ± 1.252 ns/op StableIntFunctionBenchmark.list avgt 10 2.303 ± 1.302 ns/op StableIntFunctionBenchmark.staticIntFunction avgt 10 1.044 ± 0.036 ns/op StableIntFunctionBenchmark.staticList avgt 10 1.052 ± 0.061 ns/op I will add a `MethodHandle` benchmark. Good suggestion!
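For readers unfamiliar with the pattern being measured, here is a minimal sketch of calling a `MethodHandle` through a `static final` field versus a plain instance field. It is not the JMH benchmark from the PR and measures nothing; the class `MethodHandleSketch` and its members are hypothetical, and the `StableValue` variant is left out because it depends on the preview API under review.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class MethodHandleSketch {
    // Target method for the handle; hypothetical, chosen only to keep the example small.
    static int identity(int x) {
        return x;
    }

    static MethodHandle lookupIdentity() {
        try {
            return MethodHandles.lookup().findStatic(
                    MethodHandleSketch.class, "identity",
                    MethodType.methodType(int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // A static final handle is a constant that the JIT can typically fold.
    static final MethodHandle FINAL_MH = lookupIdentity();

    // A plain instance field is not treated as a constant by default.
    private MethodHandle mh = lookupIdentity();

    int viaFinal(int x) {
        try {
            // invokeExact requires the call site to match (int)int exactly.
            return (int) FINAL_MH.invokeExact(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    int viaField(int x) {
        try {
            return (int) mh.invokeExact(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandleSketch sketch = new MethodHandleSketch();
        System.out.println(sketch.viaFinal(42)); // prints 42
        System.out.println(sketch.viaField(42)); // prints 42
    }
}
```

Both paths return the same value; the expectation voiced in the thread is that a handle held in a stable value could be optimized like the `static final` case rather than the plain field case.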
------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2766618937 From qamai at openjdk.org Mon Mar 31 15:44:38 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 31 Mar 2025 15:44:38 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v21] In-Reply-To: References: <5BpPdjiai-EEyaut13cxuVnuow-AQUMnB7mBHd7on5Q=.70a81283-6779-45b4-9fc9-e6b911f42b2a@github.com> Message-ID: On Mon, 31 Mar 2025 14:22:16 GMT, Maurizio Cimadamore wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Finish and clean up benchmarks > > src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 161: > >> 159: >> 160: static String renderWrapped(Object t) { >> 161: return (t == null) ? UNSET_LABEL : Objects.toString(unwrap(t)); > > If I read correctly, this implementation is similar to what described here: > > https://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/ > > (see section `Using C++11 Acquire and Release Fences`). > > We don't need the "relaxed" loads in Java because a reference load in Java can never tear. Correct? @mcimadamore That atomic load under the lock is unnecessary because the load will never be concurrent with any store. I believe `relaxed` has to be used because C++ lacks the ability to perform atomic operations and non-atomic operations on the same object until C++ 20 with `std::atomic_ref`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2021289826 From pminborg at openjdk.org Mon Mar 31 16:07:07 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 16:07:07 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v22] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. 
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Add MethodHandle benchmark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/7fb8cb41..df4ef35c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=20-21 Stats: 96 lines in 1 file changed: 96 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From pminborg at openjdk.org Mon Mar 31 16:07:08 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 31 Mar 2025 16:07:08 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v21] In-Reply-To: <5BpPdjiai-EEyaut13cxuVnuow-AQUMnB7mBHd7on5Q=.70a81283-6779-45b4-9fc9-e6b911f42b2a@github.com> References: <5BpPdjiai-EEyaut13cxuVnuow-AQUMnB7mBHd7on5Q=.70a81283-6779-45b4-9fc9-e6b911f42b2a@github.com> Message-ID: <14unTVqrBXjhB_DYk7CpK5eXmCy_CU7LgaJ0MvLkuis=.515d378c-288f-4cbf-b92e-83d2cf81eb20@github.com> On Mon, 31 Mar 2025 13:32:56 GMT, Per Minborg wrote: >> Implement JEP 502. >> >> The PR passes tier1-tier3 tests. > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Finish and clean up benchmarks Here is the result of the new `MethodHandle` benchmark: Benchmark Mode Cnt Score Error Units StableMethodHandleBenchmark.finalMh avgt 10 0.809 ± 0.120 ns/op StableMethodHandleBenchmark.mh avgt 10 3.509 ± 0.199 ns/op StableMethodHandleBenchmark.stableMh avgt 10 0.762 ±
0.050 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/23972#issuecomment-2766694309 From stuefe at openjdk.org Mon Mar 31 16:30:37 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 31 Mar 2025 16:30:37 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances Message-ID: In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries. For details, please see JBS issue text. ----------------------- Patch results: The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. Here the oop map size distribution over all JDK classes in the JDK image: Before: 5395 - non-static oop maps (0 entries) 9330 - non-static oop maps (1 entries) 1449 - non-static oop maps (2 entries) 274 - non-static oop maps (3 entries) 218 - non-static oop maps (4 entries) 75 - non-static oop maps (5 entries) 7 - non-static oop maps (6 entries) 4 - non-static oop maps (7 entries) Now: 5395 - non-static oop maps (0 entries) 10178 - non-static oop maps (1 entries) 933 - non-static oop maps (2 entries) 229 - non-static oop maps (3 entries) 16 - non-static oop maps (4 entries) 1 - non-static oop maps (5 entries) For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot: Before: java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0} - ---- non-static fields (9 words): - final 'hash' 'I' @12 - final 'key' 'Ljava/lang/Object;' @16 - volatile 'val' 'Ljava/lang/Object;' @20 - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class - 'red' 'Z' @28 << derived class starts here, non-oops lead - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 - 'prev' 
'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44 - non-static oop maps (2 entries): 16-24 32-44 Now: java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450} - ---- non-static fields (9 words): - final 'hash' 'I' @12 - final 'key' 'Ljava/lang/Object;' @16 - volatile 'val' 'Ljava/lang/Object;' @20 - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oops lead - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 - 'red' 'Z' @44 - non-static oop maps (1 entries): 16-40 Note how the sole primitive field of the derived class, "red", changed position to let oops lead. Here a contrived example to demonstrate how reordering works across several inheritance levels: Before: GeneratedClass9730 {0x000000008f93ea50} - ---- non-static fields (11 words): - public 'derived0_I_0' 'I' @12 - public 'derived0_o_0' 'Ljava/lang/Object;' @16 << last field of base class - public 'derived1_I_0' 'I' @20 - public 'derived1_o_0' 'Ljava/lang/Object;' @24 << last field of derived class 1 - public 'derived2_I_0' 'I' @28 - public 'derived2_o_0' 'Ljava/lang/Object;' @32 << last field of derived class 2 - public 'derived3_I_0' 'I' @36 - public 'derived3_o_0' 'Ljava/lang/Object;' @40 << last field of derived class 3 - public 'derived4_I_0' 'I' @44 - public 'derived4_o_0' 'Ljava/lang/Object;' @48 << last field of derived class 4 - public 'o0' 'Ljava/lang/Object;' @52 << this class starts here - non-static oop maps (5 entries): 16-16 24-24 32-32 40-40 48-52 After GeneratedClass9730 {0x00000000a793e5f8} - ---- non-static fields (11 words): - public 'derived0_I_0' 'I' @12 - public 'derived0_o_0' 'Ljava/lang/Object;' @16 << last field of base class - public 'derived1_o_0' 'Ljava/lang/Object;' @20 - public 
'derived1_I_0' 'I' @24 << last field of derived class 1 - public 'derived2_I_0' 'I' @28 - public 'derived2_o_0' 'Ljava/lang/Object;' @32 << last field of derived class 2 - public 'derived3_o_0' 'Ljava/lang/Object;' @36 - public 'derived3_I_0' 'I' @40 << last field of derived class 3 - public 'derived4_I_0' 'I' @44 - public 'derived4_o_0' 'Ljava/lang/Object;' @48 << last field of derived class 4 - public 'o0' 'Ljava/lang/Object;' @52 << this class starts here - non-static oop maps (3 entries): 16-20 32-36 48-52 ------------- Commit messages: - alternate-order - print Changes: https://git.openjdk.org/jdk/pull/24330/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24330&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353273 Stats: 48 lines in 2 files changed: 43 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24330.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24330/head:pull/24330 PR: https://git.openjdk.org/jdk/pull/24330 From jbhateja at openjdk.org Mon Mar 31 16:43:39 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 31 Mar 2025 16:43:39 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v11] In-Reply-To: References: Message-ID: <-sFpKarpt9CP7DYd7v9vSBAgHYthQ4OZFNGHFOgb2AI=.fc908719-8e45-43d2-97df-95ff01129275@github.com> On Mon, 31 Mar 2025 11:11:54 GMT, Ferenc Rakoczi wrote: >> src/hotspot/cpu/x86/vm_version_x86.cpp line 1252: >> >>> 1250: // Currently we only have them for AVX512 >>> 1251: #ifdef _LP64 >>> 1252: if (supports_evex() && supports_avx512bw()) { >> >> supports_evex check looks redundant. > > These are checks for two different feature bits: CPU_AVX512F and CPU_AVX512BW. Are you saying that the latter implies the former in every implementation of the spec? AVX512BW is built on top of AVX512F spec. In assembler and other places we only check BW in assertions which implies EVEX. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2021381288 From gziemski at openjdk.org Mon Mar 31 18:45:16 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 31 Mar 2025 18:45:16 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 08:22:09 GMT, Stefan Karlsson wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> work > > src/hotspot/share/runtime/os.cpp line 2130: > >> 2128: log_trace(os, map)(ERRFMT, ERRFMTARGS); >> 2129: log_debug(os, map)("successfully attached at " PTR_FORMAT, p2i(result)); >> 2130: MemTracker::record_virtual_memory_reserve((address)result, bytes, CALLER_PC, mtNone); > > I think attempt_reserve_memory_between should provide the correct mem tag. I wasn't sure what that is here, we can do this in a follow up? > src/hotspot/share/runtime/os.cpp line 2336: > >> 2334: if (result != nullptr) { >> 2335: // The memory is committed >> 2336: MemTracker::record_virtual_memory_reserve_and_commit((address)result, size, CALLER_PC, mtNone); > > reserve_memory_special should take a mem tag, but I guess you intend to do that as a follow-up RFE? Yes, I'm trying to keep changes small and not overdo it. There will be follow-ups. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2021591591 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2021590157 From pchilanomate at openjdk.org Mon Mar 31 18:48:45 2025 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 31 Mar 2025 18:48:45 GMT Subject: RFR: 8353117: Crash: assert(id >= ThreadIdentifier::initial() && id < ThreadIdentifier::current()) failed: must be reasonable) Message-ID: <6ludi2j0fKqL5_MirvViyefGITPvMKzAIX8EJIhfbFE=.3935e8bc-e9a3-4ed1-8da6-e943c97714b6@github.com> Please review the following fix. 
For the attaching thread case we are incorrectly setting the `_monitor_owner_id` after `Threads::add()` is called, i.e., after the attaching thread becomes visible through a ThreadsListHandle. So if another thread calls `Threads::owning_thread_from_monitor()` in between these events and iterates through all JavaThreads looking for the owner of a given monitor, we might find this attaching thread still with a `_monitor_owner_id` of 0. I corrected the ordering and improved verification checks. Tested in mach5 tiers1-5. Thanks, Patricio ------------- Commit messages: - v1 Changes: https://git.openjdk.org/jdk/pull/24336/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24336&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353117 Stats: 30 lines in 7 files changed: 20 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24336.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24336/head:pull/24336 PR: https://git.openjdk.org/jdk/pull/24336 From gziemski at openjdk.org Mon Mar 31 18:52:33 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 31 Mar 2025 18:52:33 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v4] In-Reply-To: References: Message-ID: > This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with default value of mtNone, to force everyone to provide mem tag, if known. > > I tried to fill in tag, when I was pretty certain that I had the right type. > > At least one more follow-up will be needed after this, to change the remaining mtNone to valid values.
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: use attempt_reserve_memory_at default parameter value for exec, where possible ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24282/files - new: https://git.openjdk.org/jdk/pull/24282/files/5a1a75e9..40cb4384 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=02-03 Stats: 6 lines in 5 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24282.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24282/head:pull/24282 PR: https://git.openjdk.org/jdk/pull/24282 From stefank at openjdk.org Mon Mar 31 19:28:09 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 31 Mar 2025 19:28:09 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 18:42:16 GMT, Gerard Ziemski wrote: >> src/hotspot/share/runtime/os.cpp line 2130: >> >>> 2128: log_trace(os, map)(ERRFMT, ERRFMTARGS); >>> 2129: log_debug(os, map)("successfully attached at " PTR_FORMAT, p2i(result)); >>> 2130: MemTracker::record_virtual_memory_reserve((address)result, bytes, CALLER_PC, mtNone); >> >> I think attempt_reserve_memory_between should provide the correct mem tag. > > I wasn't sure what that is here, we can do this in a follow up? My suggestion was to change `attempt_reserve_memory_between` to take MemTag as an argument. It's OK to do that as a follow-up PR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2021646441 From gziemski at openjdk.org Mon Mar 31 21:13:16 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 31 Mar 2025 21:13:16 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 07:59:08 GMT, Stefan Karlsson wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> work > > src/hotspot/share/runtime/safepointMechanism.cpp line 60: > >> 58: const size_t page_size = os::vm_page_size(); >> 59: const size_t allocation_size = 2 * page_size; >> 60: char* polling_page = os::reserve_memory(allocation_size, mtSafepoint, !ExecMem); > > Suggestion: > > char* polling_page = os::reserve_memory(allocation_size, mtSafepoint); I think here we need to keep `!ExecMem` since it is a parameter. > src/hotspot/share/utilities/debug.cpp line 715: > >> 713: #ifdef CAN_SHOW_REGISTERS_ON_ASSERT >> 714: void initialize_assert_poison() { >> 715: char* page = os::reserve_memory(os::vm_page_size(), mtInternal, !ExecMem); > > Suggestion: > > char* page = os::reserve_memory(os::vm_page_size(), mtInternal); Again, `ExecMem` is a parameter. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2021769950 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2021770850 From duke at openjdk.org Mon Mar 31 21:29:16 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Mon, 31 Mar 2025 21:29:16 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 10:53:57 GMT, Ashutosh Mehra wrote: >> Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 13 additional commits since the last revision: >> >> - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 >> - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent >> - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing >> >> Remove from cgroups v1 branch incorrect log messages about cpuset >> controller being optional. Add test case for cgroups v1, cpuset >> disabled. >> - Improve !cgroups_v2_enabled branch comment >> - Debug-log optional and disabled cgroups v2 controllers >> >> Do not log enabled controllers that are not relevant to the JDK. >> - Move index declaration to scope in which it is used >> - Remove empty string check during cgroup.controllers parsing >> - Define ISSPACE_CHARS macro, use it in strsep call >> - Pass fgets result to strsep >> - Replace is_cgroupsV2 with cgroups_v2_enabled >> >> Also fix the testCgroupv1SystemdOnly and testCgroupv1NoMounts test >> cases such that their /proc/cgroups and /proc/self/cgroup contents >> correspond. This prevents assertion failures these tests were >> producing when is_cgroupsV2 was replaced with cgroups_v2_enabled. >> - ... and 3 more: https://git.openjdk.org/jdk/compare/8b0ec7c1...b6926e15 > > test/hotspot/jtreg/containers/cgroup/CgroupSubsystemFactory.java line 459: > >> 457: public void testCgroupv1SystemdOnly(WhiteBox wb) { >> 458: String procCgroups = cgroupv1CgInfoZeroHierarchy.toString(); >> 459: String procSelfCgroup = cgroupV2SelfCgroup.toString(); > > I don't get why is this change required? The test name `testCgroupv1SystemdOnly` suggests it is testing cgroup v1 only but then it passes cgroup v2 proc file. Same for `testCgroupv1NoMounts`. Thank you for reviewing. This test consistency fix is discussed [here](https://github.com/openjdk/jdk/pull/23811#discussion_r1973877201) and [here](https://github.com/openjdk/jdk/pull/23811#discussion_r1978045429); I agree the result is confusing. 
Instead I will change `cgroupv1CgInfoZeroHierarchy` to `cgroupv1CgInfoNonZeroHierarchy` which achieves the same effect using only `cgroup v1` fields. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r2021787908 From duke at openjdk.org Mon Mar 31 21:35:25 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Mon, 31 Mar 2025 21:35:25 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 10:53:50 GMT, Ashutosh Mehra wrote: >> Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: >> >> - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 >> - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent >> - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing >> >> Remove from cgroups v1 branch incorrect log messages about cpuset >> controller being optional. Add test case for cgroups v1, cpuset >> disabled. >> - Improve !cgroups_v2_enabled branch comment >> - Debug-log optional and disabled cgroups v2 controllers >> >> Do not log enabled controllers that are not relevant to the JDK. >> - Move index declaration to scope in which it is used >> - Remove empty string check during cgroup.controllers parsing >> - Define ISSPACE_CHARS macro, use it in strsep call >> - Pass fgets result to strsep >> - Replace is_cgroupsV2 with cgroups_v2_enabled >> >> Also fix the testCgroupv1SystemdOnly and testCgroupv1NoMounts test >> cases such that their /proc/cgroups and /proc/self/cgroup contents >> correspond. This prevents assertion failures these tests were >> producing when is_cgroupsV2 was replaced with cgroups_v2_enabled. >> - ... 
and 3 more: https://git.openjdk.org/jdk/compare/eb4ea706...b6926e15
>
> src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 81:
>
>> 79:   // file system magic. If it does not then heuristics are required to determine
>> 80:   // if cgroups v1 is usable or not.
>> 81:   if (statfs(sys_fs_cgroup, &fsstat) != -1) {
>
> I feel this logic should be moved to `determine_type` as it is responsible for determining the version of the cgroup subsystem.

OK, I tend to agree; I will investigate alternatives. I did consider putting the `statfs` logic inside `determine_type`, but ended up leaving it outside because `determine_type` is called by the `whitebox` framework, and "mocking" `statfs` is not possible with regular files. The idea is to allow the test suite to simply mock the `statfs` result via the boolean `cgroups_v2_enabled` argument.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r2021793827

From vlivanov at openjdk.org Mon Mar 31 21:51:17 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Mon, 31 Mar 2025 21:51:17 GMT
Subject: RFR: 8353217: Build libsleef on macos-aarch64
In-Reply-To:
References:
Message-ID:

On Sat, 29 Mar 2025 00:58:59 GMT, Vladimir Ivanov wrote:

> Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform.
>
> It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation.
>
> PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`.
>
> Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup.
>
> Testing: hs-tier1 - hs-tier4, microbenchmarks

Thanks for the reviews.

> This makes me wonder: @iwanowww what kind of testing have you done to ensure this works correctly?

hs-tier1 - hs-tier4 (and up to hs-tier6 as part of a larger set of changes).
Vector API unit tests (under `test/jdk/jdk/incubator/vector/`) exercise this functionality.

>> Is leaving the sources of sleef in share/native the right thing to do?

> No, it should move to the least common directory for all platforms where it is needed. In this case, it should move to unix instead of share.

Strictly speaking, `src/jdk.incubator.vector/linux/native/libsleef` consists of 3 parts:
  (a) original SLEEF library sources (under the `upstream/` sub-folder);
  (b) platform-specific generated code (under `generated/`);
  (c) custom native wrappers used to build the `libsleef` library in the JDK (under `lib/`).

While (c) may be Linux-specific, the SLEEF library is cross-platform and covers a wide range of platforms [1]. So, strictly speaking, it is (c) that is truly platform-specific and deserves to be placed under `[linux|unix]/native`.

Moreover, I'm experimenting with SLEEF usage on x86, so it's possible that it will be used on linux-x64/windows-x64 eventually.

I'm fine with it either way. But if we don't want to relocate the SLEEF sources again rather soon, I suggest placing them under `share/` right away.

[1] https://sleef.org/

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2767493489

From vlivanov at openjdk.org Mon Mar 31 21:57:07 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Mon, 31 Mar 2025 21:57:07 GMT
Subject: RFR: 8353217: Build libsleef on macos-aarch64
In-Reply-To:
References:
Message-ID:

On Mon, 31 Mar 2025 12:27:23 GMT, Magnus Ihse Bursie wrote:

>> Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform.
>>
>> It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation.
>>
>> PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`.
>>
>> Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup.
>>
>> Testing: hs-tier1 - hs-tier4, microbenchmarks
>
> Instead of trying to guide you how to fix this, I made the unification of all libsleef stanzas myself. It is available here:
>
> https://github.com/openjdk/jdk/commit/04feadda561b2f7a6afff440ab5b4e188361c048
>
> That commit assumes that `vector_math_sve.c` should have `$(SVE_CFLAGS)` on mac as well as on linux. If that is not correct, then it needs to be adjusted.

Thanks a lot, @magicus!

> That commit assumes that vector_math_sve.c should have $(SVE_CFLAGS) on mac as well as on linux. If that is not correct, then it needs to be adjusted.

As of now, Apple Silicon doesn't support SVE/SVE2, so I intentionally excluded SVE support on macosx-aarch64.

What would be the best way to exclude `vector_math_sve.c` on macosx-aarch64?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2767501215

From ccheung at openjdk.org Mon Mar 31 22:15:39 2025
From: ccheung at openjdk.org (Calvin Cheung)
Date: Mon, 31 Mar 2025 22:15:39 GMT
Subject: RFR: 8353129: CDS ArchiveRelocation tests fail after JDK-8325132 [v2]
In-Reply-To:
References:
Message-ID:

> Two archive relocation tests failed when `-XX:ArchiveRelocationMode=0` is specified via the jtreg `-javaoption`.
> A fix is to add a `WhiteBox.getArchiveRelocationMode()` method so that the tests can check if the `ArchiveRelocationMode` is set to 0 before checking the expected output.
>
> Passed tiers 1 - 4 testing.
Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision:

  simplify the fix per David's suggestion

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24308/files
  - new: https://git.openjdk.org/jdk/pull/24308/files/94b6164b..9bfa2ec6

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=24308&range=01
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24308&range=00-01

  Stats: 34 lines in 4 files changed: 3 ins; 21 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/24308.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24308/head:pull/24308

PR: https://git.openjdk.org/jdk/pull/24308

From ccheung at openjdk.org Mon Mar 31 22:25:13 2025
From: ccheung at openjdk.org (Calvin Cheung)
Date: Mon, 31 Mar 2025 22:25:13 GMT
Subject: RFR: 8353129: CDS ArchiveRelocation tests fail after JDK-8325132
In-Reply-To: <7R24pvsFCXdrj84C24wvdfS1BGrQlvS3jys8r9kD744=.491edc21-645c-4cb7-846a-f5e1e93ca7f5@github.com>
References: <7R24pvsFCXdrj84C24wvdfS1BGrQlvS3jys8r9kD744=.491edc21-645c-4cb7-846a-f5e1e93ca7f5@github.com>
Message-ID:

On Mon, 31 Mar 2025 00:57:43 GMT, David Holmes wrote:

> I think these tests are very confusing!
>
> test/hotspot/jtreg/runtime/cds/appcds/ArchiveRelocationTest.java
>
> States
>
> > @comment the test uses -XX:ArchiveRelocationMode=1 to force relocation.
>
> but that is not what it does. It either sets `-XX:ArchiveRelocationMode=0` in the exec'd VM or it relies on the default being 1 - which is not the case if it was set directly via JTREG. So it seems to me the right, and simple, fix here is to always pass the expected `-XX:ArchiveRelocationMode` value to the exec'd VM and ignore/override whatever comes in via the command-line.

I've simplified the fix based on your suggestions. The new change contains only test changes.
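
To illustrate the suggestion quoted above (always pass the expected `-XX:ArchiveRelocationMode` value to the exec'd VM and override whatever came in from jtreg), here is a minimal, self-contained sketch. It is not the code in the PR: the class `RelocationModeArgs` and the method `withRelocationMode` are hypothetical names made up for this example, and the real tests build their command lines through the jtreg test library rather than a plain list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of the JDK test library.
final class RelocationModeArgs {
    // Prefix of the VM flag discussed in this thread.
    private static final String FLAG_PREFIX = "-XX:ArchiveRelocationMode=";

    /**
     * Returns a copy of incomingOpts in which any existing
     * ArchiveRelocationMode setting is dropped and the expected
     * setting is appended, so the child VM sees exactly one value.
     */
    static List<String> withRelocationMode(List<String> incomingOpts, int expectedMode) {
        List<String> result = new ArrayList<>();
        for (String opt : incomingOpts) {
            if (!opt.startsWith(FLAG_PREFIX)) {
                result.add(opt); // keep unrelated options untouched
            }
        }
        result.add(FLAG_PREFIX + expectedMode);
        return result;
    }

    public static void main(String[] args) {
        // Options as they might arrive via jtreg -javaoption (assumed values).
        List<String> fromJtreg = List.of("-Xmx64m", "-XX:ArchiveRelocationMode=0");
        // prints [-Xmx64m, -XX:ArchiveRelocationMode=1]
        System.out.println(withRelocationMode(fromJtreg, 1));
    }
}
```

With this shape the test decides which mode it expects and never depends on the VM default or on flags inherited from the harness, which is the property the review comment asks for.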

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24308#issuecomment-2767550637

From ccheung at openjdk.org Mon Mar 31 22:25:15 2025
From: ccheung at openjdk.org (Calvin Cheung)
Date: Mon, 31 Mar 2025 22:25:15 GMT
Subject: RFR: 8353129: CDS ArchiveRelocation tests fail after JDK-8325132 [v2]
In-Reply-To:
References:
Message-ID:

On Sun, 30 Mar 2025 01:09:28 GMT, Kim Barrett wrote:

>> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   simplify the fix per David's suggestion
>
> test/hotspot/jtreg/runtime/cds/appcds/dynamicArchive/DynamicArchiveRelocationTest.java line 48:
>
>> 46:     static int relocationMode = -1;
>> 47:     public static void main(String... args) throws Exception {
>> 48:         WhiteBox wb = WhiteBox.getWhiteBox();
>
> It seems this test already had WhiteBox enabled, but wasn't actually using it before this change.

Yes, the base class `DynamicArchiveTestBase` requires `WhiteBox`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24308#discussion_r2021851608

From iklam at openjdk.org Mon Mar 31 22:58:11 2025
From: iklam at openjdk.org (Ioi Lam)
Date: Mon, 31 Mar 2025 22:58:11 GMT
Subject: RFR: 8353129: CDS ArchiveRelocation tests fail after JDK-8325132 [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 31 Mar 2025 22:15:39 GMT, Calvin Cheung wrote:

>> Two archive relocation tests failed when `-XX:ArchiveRelocationMode=0` is specified via the jtreg `-javaoption`.
>> A fix is to add a `WhiteBox.getArchiveRelocationMode()` method so that the tests can check if the `ArchiveRelocationMode` is set to 0 before checking the expected output.
>>
>> Passed tiers 1 - 4 testing.
>
> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision:
>
>   simplify the fix per David's suggestion

LGTM.

-------------

Marked as reviewed by iklam (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24308#pullrequestreview-2730694923

From iklam at openjdk.org Mon Mar 31 23:14:34 2025
From: iklam at openjdk.org (Ioi Lam)
Date: Mon, 31 Mar 2025 23:14:34 GMT
Subject: RFR: 8353325: Rewrite appcds/methodHandles test cases to use CDSAppTester
Message-ID:

These test cases are rewritten to use CDSAppTester, so that they can also be executed in the new JEP 483 workflow (with `-XX:AOTCache=xxx`, etc). This will increase coverage of current and upcoming AOT features (such as AOT linking of invokedynamic, and AOT method profiling).

These test cases are generated by a bash script. This PR minimizes the generated part so that the main portions of the tests can be modified as normal Java source files.

-------------

Commit messages:
 - step 3: added support for dynamic and aot workflows
 - step 2: updated all tests to use STATIC workflow
 - step 1

Changes: https://git.openjdk.org/jdk/pull/24340/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24340&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8353325
  Stats: 1611 lines in 18 files changed: 353 ins; 1214 del; 44 mod
  Patch: https://git.openjdk.org/jdk/pull/24340.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24340/head:pull/24340

PR: https://git.openjdk.org/jdk/pull/24340