From swen at openjdk.org Thu May 1 17:35:56 2025 From: swen at openjdk.org (Shaojin Wen) Date: Thu, 1 May 2025 17:35:56 GMT Subject: RFR: 8356044: Use Double::hashCode and Long::hashCode in java.vm.ci.meta Message-ID: <8SlBOjUBPGyZbR9GxEBZlLzOiNPbdws1GTZ4gGY8v9c=.fdefa26b-52ee-48f9-b814-3981b79f6012@github.com> Similar to #24959 and #24971 and #24987, AbstractProfiledItem/PrimitiveConstant in java.vm.ci.meta can also be simplified similarly. Replace manual bitwise operations in hashCode implementations of java.vm.ci.meta.AbstractProfiledItem/java.vm.ci.meta.PrimitiveConstant with Long::hashCode/Double.hashCode. ------------- Commit messages: - Use Double::hashCode & Long::hashCode Changes: https://git.openjdk.org/jdk/pull/24988/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24988&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356044 Stats: 8 lines in 2 files changed: 0 ins; 5 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24988.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24988/head:pull/24988 PR: https://git.openjdk.org/jdk/pull/24988 From jbhateja at openjdk.org Fri May 2 07:50:27 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 2 May 2025 07:50:27 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v7] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel? AVX10 (Version 1, or Intel? 
AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - Addressing code refactoring comments - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 - Fix windows build - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 - Add dynamic sized feature vectors - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 - dropping unneeded feature enabling/checks - 8352675: Support Intel AVX10 converged vector ISA feature detection ------------- Changes: https://git.openjdk.org/jdk/pull/24329/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=06 Stats: 545 lines in 27 files changed: 315 ins; 14 del; 216 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From mchevalier at openjdk.org Fri May 2 08:07:57 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 2 May 2025 08:07:57 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls Message-ID: A first part toward a better support of pure functions. 
## Pure Functions

Pure functions (as considered here) are functions that have no side effects, have no effect on control flow (they throw no exceptions), cannot deoptimize, and so on. Such a function can be executed anywhere, with any arguments, with no effect other than wasting time. Integer division is not pure because dividing by zero throws. But many floating-point functions just return `NaN` or `+/-infinity` in problematic cases.

## Scope

We are not going all-powerful for now! This PR is mostly about identifying some pure functions and being able to remove calls to them when the result is unused. Some other things are deliberately not part of this PR. In particular, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes that are later expanded into regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well.

## Implementation Overview

We introduce a new kind of node for pure calls, which is expanded into a regular call during macro expansion. This also allows the removal of the `ModD` and `ModF` nodes, which now have pure equivalents. They are surprisingly hard to unify with other floating-point functions from an implementation point of view!

The IR framework and IGV needed a little bit of fixing.
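As a rough illustration of the scope described above (not code from this PR), the following C++ sketch models dropping calls that are pure and whose result is unused. `ToyCall`, `is_pure`, `uses`, and `remove_unused_pure_calls` are hypothetical names invented for this example; C2's actual node classes and its macro expansion are not shown here.

```cpp
#include <string>
#include <vector>

// Hypothetical toy model of a call in a compiler IR.
// This is an assumption for illustration, not a C2 data structure.
struct ToyCall {
    std::string name;  // callee name, for display only
    bool is_pure;      // no side effects, cannot throw, cannot deopt
    int uses;          // number of consumers of the call's result
};

// Keep every call except those that are pure AND have no users.
// A call with side effects must stay even when its result is unused.
std::vector<ToyCall> remove_unused_pure_calls(const std::vector<ToyCall>& calls) {
    std::vector<ToyCall> kept;
    for (const ToyCall& call : calls) {
        if (call.is_pure && call.uses == 0) {
            continue;  // removing it cannot change observable behavior
        }
        kept.push_back(call);
    }
    return kept;
}
```

In this toy model, a pure call whose result has no users is dropped, while a used pure call and an unused but impure call both stay.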
Thanks, Marc ------------- Commit messages: - Clean up IRNode - cleanup - hash and cmp - get_early_ctrl_for_expensive - depends_only_on_test - depends_only_on_test - First try Changes: https://git.openjdk.org/jdk/pull/24966/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24966&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347901 Stats: 694 lines in 15 files changed: 449 ins; 226 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/24966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24966/head:pull/24966 PR: https://git.openjdk.org/jdk/pull/24966 From jbhateja at openjdk.org Fri May 2 08:08:27 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 2 May 2025 08:08:27 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v8] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. 
> > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Updating comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/04de0289..4a614be8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=06-07 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From jbhateja at openjdk.org Fri May 2 11:31:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 2 May 2025 11:31:01 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v9] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. 
> > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Refactoring code to create a seperate VM_Features class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/4a614be8..a9258174 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=07-08 Stats: 63 lines in 3 files changed: 32 ins; 22 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From sviswanathan at openjdk.org Fri May 2 20:54:47 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 2 May 2025 20:54:47 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v9] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 11:31:01 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. 
>> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it.
>> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions.
>> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling.
>>
>> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers.
>>
>> The patch has been regressed through tier1 and jvmci tests
>>
>> Please review and share your feedback.
>>
>> Best Regards,
>> Jatin
>>
>> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html
>
> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
> Refactoring code to create a seperate VM_Features class

src/hotspot/cpu/x86/vm_version_x86.cpp line 464:

> 462: __ movl(rcx, 0x18000000); // cpuid1 bits osxsave | avx
> 463: __ andl(rcx, Address(rsi, 8)); // cpuid1 bits osxsave | avx
> 464: __ jccb(Assembler::equal, done); // jump if AVX is not supported

This does not have the same effect as before. Consider an input of 0x10000000: the andl result will not be zero with this code, so the jump to done will not happen.
Whereas prior to this change, the cmpl with 0x18000000 will fail for equality and so a jump to done will happen. This is the case for all the places where we are checking more than 1 set bit. src/hotspot/cpu/x86/vm_version_x86.cpp line 468: > 466: __ movl(rax, 0x6); > 467: __ andl(rax, Address(rbp, in_bytes(VM_Version::xem_xcr0_offset()))); // xcr0 bits sse | ymm > 468: __ jccb(Assembler::notEqual, start_simd_check); // return if AVX is not supported See prior comment, need the cmpl and jmp here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072134109 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072136639 From vlivanov at openjdk.org Fri May 2 20:59:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 2 May 2025 20:59:46 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v9] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 11:31:01 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. 
State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Refactoring code to create a seperate VM_Features class Jatin, are you done with the refactorings? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2848107604 From duke at openjdk.org Fri May 2 22:31:00 2025 From: duke at openjdk.org (Mohamed Issa) Date: Fri, 2 May 2025 22:31:00 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v2] In-Reply-To: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. > > The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. > > For performance data collected with the built in **cbrt** micro-benchmark, see the table below. Each result is the mean of 8 individual runs. Overall, the intrinsic provides a performance uplift of 41%. 
> > | Benchmark | Throughput with baseline (op/s) | Throughput with intrinsic (op/s) | Speedup | > | :----------------: | :----------------------------------: | :----------------------------------: | :---------: | > | MathBench.cbrt | 148242 | 209122 | 1.41x | > > Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes. Mohamed Issa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge branch 'openjdk:master' into user/missa-prime/cbrt - Change coeff_table alignment from 4 bytes to 16 bytes to conform with movapd instruction - Merge branch 'master' into user/missa-prime/cbrt - x86_64 intrinsic for cbrt using libm ------------- Changes: https://git.openjdk.org/jdk/pull/24470/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24470&range=01 Stats: 466 lines in 26 files changed: 453 ins; 1 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24470.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24470/head:pull/24470 PR: https://git.openjdk.org/jdk/pull/24470 From jbhateja at openjdk.org Sat May 3 07:26:29 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 3 May 2025 07:26:29 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel? AVX10 (Version 1, or Intel? 
AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/a9258174..051c416c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=08-09 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From jbhateja at openjdk.org Sat May 3 07:32:46 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 3 May 2025 07:32:46 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v9] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 20:57:17 GMT, Vladimir Ivanov wrote: > Jatin, are you done with the refactorings? @iwanowww, I have addressed your comments. Let me know if you have further comments / feedback. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2848484313

From jbhateja at openjdk.org Sat May 3 07:32:47 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 3 May 2025 07:32:47 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v9]
In-Reply-To:
References:
Message-ID:

On Fri, 2 May 2025 20:47:01 GMT, Sandhya Viswanathan wrote:

>> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>>
>> Refactoring code to create a seperate VM_Features class
>
> src/hotspot/cpu/x86/vm_version_x86.cpp line 464:
>
>> 462: __ movl(rcx, 0x18000000); // cpuid1 bits osxsave | avx
>> 463: __ andl(rcx, Address(rsi, 8)); // cpuid1 bits osxsave | avx
>> 464: __ jccb(Assembler::equal, done); // jump if AVX is not supported
>
> This does not have the same effect as before. Consider an input of 0x10000000: the andl result will not be zero with this code, so the jump to done will not happen. Whereas prior to this change, the cmpl with 0x18000000 will fail for equality and so a jump to done will happen. This is the case for all the places where we are checking more than 1 set bit.

Thanks @sviswa7. The sub-optimality was mainly in single-bit comparisons, where we can drop the redundant CMP after the AND by flipping the predicate of the subsequent flag-consuming jump. Multi-bit compares should remain unaltered.
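The difference between the two checks discussed above can be sketched in plain C++, outside the assembler. The helper names `all_bits_set` and `any_bit_set` and the constant name `kOsxsaveAvxMask` are assumptions made for illustration; only the mask value 0x18000000 and the example input 0x10000000 come from the review comment.

```cpp
#include <cstdint>

// AND followed by CMP against the mask: true only when EVERY mask bit is set.
constexpr bool all_bits_set(std::uint32_t value, std::uint32_t mask) {
    return (value & mask) == mask;
}

// AND followed by a zero test: true as soon as ANY mask bit is set.
constexpr bool any_bit_set(std::uint32_t value, std::uint32_t mask) {
    return (value & mask) != 0;
}

// cpuid1 bits osxsave | avx, as in the quoted code.
constexpr std::uint32_t kOsxsaveAvxMask = 0x18000000;

// With only one of the two bits present, the zero test passes
// but the full comparison does not.
static_assert(any_bit_set(0x10000000, kOsxsaveAvxMask), "one bit is enough for the zero test");
static_assert(!all_bits_set(0x10000000, kOsxsaveAvxMask), "both bits are required for the compare");

// For a single-bit mask the two checks agree, which is why the CMP
// is redundant there and only there.
static_assert(any_bit_set(0x08000000, 0x08000000) == all_bits_set(0x08000000, 0x08000000),
              "single-bit masks: both checks are equivalent");
```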
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072341101 From vlivanov at openjdk.org Sat May 3 07:44:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 3 May 2025 07:44:48 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10] In-Reply-To: References: Message-ID: On Sat, 3 May 2025 07:26:29 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Ok, thanks! I wasn't sure you finished the pass. 
I'm still seeing dynamic memory allocation, which IMO unnecessarily complicates the implementation. The bitmap size is fixed and known at compile time. That enables the `VM_Feature` class to embed an array of the proper size inline, and it eliminates all the problems related to undesired sharing of the backing array. (Also, `pre_initialize()` is no longer needed.)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2848488960

From jbhateja at openjdk.org Sat May 3 07:54:46 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 3 May 2025 07:54:46 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10]
In-Reply-To:
References:
Message-ID:

On Sat, 3 May 2025 07:41:43 GMT, Vladimir Ivanov wrote:

> Ok, thanks! I wasn't sure you finished the pass.
>
> I'm still seeing dynamic memory allocation, which IMO unnecessarily complicates the implementation. The bitmap size is fixed and known at compile time. That enables the `VM_Feature` class to embed an array of the proper size inline, and it eliminates all the problems related to undesired sharing of the backing array. (Also, `pre_initialize()` is no longer needed.)

`pre_initialize()` was put in place because `codeCache_init()` precedes `VM_Version_init()` and calls some assembler routines which check for the existence of certain target features. It's an ordering issue; `pre_initialize()` simply allocates the feature vector upfront to prevent a crash.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2848492777

From vlivanov at openjdk.org Sat May 3 07:54:47 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Sat, 3 May 2025 07:54:47 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10]
In-Reply-To:
References:
Message-ID:

On Sat, 3 May 2025 07:26:29 GMT, Jatin Bhateja wrote:

>> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel®
products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution src/hotspot/cpu/x86/vm_version_x86.cpp line 2867: > 2865: > 2866: uint64_t VM_Version::CpuidInfo::feature_flags() const { > 2867: uint64_t result = 0; It's unfortunate you migrated away from operating on a local copy. Why don't you declare a local copy (`VM_Version result`) and migrate bit manipulation to bit field accessors on it? `VM_Version::CpuidInfo::feature_flags()` can still return it by value (once you get rid of heap memory allocation, copying becomes trivial). 

src/hotspot/share/runtime/abstract_vm_version.hpp line 88:

> 86: static VM_Features _dynamic_cpu_features;
> 87:
> 88: #define SET_CPU_FEATURE(feature) \

Why don't you replace the macros with instance methods on `VM_Version`?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072344671
PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072343204

From jbhateja at openjdk.org Sat May 3 07:57:45 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 3 May 2025 07:57:45 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10]
In-Reply-To:
References:
Message-ID:

On Sat, 3 May 2025 07:52:45 GMT, Jatin Bhateja wrote:

> Ok, thanks! I wasn't sure you finished the pass.
>
> I'm still seeing dynamic memory allocation, which IMO unnecessarily complicates the implementation. The bitmap size is fixed and known at compile time. That enables the `VM_Feature` class to embed an array of the proper size inline, and it eliminates all the problems related to undesired sharing of the backing array. (Also, `pre_initialize()` is no longer needed.)

I made it dynamic to keep it flexible, since the bitmap size depends on the maximum feature enum value.
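A minimal sketch of the fixed-size alternative under discussion, assuming a hypothetical feature enum. `ToyFeature` and `ToyFeatureSet` are invented names, and this is not the `VM_Features` class from the PR. It only shows how the word count can be derived from the enum size at compile time, so the backing array lives inside the object and needs no heap allocation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical feature list; the last enumerator doubles as the element count.
enum ToyFeature : std::size_t {
    TOY_SSE, TOY_AVX, TOY_AVX512F, TOY_AVX10_1, TOY_AVX10_2,
    TOY_FEATURE_COUNT
};

// Bit set whose storage is embedded inline and sized at compile time.
template <std::size_t NumFeatures>
class ToyFeatureSet {
    static_assert(NumFeatures > 0, "need at least one feature");
    static constexpr std::size_t kBitsPerWord = 64;
    // Round up: 1..64 features need one word, 65..128 need two, and so on.
    static constexpr std::size_t kWords = (NumFeatures + kBitsPerWord - 1) / kBitsPerWord;

    std::uint64_t _words[kWords] = {};  // zero-initialized inline storage

public:
    static constexpr std::size_t word_count() { return kWords; }

    void set(std::size_t feature) {
        _words[feature / kBitsPerWord] |= std::uint64_t{1} << (feature % kBitsPerWord);
    }

    bool test(std::size_t feature) const {
        return ((_words[feature / kBitsPerWord] >> (feature % kBitsPerWord)) & 1u) != 0;
    }
};
```

Because the storage is a plain member array, copying a `ToyFeatureSet` is a trivial member-wise copy and two copies never share a backing array. Adding enumerators grows the array automatically once a 64-bit word is exhausted.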

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2848493614

From jbhateja at openjdk.org Sat May 3 08:08:46 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 3 May 2025 08:08:46 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10]
In-Reply-To:
References:
Message-ID:

On Sat, 3 May 2025 07:52:21 GMT, Vladimir Ivanov wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Review comments resolution
>
> src/hotspot/cpu/x86/vm_version_x86.cpp line 2867:
>
>> 2865:
>> 2866: uint64_t VM_Version::CpuidInfo::feature_flags() const {
>> 2867: uint64_t result = 0;
>
> It's unfortunate you migrated away from operating on a local copy. Why don't you declare a local copy (`VM_Version result`) and migrate bit manipulation to bit field accessors on it? `VM_Version::CpuidInfo::feature_flags()` can still return it by value (once you get rid of heap memory allocation, copying becomes trivial).

The new implementation directly modifies the feature vector bits through macros.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072346669

From vlivanov at openjdk.org Sat May 3 08:17:49 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Sat, 3 May 2025 08:17:49 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10]
In-Reply-To:
References:
Message-ID: <3t1R35B9bafRtfvqfE7D2dAeLrjaDukXlDUGb-3VtaA=.46d64318-e9fb-4bf3-8a68-8dba2c2b7b26@github.com>

On Sat, 3 May 2025 07:55:10 GMT, Jatin Bhateja wrote:

> Bitmap size depends on the maximum feature enum value, I made it dynamic to keep it flexible. Do you want the feature vector size to be made constant and manually bump it when we exhaust the limit?

Yes, please.
(The limit may be precise, i.e. the number of elements in the Feature_Flag enum, but the logic which computes the size of the backing array can automatically round it up and bump the size once the actual limit is reached.)

> pre_initialize was put in place because codeCache_init() precedes VM_Version_init()

I wanted to say that the sole purpose of `pre_initialize` is to allocate memory. Once the allocation goes away, there's no reason to keep `pre_initialize`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2848507499

From vlivanov at openjdk.org Sat May 3 08:28:47 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Sat, 3 May 2025 08:28:47 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10]
In-Reply-To:
References:
Message-ID:

On Sat, 3 May 2025 08:06:10 GMT, Jatin Bhateja wrote:

>> src/hotspot/cpu/x86/vm_version_x86.cpp line 2867:
>>
>>> 2865:
>>> 2866: uint64_t VM_Version::CpuidInfo::feature_flags() const {
>>> 2867: uint64_t result = 0;
>>
>> It's unfortunate you migrated away from operating on a local copy. Why don't you declare a local copy (`VM_Version result`) and migrate bit manipulation to bit field accessors on it? `VM_Version::CpuidInfo::feature_flags()` can still return it by value (once you get rid of heap memory allocation, copying becomes trivial).
>
> The new implementation directly modifies the feature vector bits through macros.

I prefer explicit accessor calls on corresponding instance fields.

It's confusing to see `VM_Version::CpuidInfo::feature_flags()` implicitly modifying `_dynamic_features_vector` through macros.
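To illustrate the by-value style with explicit accessors preferred above, here is a small sketch that uses `std::bitset` as a stand-in for the feature vector. `DemoFeature`, `DemoFeatures`, and `DemoCpuidInfo` with its fields are hypothetical and do not mirror the real `VM_Version::CpuidInfo` layout. The point is only that `feature_flags()` builds a local result and returns it, without touching any global state.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical feature list for the sketch.
enum DemoFeature : std::size_t { DEMO_AVX, DEMO_AVX512F, DEMO_AVX10_1, DEMO_FEATURE_COUNT };

// Fixed-size value type standing in for the feature vector.
using DemoFeatures = std::bitset<DEMO_FEATURE_COUNT>;

// Hypothetical, already-decoded CPUID information.
struct DemoCpuidInfo {
    bool has_avx;
    bool has_avx512f;
    unsigned avx10_version;  // 0 means AVX10 is not enumerated

    // Builds the result in a local object through explicit accessor calls
    // and returns it by value; no static or global vector is modified.
    DemoFeatures feature_flags() const {
        DemoFeatures result;
        if (has_avx)            result.set(DEMO_AVX);
        if (has_avx512f)        result.set(DEMO_AVX512F);
        if (avx10_version >= 1) result.set(DEMO_AVX10_1);
        return result;
    }
};
```

The caller then decides where the returned value is stored, which keeps the side effect visible at the call site instead of hiding it inside a macro.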
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072349610 From jbhateja at openjdk.org Sat May 3 08:33:46 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 3 May 2025 08:33:46 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10] In-Reply-To: References: Message-ID: On Sat, 3 May 2025 08:26:19 GMT, Vladimir Ivanov wrote: >> The new implementation directly modifies the feature vector bits through macros. > > I prefer explicit accessor calls on corresponding instance fields. > > It's confusing to see `VM_Version::CpuidInfo::feature_flags()` implicitly modifying `_dynamic_features_vector` through macros. VM_Version::CpuidInfo::feature_flags() is local to x86 targets. How about renaming it to VM_Version::CpuidInfo::install_feature_flags() and continuing to use macros? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072350359 From kvn at openjdk.org Sat May 3 22:47:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 3 May 2025 22:47:55 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose.
Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. > > Thanks, > Marc Hi @marc-chevalier > doesn't propose a way to move pure calls around I agree that we should not do that in these changes. But did you consider moving/cloning such a call (new macro node) **down** to its "users" in case the result is not used on some paths? The calls would then be executed only where they are needed. And I think it is safe since the current control dominates the paths where the result is used. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2848841268 From jbhateja at openjdk.org Mon May 5 03:57:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 May 2025 03:57:22 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v11] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions.
> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Reveiw comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/051c416c..b314ed0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=09-10 Stats: 376 lines in 22 files changed: 25 ins; 68 del; 283 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From jbhateja at openjdk.org Mon May 5 03:57:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 May 2025 03:57:22 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v11] In-Reply-To: References: Message-ID: <9d9DVuqRAeb_8kiEwkPQH6g2eBU5Jc_5ZSBAi1in9X0=.1d955598-f466-46ff-8b1f-71c87abd6313@github.com> On Sat, 3 May 2025 08:26:19 GMT, Vladimir Ivanov wrote: >> The new implementation directly modifies the feature vector bits through macros.
> > I prefer explicit accessor calls on corresponding instance fields. > > It's confusing to see `VM_Version::CpuidInfo::feature_flags()` implicitly modifying `_dynamic_features_vector` through macros. I have renamed this local routine to install_feature_flags to conform to its semantics. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072818174 From jbhateja at openjdk.org Mon May 5 04:06:02 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 May 2025 04:06:02 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v12] In-Reply-To: References: Message-ID: <2ioSQVtfXhnqvAXqiadwR1HuJsz3t9nytY0wRps-x68=.35220ade-0e70-41c6-9ebd-a271e7dcb2bb@github.com> > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback.
> > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains two new commits since the last revision: - Updating comment - Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/b314ed0e..7b414b8c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=10-11 Stats: 13 lines in 4 files changed: 0 ins; 8 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From mchevalier at openjdk.org Mon May 5 06:44:44 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 5 May 2025 06:44:44 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. 
Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. > > Thanks, > Marc I've considered it, but rather for a follow-up. My thought was to first introduce the node types, removal mechanics and such, but keep the call pinned by control and not touch that in this change. In the follow-up, I was hoping I would have "just" the control-pinning problem to address. Moving the calls down may be beneficial in case the result is not used in a branch (and then we save the call when executing the branch not using it), but if the usage is in a loop, we would rather want the call to stay (or be hoisted) before the loop. The heuristic "out of as many loops as possible, and as late as possible" seems to also apply here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2850052986 From rkennke at openjdk.org Mon May 5 13:43:23 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 5 May 2025 13:43:23 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI Message-ID: In order to support Shenandoah GC in Graal, some changes are required in JVMCI, namely, export Shenandoah relevant symbols.
Testing: - [x] extensive testing with https://github.com/oracle/graal/pull/10904 ------------- Commit messages: - Fix ordering of includes - Remove unnecessary stuff - Revert unrelated changes - Revert unrelated changes - Merge branch 'master' into graal-shenandoah-support - Support for Shenandoah card-table barriers in JVMCI - Revert "8321373: Build should use LC_ALL=C.UTF-8" - Graal Shenandoah support Changes: https://git.openjdk.org/jdk/pull/25001/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25001&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356075 Stats: 59 lines in 6 files changed: 58 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25001.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25001/head:pull/25001 PR: https://git.openjdk.org/jdk/pull/25001 From dnsimon at openjdk.org Mon May 5 13:53:50 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 5 May 2025 13:53:50 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI In-Reply-To: References: Message-ID: On Fri, 2 May 2025 10:35:03 GMT, Roman Kennke wrote: > In order to support Shenandoah GC in Graal, some changes are required in JVMCI, namely, export Shenandoah relevant symbols. > > Testing: > - [x] extensive testing with https://github.com/oracle/graal/pull/10904 LGTM ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25001#pullrequestreview-2814890860 From shade at openjdk.org Mon May 5 15:38:46 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 15:38:46 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI In-Reply-To: References: Message-ID: <26EKhVnyWuLQxzRjvvLzzLcY2iW6fmgqs7qHWOdZQvA=.99efcb44-5788-403a-8ad1-83766184aa17@github.com> On Fri, 2 May 2025 10:35:03 GMT, Roman Kennke wrote: > In order to support Shenandoah GC in Graal, some changes are required in JVMCI, namely, export Shenandoah relevant symbols. 
> > Testing: > - [x] extensive testing with https://github.com/oracle/graal/pull/10904 A few questions: src/hotspot/share/gc/shenandoah/shenandoahRuntime.hpp line 42: > 40: static void pre_barrier(JavaThread* thread, oopDesc* orig) { > 41: write_ref_field_pre(orig, thread); > 42: } So, why not export `write_ref_field_pre`, instead of introducing this new method? Style/cleanliness, or something else? I am asking, because every time we add a new stub here, we would need to record it in `AOTCache` tables for Leyden benefit. src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp line 240: > 238: cardtable_shift = CardTable::card_shift(); > 239: } else if (bs->is_a(BarrierSet::ShenandoahBarrierSet)) { > 240: cardtable_shift = CardTable::card_shift(); I understand the barrier code does not use `cardtable_start_address`, but should we still initialize it here to `nullptr`? ------------- PR Review: https://git.openjdk.org/jdk/pull/25001#pullrequestreview-2815217376 PR Review Comment: https://git.openjdk.org/jdk/pull/25001#discussion_r2073674847 PR Review Comment: https://git.openjdk.org/jdk/pull/25001#discussion_r2073678010 From rkennke at openjdk.org Mon May 5 15:54:29 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 5 May 2025 15:54:29 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI [v2] In-Reply-To: References: Message-ID: > In order to support Shenandoah GC in Graal, some changes are required in JVMCI, namely, export Shenandoah relevant symbols. 
> > Testing: > - [x] extensive testing with https://github.com/oracle/graal/pull/10904 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Initialize cardtable_start_address to nullptr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25001/files - new: https://git.openjdk.org/jdk/pull/25001/files/6487a9f7..c95313a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25001&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25001&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25001.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25001/head:pull/25001 PR: https://git.openjdk.org/jdk/pull/25001 From rkennke at openjdk.org Mon May 5 15:54:29 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 5 May 2025 15:54:29 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI [v2] In-Reply-To: <26EKhVnyWuLQxzRjvvLzzLcY2iW6fmgqs7qHWOdZQvA=.99efcb44-5788-403a-8ad1-83766184aa17@github.com> References: <26EKhVnyWuLQxzRjvvLzzLcY2iW6fmgqs7qHWOdZQvA=.99efcb44-5788-403a-8ad1-83766184aa17@github.com> Message-ID: On Mon, 5 May 2025 15:31:59 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Initialize cardtable_start_address to nullptr > > src/hotspot/share/gc/shenandoah/shenandoahRuntime.hpp line 42: > >> 40: static void pre_barrier(JavaThread* thread, oopDesc* orig) { >> 41: write_ref_field_pre(orig, thread); >> 42: } > > So, why not export `write_ref_field_pre`, instead of introducing this new method? Style/cleanliness, or something else? I am asking, because every time we add a new stub here, we would need to record it in `AOTCache` tables for Leyden benefit. It's about the argument ordering. Graal expects the Thread* to be prepended, while other JITs call it with the Thread* appended.
I guess we could change other JIT calls to also prepend the thread, or change the interface to not pass the Thread* at all. I chose to follow G1 and export both variants. > src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp line 240: > >> 238: cardtable_shift = CardTable::card_shift(); >> 239: } else if (bs->is_a(BarrierSet::ShenandoahBarrierSet)) { >> 240: cardtable_shift = CardTable::card_shift(); > > I understand the barrier code does not use `cardtable_start_address`, but should we still initialize it here to `nullptr`? Good point, did that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25001#discussion_r2073702873 PR Review Comment: https://git.openjdk.org/jdk/pull/25001#discussion_r2073705091 From cslucas at openjdk.org Mon May 5 16:26:49 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 5 May 2025 16:26:49 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 15:54:29 GMT, Roman Kennke wrote: >> In order to support Shenandoah GC in Graal, some changes are required in JVMCI, namely, export Shenandoah relevant symbols. >> >> Testing: >> - [x] extensive testing with https://github.com/oracle/graal/pull/10904 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Initialize cardtable_start_address to nullptr LGTM. Thanks. ------------- Marked as reviewed by cslucas (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/25001#pullrequestreview-2815365611 From shade at openjdk.org Mon May 5 16:50:46 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 16:50:46 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI [v2] In-Reply-To: References: <26EKhVnyWuLQxzRjvvLzzLcY2iW6fmgqs7qHWOdZQvA=.99efcb44-5788-403a-8ad1-83766184aa17@github.com> Message-ID: <92qsu_6Qj0xyyGYK8dDm97enzXvKNgj7obDEHnhVcds=.d4671e44-1ef0-4c79-a286-fa578a6eb25f@github.com> On Mon, 5 May 2025 15:49:32 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/shenandoah/shenandoahRuntime.hpp line 42: >> >>> 40: static void pre_barrier(JavaThread* thread, oopDesc* orig) { >>> 41: write_ref_field_pre(orig, thread); >>> 42: } >> >> So, why not export `write_ref_field_pre`, instead of introducing this new method? Style/cleanliness, or something else? I am asking, because every time we add a new stub here, we would need to record it in `AOTCache` tables for Leyden benefit. > > It's about the argument ordering. Graal expects the Thread* to be prepended, while other JITs call it with the Thread* appended. I guess we could change other JIT calls to also prepend the thread, or change the interface to not pass the Thread* at all. I chose to follow G1 and export both variants. Oh, so this matches `JVMCIRuntime::write_barrier_pre` for G1 (weird place to have it, but oh well). Does Graal need the `Thread*` argument? I think this method is only called when SATB buffer is full. So the performance of this method is likely not affected by getting the current thread down in caller. So I think it would be more straight-forward to sharpen `ShenandoahRuntime::write_ref_field_pre` by dropping `Thread*` and then exporting that. Maybe also under the `SR::write_barrier_pre` name to be even more consistent for everything else.
Maybe @JohnTortugo wants to clean up more mess in C2 related to this :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25001#discussion_r2073800305 From shade at openjdk.org Mon May 5 16:50:48 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 16:50:48 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 15:54:29 GMT, Roman Kennke wrote: >> In order to support Shenandoah GC in Graal, some changes are required in JVMCI, namely, export Shenandoah relevant symbols. >> >> Testing: >> - [x] extensive testing with https://github.com/oracle/graal/pull/10904 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Initialize cardtable_start_address to nullptr src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 137: > 135: ZGC_ONLY(static_field(CompilerToVM::Data, sizeof_ZStoreBarrierEntry, int)) \ > 136: SHENANDOAHGC_ONLY(static_field(CompilerToVM::Data, shenandoah_in_cset_fast_test_addr, address)) \ > 137: SHENANDOAHGC_ONLY(static_field(CompilerToVM::Data, shenandoah_region_size_bytes_shift,int)) \ Also indent trailing backslashes. src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 909: > 907: SHENANDOAHGC_ONLY(declare_function(ShenandoahRuntime::load_reference_barrier_weak_narrow)) \ > 908: SHENANDOAHGC_ONLY(declare_function(ShenandoahRuntime::load_reference_barrier_phantom)) \ > 909: SHENANDOAHGC_ONLY(declare_function(ShenandoahRuntime::load_reference_barrier_phantom_narrow)) \ Also indent trailing backslashes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25001#discussion_r2073801311 PR Review Comment: https://git.openjdk.org/jdk/pull/25001#discussion_r2073801126 From rkennke at openjdk.org Mon May 5 16:58:01 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 5 May 2025 16:58:01 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI [v3] In-Reply-To: References: Message-ID: > In order to support Shenandoah GC in Graal, some changes are required in JVMCI, namely, export Shenandoah relevant symbols. > > Testing: > - [x] extensive testing with https://github.com/oracle/graal/pull/10904 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Align backslashes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25001/files - new: https://git.openjdk.org/jdk/pull/25001/files/c95313a9..44344585 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25001&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25001&range=01-02 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25001.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25001/head:pull/25001 PR: https://git.openjdk.org/jdk/pull/25001 From rkennke at openjdk.org Mon May 5 16:58:01 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 5 May 2025 16:58:01 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI [v3] In-Reply-To: <92qsu_6Qj0xyyGYK8dDm97enzXvKNgj7obDEHnhVcds=.d4671e44-1ef0-4c79-a286-fa578a6eb25f@github.com> References: <26EKhVnyWuLQxzRjvvLzzLcY2iW6fmgqs7qHWOdZQvA=.99efcb44-5788-403a-8ad1-83766184aa17@github.com> <92qsu_6Qj0xyyGYK8dDm97enzXvKNgj7obDEHnhVcds=.d4671e44-1ef0-4c79-a286-fa578a6eb25f@github.com> Message-ID: On Mon, 5 May 2025 16:46:46 GMT, Aleksey Shipilev wrote: >> It's about the argument ordering. Graal expects the Thread* to be prepended, while other JITs call it with the Thread* appended.
I guess we could change other JIT calls to also prepend the thread, or change the interface to not pass the Thread* at all. I chose to follow G1 and export both variants. > > Oh, so this matches `JVMCIRuntime::write_barrier_pre` for G1 (weird place to have it, but oh well). > > Does Graal need the `Thread*` argument? > > I think this method is only called when SATB buffer is full. So the performance of this method is likely not affected by getting the current thread down in caller. So I think it would be more straight-forward to sharpen `ShenandoahRuntime::write_ref_field_pre` by dropping `Thread*` and then exporting that. Maybe also under the `SR::write_barrier_pre` name to be even more consistent for everything else. > > Maybe @JohnTortugo wants to clean up more mess in C2 related to this :) Graal does not need the Thread* argument, but the runtime code behind write_ref_pre() currently uses it. I agree, it does not look performance critical to pass it through. However, getting rid of it seems to blow the scope of this PR. I'd rather do this as a follow-up. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25001#discussion_r2073807949 From rkennke at openjdk.org Mon May 5 16:58:02 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 5 May 2025 16:58:02 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI [v3] In-Reply-To: References: <26EKhVnyWuLQxzRjvvLzzLcY2iW6fmgqs7qHWOdZQvA=.99efcb44-5788-403a-8ad1-83766184aa17@github.com> <92qsu_6Qj0xyyGYK8dDm97enzXvKNgj7obDEHnhVcds=.d4671e44-1ef0-4c79-a286-fa578a6eb25f@github.com> Message-ID: <_4ebbEF2UTOwozaLgA8ibDcCi1YF5I4rKA1NN3tBozs=.7ce738b0-7e79-4c93-bc4d-63f534ccc536@github.com> On Mon, 5 May 2025 16:51:39 GMT, Roman Kennke wrote: >> Oh, so this matches `JVMCIRuntime::write_barrier_pre` for G1 (weird place to have it, but oh well). >> >> Does Graal need the `Thread*` argument? >> >> I think this method is only called when SATB buffer is full. 
So the performance of this method is likely not affected by getting the current thread down in caller. So I think it would be more straight-forward to sharpen `ShenandoahRuntime::write_ref_field_pre` by dropping `Thread*` and then exporting that. Maybe also under the `SR::write_barrier_pre` name to be even more consistent for everything else. >> >> Maybe @JohnTortugo wants to clean up more mess in C2 related to this :) > > Graal does not need the Thread* argument, but the runtime code behind write_ref_pre() currently uses it. I agree, it does not look performance critical to pass it through. However, getting rid of it seems to blow the scope of this PR. I'd rather do this as a follow-up. Actually, I'd probably add the new entry for Graal without the Thread* argument now, and fix the others in a follow-up. Otherwise we need to deal with it on the Graal side again later once we change the entry points. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25001#discussion_r2073813072 From kvn at openjdk.org Mon May 5 17:02:44 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 May 2025 17:02:44 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. 
Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. > > Thanks, > Marc Nice work. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24966#pullrequestreview-2815464620 From shade at openjdk.org Mon May 5 17:03:47 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 17:03:47 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI [v3] In-Reply-To: <_4ebbEF2UTOwozaLgA8ibDcCi1YF5I4rKA1NN3tBozs=.7ce738b0-7e79-4c93-bc4d-63f534ccc536@github.com> References: <26EKhVnyWuLQxzRjvvLzzLcY2iW6fmgqs7qHWOdZQvA=.99efcb44-5788-403a-8ad1-83766184aa17@github.com> <92qsu_6Qj0xyyGYK8dDm97enzXvKNgj7obDEHnhVcds=.d4671e44-1ef0-4c79-a286-fa578a6eb25f@github.com> <_4ebbEF2UTOwozaLgA8ibDcCi1YF5I4rKA1NN3tBozs=.7ce738b0-7e79-4c93-bc4d-63f534ccc536@github.com> Message-ID: On Mon, 5 May 2025 16:55:36 GMT, Roman Kennke wrote: >> Graal does not need the Thread* argument, but the runtime code behind write_ref_pre() currently uses it. I agree, it does not look performance critical to pass it through. However, getting rid of it seems to blow the scope of this PR. I'd rather do this as a follow-up. > > Actually, I'd probably add the new entry for Graal without the Thread* argument now, and fix the others in a follow-up. 
Otherwise we need to deal with it on the Graal side again later once we change the entry points. OK, but that follow-up risks changing the JVMCI interface _again_? How about we introduce: static void write_barrier_pre(oopDesc* pre_val) { write_ref_field_pre(pre_val, JavaThread::current()); } ...and then the follow-up purges the old `write_ref_field_pre`? The implementation might need to be in `.cpp`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25001#discussion_r2073820137 From vlivanov at openjdk.org Mon May 5 19:08:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 5 May 2025 19:08:48 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. 
This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. > > Thanks, > Marc Good work, Marc. High-level comment: I don't know what the future plans are, but as the patch stands now, it feels like it complicates both the design and the implementation. The original implementation relies on macro nodes which are later expanded into leaf runtime calls. What you propose introduces a new concept of "pure calls" which: (1) is not a CallNode anymore; and (2) relies on subclassing (which makes it hard to mix with other node properties). Moreover, I don't see much benefit in committing to runtime call representation from the very beginning (early in high-level IR). Going forward, IMO the sweet spot is to support arbitrary nodes to be lowered into leaf runtime calls. You make a big step in that direction by relaxing requirements on `PureCall` to be just a CFG node (and not a full-blown `CallLeaf` node). The next step would be to relax the CFG node requirement and let the compiler pick the right place to insert it. (Existing expensive node support in C2 addresses some similar challenges.) And, as a complementary option, in some cases it may be just enough to mark individual call nodes as pure, so they can be pruned later if nobody consumes the result of their computation anymore. ------------- PR Review: https://git.openjdk.org/jdk/pull/24966#pullrequestreview-2815810010 From rkennke at openjdk.org Mon May 5 20:25:27 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 5 May 2025 20:25:27 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI [v4] In-Reply-To: References: Message-ID: > In order to support Shenandoah GC in Graal, some changes are required in JVMCI, namely, export Shenandoah relevant symbols.
> > Testing: > - [x] extensive testing with https://github.com/oracle/graal/pull/10904 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Simplify pre-barrier runtime entry ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25001/files - new: https://git.openjdk.org/jdk/pull/25001/files/44344585..41084f3e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25001&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25001&range=02-03 Stats: 8 lines in 3 files changed: 4 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25001.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25001/head:pull/25001 PR: https://git.openjdk.org/jdk/pull/25001 From vlivanov at openjdk.org Tue May 6 01:33:20 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 6 May 2025 01:33:20 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v12] In-Reply-To: <2ioSQVtfXhnqvAXqiadwR1HuJsz3t9nytY0wRps-x68=.35220ade-0e70-41c6-9ebd-a271e7dcb2bb@github.com> References: <2ioSQVtfXhnqvAXqiadwR1HuJsz3t9nytY0wRps-x68=.35220ade-0e70-41c6-9ebd-a271e7dcb2bb@github.com> Message-ID: On Mon, 5 May 2025 04:06:02 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel®
AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains two new commits since the last revision: > > - Updating comment > - Review comments resolutions It does look much better now. Thanks! Some comments/suggestions follow. src/hotspot/cpu/x86/vm_version_x86.cpp line 853: > 851: > 852: if (cpu_family() > 4) { // it supports CPUID > 853: _features = _cpuid_info.feature_flags(); // These can be changed by VM settings You don't need to change this code if you equip `VM_Features` with a copy constructor. src/hotspot/cpu/x86/vm_version_x86.cpp line 1102: > 1100: size_t buf_iter = cpu_info_size; > 1101: for (uint64_t i = 0; i < features_vector_size(); i++) { > 1102: insert_features_names(features_vector_elem(i), buf + buf_iter, sizeof(buf) - buf_iter, _features_names, 64 * i); `Abstract_VM_Version::insert_features_names` is used only on x86. You can move it to `vm_version_x86.cpp/.hpp` and adjust to new layout. 
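For readers following the `64 * i` offset in the quoted loop, here is a minimal standalone sketch of how feature names could be collected from a vector of 64-bit elements. It is only an illustration under stated assumptions: `append_feature_names` and `feature_names` are hypothetical helpers written for this note, they use `std::string` instead of the fixed `char` buffer HotSpot uses, and they are not the signature of `insert_features_names`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not HotSpot code): appends the name of every set bit
// in one 64-bit element. `base` is the names-table index of that element's
// bit 0, which is what the `64 * i` argument expresses in the quoted loop.
inline void append_feature_names(uint64_t word, size_t base,
                                 const char* const* names, size_t names_len,
                                 std::string& out) {
  for (size_t bit = 0; bit < 64; bit++) {
    size_t idx = base + bit;
    if ((word & (uint64_t{1} << bit)) != 0 && idx < names_len) {
      if (!out.empty()) {
        out += ", ";
      }
      out += names[idx];
    }
  }
}

// Walks the whole feature vector, advancing the names offset by 64 per element.
inline std::string feature_names(const uint64_t* words, size_t n_words,
                                 const char* const* names, size_t names_len) {
  std::string out;
  for (size_t i = 0; i < n_words; i++) {
    append_feature_names(words[i], 64 * i, names, names_len, out);
  }
  return out;
}
```

With a two-element vector, bit 1 of the second element maps to names-table index 65, so the table has to be indexed across element boundaries rather than per element.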
src/hotspot/cpu/x86/vm_version_x86.hpp line 707: > 705: // > 706: static bool supports_cpuid() { return _features != 0; } > 707: static bool supports_cmov() { return (_features & CPU_CMOV) != 0; } Since you touch this code anyway, I suggest to use this opportunity to automatically derive this code using `CPU_FEATURE_FLAGS` macro. (As an example [1].) [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.hpp#L147 src/hotspot/cpu/x86/vm_version_x86.hpp line 753: > 751: // Feature identification which can be affected by VM settings > 752: // > 753: static bool supports_cpuid() { return Abstract_VM_Version::vm_features_exist(); } Is `VM_Features::_features_vector_size > 0` equivalent to `_features != 0`? I believe you can simply drop `supports_cpuid()`. x86-32 bit port is gone and even there `cpuid` support was mandatory. src/hotspot/share/runtime/abstract_vm_version.hpp line 51: > 49: class VM_Features { > 50: public: > 51: using FeatureVector = uint64_t [MAX_FEATURE_VEC_SIZE]; Why did you decide to declare new type name for fixed size array type? I see you use `FeatureVector` in `vmStructs*` and JVMCI code. Does it make things simpler there? src/hotspot/share/runtime/abstract_vm_version.hpp line 91: > 89: > 90: // CPU feature flags vector, can be affected by VM settings. > 91: static VM_Features _vm_target_features; Unless we plan to migrate all platforms all at once, I suggest to move this code into `VM_Version` and keep the same names (`_features` and `_cpu_features`). Ideally, `_features` field can be moved to from `Abstract_VM_Version` to platform-specific `VM_Version`s across all platforms. But leaving it as is for now is also fine with me. There's a precedent: `VM_Version` already overrides `_features` field on s390 [1]. `VM_Features` class can start as x86-specific, but for advertisement purposes it makes sense to keep it in `abstract_vm_version.hpp`. 
Alternatively, `Abstract_VM_Version::_features` can be converted from `uint64_t` to `VM_Features` and non-x86 platforms can be covered by providing overloads for currently used operators (it's mostly `|=`, `&=`, and `&`, plus convertions). [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/s390/vm_version_s390.hpp#L130 src/hotspot/share/runtime/abstract_vm_version.hpp line 97: > 95: > 96: static void sync_cpu_features() { > 97: memcpy(_cpu_target_features._features_vector, _vm_target_features._features_vector, Any particular reason to use `memcpy`/`memset` and not a loop over `_features_vector` array? I believe once you define default and copy constructors for `VM_Features`, `sync_cpu_features()` and `clear_cpu_features()` won't be needed anymore. src/hotspot/share/runtime/abstract_vm_version.hpp line 183: > 181: static const char* printable_jdk_debug_level(); > 182: > 183: static uint64_t features() { Not used. Drop it. src/hotspot/share/runtime/init.cpp line 68: > 66: void codeCache_init(); > 67: void VM_Version_init(); > 68: void VM_Version_pre_init(); Redundant declaration. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/amd64/AMD64HotSpotVMConfig.java line 94: > 92: final long amd64CET_IBT = getConstant("VM_Version::CPU_CET_IBT", Long.class); > 93: final long amd64CET_SS = getConstant("VM_Version::CPU_CET_SS", Long.class); > 94: final long avx10_1 = getConstant("VM_Version::CPU_AVX10_1", Long.class); Leave them as is. @mur47x111 plans to remove them [1]. 
[1] https://github.com/openjdk/jdk/pull/24329#issuecomment-2838223030 ------------- PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2815634822 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074470895 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074469800 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074484317 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074481382 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074502713 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074479165 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074496719 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074480203 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2073919224 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074519224 From vlivanov at openjdk.org Tue May 6 01:33:21 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 6 May 2025 01:33:21 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v12] In-Reply-To: <9d9DVuqRAeb_8kiEwkPQH6g2eBU5Jc_5ZSBAi1in9X0=.1d955598-f466-46ff-8b1f-71c87abd6313@github.com> References: <9d9DVuqRAeb_8kiEwkPQH6g2eBU5Jc_5ZSBAi1in9X0=.1d955598-f466-46ff-8b1f-71c87abd6313@github.com> Message-ID: On Mon, 5 May 2025 03:54:24 GMT, Jatin Bhateja wrote: >> I prefer explicit accessor calls on corresponding instance fields. >> >> It's confusing to see `VM_Version::CpuidInfo::feature_flags()` implicitly modifying `_dynamic_features_vector` through macros. > > I have changed this local routine name to install_feature_flags to conform to its semantics It's still counter-intuitive to see `VM_Version::CpuidInfo` implicitly initialize a field in `Abstract_VM_Version` class. I prefer the original code shape. Any problems with the following code shape?
VM_Features VM_Version::CpuidInfo::feature_flags() const { VM_Features result; if (std_cpuid1_edx.bits.cmpxchg8 != 0) { result.set_feature(CPU_CX8); } ... return result; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074474099 From mchevalier at openjdk.org Tue May 6 07:46:14 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 6 May 2025 07:46:14 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. 
> > Thanks, > Marc Thanks for the comment. I'll think deeper about it. I started by trying to make PureCall a subclass of Call (or a property of LeafCall) but that broke a lot of things that were using some invariants on CallNode that weren't holding anymore. After some time tracking bugs and trying to fix them, I thought it would be simpler to have a new kind of node, and it would have less impact on existing code. Another reason I've changed it to a direct sub-class of Node is that I felt it made little sense to be a Call (or sub-class of) since Calls are Safepoints, but pure calls don't need to be (and similar "conceptual" problems). It seemed like a hack to me. About > support arbitrary nodes to be lowered into leaf runtime calls. I don't think I understand what you mean. Overall, I see the weaknesses of my design, but I'm not sure which direction to take instead. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2853576338 From shade at openjdk.org Tue May 6 08:15:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 6 May 2025 08:15:16 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI [v4] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 20:25:27 GMT, Roman Kennke wrote: >> In order to support Shenandoah GC in Graal, some changes are required in JVMCI, namely, export Shenandoah relevant symbols. >> >> Testing: >> - [x] extensive testing with https://github.com/oracle/graal/pull/10904 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Simplify pre-barrier runtime entry All right, this works, thanks! ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25001#pullrequestreview-2817308091 From jbhateja at openjdk.org Tue May 6 08:49:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 08:49:57 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v13] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback.
> > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/7b414b8c..b25cc776 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=11-12 Stats: 441 lines in 9 files changed: 106 ins; 107 del; 228 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From jbhateja at openjdk.org Tue May 6 08:49:58 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 08:49:58 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v12] In-Reply-To: References: <2ioSQVtfXhnqvAXqiadwR1HuJsz3t9nytY0wRps-x68=.35220ade-0e70-41c6-9ebd-a271e7dcb2bb@github.com> Message-ID: On Tue, 6 May 2025 00:30:23 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains two new commits since the last revision: >> >> - Updating comment >> - Review comments resolutions > > src/hotspot/cpu/x86/vm_version_x86.cpp line 1102: > >> 1100: size_t buf_iter = cpu_info_size; >> 1101: for (uint64_t i = 0; i < features_vector_size(); i++) { >> 1102: insert_features_names(features_vector_elem(i), buf + buf_iter, sizeof(buf) - buf_iter, _features_names, 64 * i); > > `Abstract_VM_Version::insert_features_names` is used only on x86. You can move it to `vm_version_x86.cpp/.hpp` and adjust to new layout. 
DONE > src/hotspot/share/runtime/abstract_vm_version.hpp line 51: > >> 49: class VM_Features { >> 50: public: >> 51: using FeatureVector = uint64_t [MAX_FEATURE_VEC_SIZE]; > > Why did you decide to declare new type name for fixed size array type? I see you use `FeatureVector` in `vmStructs*` and JVMCI code. Does it make things simpler there? Yes. I was facing compilation issues with raw array types. > src/hotspot/share/runtime/abstract_vm_version.hpp line 91: > >> 89: >> 90: // CPU feature flags vector, can be affected by VM settings. >> 91: static VM_Features _vm_target_features; > > Unless we plan to migrate all platforms all at once, I suggest to move this code into `VM_Version` and keep the same names (`_features` and `_cpu_features`). Ideally, `_features` field can be moved to from `Abstract_VM_Version` to platform-specific `VM_Version`s across all platforms. But leaving it as is for now is also fine with me. > > There's a precedent: `VM_Version` already overrides `_features` field on s390 [1]. > > `VM_Features` class can start as x86-specific, but for advertisement purposes it makes sense to keep it in `abstract_vm_version.hpp`. > > Alternatively, `Abstract_VM_Version::_features` can be converted from `uint64_t` to `VM_Features` and non-x86 platforms can be covered by providing overloads for currently used operators (it's mostly `|=`, `&=`, and `&`, plus convertions). > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/s390/vm_version_s390.hpp#L130 Moved VM_Features to VM_Version. 
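For context, the alternative mentioned in the quoted comment (converting a single `uint64_t` into a multi-word type and covering existing call sites with operator overloads) was not the route taken here. A rough sketch of what such overloads might look like follows, purely as an illustration; `DemoFeatureSet` and everything in it are hypothetical and are not the `VM_Features` class from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical multi-word feature set (not the PR's VM_Features class).
struct DemoFeatureSet {
  static constexpr size_t kWords = 2;
  uint64_t words[kWords] = {};

  DemoFeatureSet() = default;
  // Implicit conversion from a legacy single-word mask; fills element 0 only.
  DemoFeatureSet(uint64_t low) { words[0] = low; }

  DemoFeatureSet& operator|=(const DemoFeatureSet& o) {
    for (size_t i = 0; i < kWords; i++) words[i] |= o.words[i];
    return *this;
  }
  DemoFeatureSet& operator&=(const DemoFeatureSet& o) {
    for (size_t i = 0; i < kWords; i++) words[i] &= o.words[i];
    return *this;
  }
  friend DemoFeatureSet operator&(DemoFeatureSet a, const DemoFeatureSet& b) {
    a &= b;
    return a;
  }
  // True if any bit is set, so `if (features & MASK)` keeps working.
  explicit operator bool() const {
    for (size_t i = 0; i < kWords; i++) {
      if (words[i] != 0) return true;
    }
    return false;
  }
};
```

The implicit constructor is what would let existing `_features |= CPU_X` style code compile unchanged on platforms that still use single-word masks; whether that convenience is worth the extra implicit conversions is a design question the thread leaves open.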
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075014479 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075012045 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075015126 From jbhateja at openjdk.org Tue May 6 08:49:58 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 08:49:58 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v13] In-Reply-To: References: <2ioSQVtfXhnqvAXqiadwR1HuJsz3t9nytY0wRps-x68=.35220ade-0e70-41c6-9ebd-a271e7dcb2bb@github.com> Message-ID: On Tue, 6 May 2025 00:57:29 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > src/hotspot/cpu/x86/vm_version_x86.hpp line 707: > >> 705: // >> 706: static bool supports_cpuid() { return _features != 0; } >> 707: static bool supports_cmov() { return (_features & CPU_CMOV) != 0; } > > Since you touch this code anyway, I suggest to use this opportunity to automatically derive this code using `CPU_FEATURE_FLAGS` macro. (As an example [1].) > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.hpp#L147 Unlike AARCH64, there is not a 1:1 mapping b/w CPU_* features and the corresponding support checkers; some AVX512 checkers use multiple features. Skipping this for now for consistency. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075012161 From jbhateja at openjdk.org Tue May 6 08:57:18 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 08:57:18 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10] In-Reply-To: <3t1R35B9bafRtfvqfE7D2dAeLrjaDukXlDUGb-3VtaA=.46d64318-e9fb-4bf3-8a68-8dba2c2b7b26@github.com> References: <3t1R35B9bafRtfvqfE7D2dAeLrjaDukXlDUGb-3VtaA=.46d64318-e9fb-4bf3-8a68-8dba2c2b7b26@github.com> Message-ID: On Sat, 3 May 2025 08:13:11 GMT, Vladimir Ivanov wrote: >>> Ok, thanks! I wasn't sure you finished the pass. >>> >>> I'm still seeing dynamic memory allocation which IMO unnecessarily complicates the implementation. Bitmap size is fixed and well-known at compile time. It enables `VM_Feature` class to embed the array of proper size inline. And it eliminates all the problems related to undesired sharing of backing array. (Also, `pre_initialize()` is not needed either.) >> >> Bitmap size depends on the maximum feature enum value, I made it dynamic to keep it flexible. Do you want the feature vector size to be made constant and manually bump it when we exhaust the limit? > >> Bitmap size depends on the maximum feature enum value, I made it dynamic to keep it flexible. Do you want the feature vector size to be made constant and manually bump it when we exhaust the limit? > > Yes, please. (The limit may be precise - number of elements in Feature_Flag enum - but the logic which computes the size of backing array can automatically round it and bump the size once the actual limit is reached.) > >> pre_initialize was put in place because codeCache_init() precedes VM_Version_init() > > I wanted to say that the sole purpose of `pre_initialize` is to allocate memory. Once it goes away, there's no reason to keep it. Hi @iwanowww, your comments have been addressed.
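As a rough illustration of the compile-time sizing discussed above (the element count is derived from the number of enum entries and rounded up to whole 64-bit words), a sketch might look like the following. `DemoFeature`, `words_for` and `DemoFeatures` are hypothetical names made up for this note; this is not the code that landed in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical feature enum; the last enumerator doubles as the element count.
enum DemoFeature { DEMO_CX8, DEMO_CMOV, DEMO_AVX10_1, DEMO_FEATURE_COUNT };

// Number of 64-bit words needed for n feature bits, rounded up.
constexpr size_t words_for(size_t n_features) {
  return (n_features + 63) / 64;
}

static_assert(words_for(1) == 1, "a single feature needs one word");
static_assert(words_for(64) == 1, "64 features still fit in one word");
static_assert(words_for(65) == 2, "the 65th feature bumps the size");

// Fixed-size inline storage: no allocation, and the implicitly generated copy
// constructor and assignment copy the whole array, so no backing array is shared.
struct DemoFeatures {
  uint64_t words[words_for(DEMO_FEATURE_COUNT)] = {};

  void set(DemoFeature f) { words[f / 64] |= uint64_t{1} << (f % 64); }
  bool test(DemoFeature f) const {
    return ((words[f / 64] >> (f % 64)) & 1) != 0;
  }
};
```

Because the array is embedded, adding a 65th enumerator grows the type automatically at compile time and nothing needs to run before `VM_Version_init()` to allocate storage.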
------------- PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2853772762 From shade at openjdk.org Tue May 6 09:57:47 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 6 May 2025 09:57:47 GMT Subject: RFR: 8356259: Lift basic -Xlog:jit* logging to "info" level Message-ID: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> We have unified logging for JIT activity: -Xlog:jit+compilation, -Xlog:jit+inlining, etc. These serve as convenient replacements for -XX:+PrintCompilation, -XX:+PrintInlining, etc. And these replacements are useful, because UL can be forwarded to file, their format can be adjusted, and they can be handled asynchronously. However, all useful messages are on "debug" level, which is inconvenient and surprising. It is reasonable to expect some level of basic logging when supplying -Xlog:jit+compilation, e.g. "info" level. I believe we should lift at least some of the logging to "info" level for these. Additional testing: - [x] Eyeballing `-Xlog:jit*` logs after the patch - [ ] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/25061/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25061&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356259 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25061/head:pull/25061 PR: https://git.openjdk.org/jdk/pull/25061 From rkennke at openjdk.org Tue May 6 11:11:19 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 6 May 2025 11:11:19 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI [v4] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 20:25:27 GMT, Roman Kennke wrote: >> In order to support Shenandoah GC in Graal, some changes are required in JVMCI, namely, export Shenandoah relevant symbols. 
>> >> Testing: >> - [x] extensive testing with https://github.com/oracle/graal/pull/10904 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Simplify pre-barrier runtime entry Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25001#issuecomment-2854170217 From rkennke at openjdk.org Tue May 6 11:11:19 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 6 May 2025 11:11:19 GMT Subject: Integrated: 8356075: Support Shenandoah GC in JVMCI In-Reply-To: References: Message-ID: On Fri, 2 May 2025 10:35:03 GMT, Roman Kennke wrote: > In order to support Shenandoah GC in Graal, some changes are required in JVMCI, namely, export Shenandoah relevant symbols. > > Testing: > - [x] extensive testing with https://github.com/oracle/graal/pull/10904 This pull request has now been integrated. Changeset: 614ba9fc Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/614ba9fc41a0274a31f0e8eff8a598a7c5afe164 Stats: 62 lines in 7 files changed: 61 ins; 0 del; 1 mod 8356075: Support Shenandoah GC in JVMCI Reviewed-by: shade, dnsimon, cslucas ------------- PR: https://git.openjdk.org/jdk/pull/25001 From jbhateja at openjdk.org Tue May 6 11:19:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 11:19:54 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v14] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2).
This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: build fixes for non-x86 targets ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/b25cc776..650e3d61 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=12-13 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From qamai at openjdk.org Tue May 6 11:50:19 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 6 May 2025 11:50:19 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v14] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 11:19:54 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel®
products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > build fixes for non-x86 targets src/hotspot/cpu/x86/vm_version_x86.hpp line 37: > 35: class VM_Features { > 36: public: > 37: using FeatureVector = uint64_t [MAX_FEATURE_VEC_SIZE]; Do you think it would be better to refactor this into a separate class analogous to `std::bitset`? You can start with only implementing `test`, `set`, `reset`. This would help in other use cases, too.
https://en.cppreference.com/w/cpp/utility/bitset ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075295556 From qamai at openjdk.org Tue May 6 11:54:22 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 6 May 2025 11:54:22 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v14] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 11:19:54 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback.
>> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > build fixes for non-x86 targets src/hotspot/cpu/x86/vm_version_x86.hpp line 44: > 42: // log2 of feature vector element size in bits, used by JVMCI to check enabled feature bits. > 43: // Refer HotSpotJVMCIBackendFactory::convertFeaturesVector. > 44: static uint32_t _features_vector_element_shift_count; Making this `static constexpr` helps constant folding, too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075301116 From dnsimon at openjdk.org Tue May 6 11:56:23 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 6 May 2025 11:56:23 GMT Subject: RFR: 8356075: Support Shenandoah GC in JVMCI [v4] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 20:25:27 GMT, Roman Kennke wrote: >> In order to support Shenandoah GC in Graal, some changes are required in JVMCI, namely, export Shenandoah relevant symbols. 
>> >> Testing: >> - [x] extensive testing with https://github.com/oracle/graal/pull/10904 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Simplify pre-barrier runtime entry src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp line 239: > 237: cardtable_start_address = base; > 238: cardtable_shift = CardTable::card_shift(); > 239: } else if (bs->is_a(BarrierSet::ShenandoahBarrierSet)) { This change is causing a failure in mach5 tier 1: [2025-05-06T11:34:44,742Z] /workspace/open/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp:239:35: error: no member named 'ShenandoahBarrierSet' in 'BarrierSet' [2025-05-06T11:34:44,742Z] } else if (bs->is_a(BarrierSet::ShenandoahBarrierSet)) { [2025-05-06T11:34:44,742Z] ~~~~~~~~~~~~^ [2025-05-06T11:34:45,729Z] 1 error generated. I assume it's missing `#if INCLUDE_SHENANDOAHGC`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25001#discussion_r2075304100 From jbhateja at openjdk.org Tue May 6 12:09:21 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 12:09:21 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v14] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 11:47:47 GMT, Quan Anh Mai wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> build fixes for non-x86 targets > > src/hotspot/cpu/x86/vm_version_x86.hpp line 37: > >> 35: class VM_Features { >> 36: public: >> 37: using FeatureVector = uint64_t [MAX_FEATURE_VEC_SIZE]; > > Do you think it would be better to refactor this into a separate class analogous to `std::bitset`? You can start with only implementing `test`, `set`, `reset`. This would help in other use cases, too. > > https://en.cppreference.com/w/cpp/utility/bitset In essence, what we have currently is a bitmap implementation, but its utility is limited to VM_Version for now. 
The current approach simplifies the JVMCI side of handling. We have an existing utility for bitset src/hotspot/share/utilities/bitMap.hpp ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075325468 From rkennke at openjdk.org Tue May 6 12:22:53 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 6 May 2025 12:22:53 GMT Subject: RFR: 8356266: Fix non-Shenandoah build after JDK-8356075 Message-ID: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> [JDK-8356075](https://bugs.openjdk.org/browse/JDK-8356075) (see PR #25001) causes builds without Shenandoah GC to fail. It's missing an `#if INCLUDE_SHENANDOAHGC`. Testing: - [x] Build without Shenandoah GC ------------- Commit messages: - 8356266: Fix non-Shenandoah build after JDK-8356075 Changes: https://git.openjdk.org/jdk/pull/25064/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25064&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356266 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25064/head:pull/25064 PR: https://git.openjdk.org/jdk/pull/25064 From dnsimon at openjdk.org Tue May 6 12:46:16 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 6 May 2025 12:46:16 GMT Subject: RFR: 8356266: Fix non-Shenandoah build after JDK-8356075 In-Reply-To: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> References: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> Message-ID: <108F8BKi1AuttNCA6a1RxJYTIVnP0phMzeaUNoHMq9Q=.44c15e9a-a8d7-42c7-97c2-f1eb0b6b5e04@github.com> On Tue, 6 May 2025 12:17:44 GMT, Roman Kennke wrote: > [JDK-8356075](https://bugs.openjdk.org/browse/JDK-8356075) (see PR #25001) causes builds without Shenandoah GC to fail. It's missing an `#if INCLUDE_SHENANDOAHGC`. 
> > Testing: > - [x] Build without Shenandoah GC Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25064#pullrequestreview-2818123867 From rkennke at openjdk.org Tue May 6 13:18:23 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 6 May 2025 13:18:23 GMT Subject: RFR: 8356266: Fix non-Shenandoah build after JDK-8356075 In-Reply-To: <108F8BKi1AuttNCA6a1RxJYTIVnP0phMzeaUNoHMq9Q=.44c15e9a-a8d7-42c7-97c2-f1eb0b6b5e04@github.com> References: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> <108F8BKi1AuttNCA6a1RxJYTIVnP0phMzeaUNoHMq9Q=.44c15e9a-a8d7-42c7-97c2-f1eb0b6b5e04@github.com> Message-ID: <_dnHV1rf65FfgcxrigE2RMCBOBu_YUq58SAdmB2as2k=.605e1266-5e2f-44da-8889-3658545d6c1b@github.com> On Tue, 6 May 2025 12:43:07 GMT, Doug Simon wrote: >> [JDK-8356075](https://bugs.openjdk.org/browse/JDK-8356075) (see PR #25001) causes builds without Shenandoah GC to fail. It's missing an `#if INCLUDE_SHENANDOAHGC`. >> >> Testing: >> - [x] Build without Shenandoah GC > > Marked as reviewed by dnsimon (Reviewer). Thanks, @dougxc! Is this trivial? Can I push this right away to fix the build? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25064#issuecomment-2854547042 From shade at openjdk.org Tue May 6 13:28:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 6 May 2025 13:28:25 GMT Subject: RFR: 8356266: Fix non-Shenandoah build after JDK-8356075 In-Reply-To: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> References: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> Message-ID: On Tue, 6 May 2025 12:17:44 GMT, Roman Kennke wrote: > [JDK-8356075](https://bugs.openjdk.org/browse/JDK-8356075) (see PR #25001) causes builds without Shenandoah GC to fail. It's missing an `#if INCLUDE_SHENANDOAHGC`. > > Testing: > - [x] Build without Shenandoah GC Ah yes. Trivial. 
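As an illustration of the missing guard mentioned above, here is a minimal sketch. It assumes a toy `BarrierKind` enum and an `export_path` helper, both hypothetical stand-ins rather than the actual JVMCI code; only the `INCLUDE_SHENANDOAHGC` macro name comes from the thread. The idea is that an enum member that exists only in some builds must be referenced under the same preprocessor condition.

```cpp
// Hypothetical stand-in for the build-time feature macro. A real HotSpot
// build sets INCLUDE_SHENANDOAHGC through the build system, not in source.
#ifndef INCLUDE_SHENANDOAHGC
#define INCLUDE_SHENANDOAHGC 0
#endif

// Toy barrier-set kinds (not HotSpot's BarrierSet). The Shenandoah member
// is only declared when the GC is part of the build.
enum class BarrierKind {
  CardTable,
#if INCLUDE_SHENANDOAHGC
  Shenandoah,
#endif
  Other
};

// Returns a small tag for the export path that would be taken.
int export_path(BarrierKind kind) {
  if (kind == BarrierKind::CardTable) {
    return 1;  // card-table based collectors
  }
#if INCLUDE_SHENANDOAHGC
  // Without this guard, the reference below does not compile in a build
  // that excludes Shenandoah, because the enum member is not declared.
  else if (kind == BarrierKind::Shenandoah) {
    return 2;
  }
#endif
  return 0;  // nothing extra to export
}
```

Compiling the sketch with `-DINCLUDE_SHENANDOAHGC=1` enables the extra branch; with the default of 0 it still builds, which is the property the fix restores.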
------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25064#pullrequestreview-2818264886 From rkennke at openjdk.org Tue May 6 13:28:26 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 6 May 2025 13:28:26 GMT Subject: RFR: 8356266: Fix non-Shenandoah build after JDK-8356075 In-Reply-To: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> References: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> Message-ID: On Tue, 6 May 2025 12:17:44 GMT, Roman Kennke wrote: > [JDK-8356075](https://bugs.openjdk.org/browse/JDK-8356075) (see PR #25001) causes builds without Shenandoah GC to fail. It's missing an `#if INCLUDE_SHENANDOAHGC`. > > Testing: > - [x] Build without Shenandoah GC Some GHA failures - they look unrelated. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25064#issuecomment-2854558817 PR Comment: https://git.openjdk.org/jdk/pull/25064#issuecomment-2854572905 From rkennke at openjdk.org Tue May 6 13:28:26 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 6 May 2025 13:28:26 GMT Subject: Integrated: 8356266: Fix non-Shenandoah build after JDK-8356075 In-Reply-To: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> References: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> Message-ID: <5xNQWiQmV33cfOTCB2_pb5B66d7L7IK2MXWEN-Gnqy4=.181a933e-4a2a-4210-8610-f03d62828c8c@github.com> On Tue, 6 May 2025 12:17:44 GMT, Roman Kennke wrote: > [JDK-8356075](https://bugs.openjdk.org/browse/JDK-8356075) (see PR #25001) causes builds without Shenandoah GC to fail. It's missing an `#if INCLUDE_SHENANDOAHGC`. > > Testing: > - [x] Build without Shenandoah GC This pull request has now been integrated. 
Changeset: bfdafb76 Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/bfdafb762661fad5746607aaf5b21d6d11c72ffc Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8356266: Fix non-Shenandoah build after JDK-8356075 Reviewed-by: dnsimon, shade ------------- PR: https://git.openjdk.org/jdk/pull/25064 From vlivanov at openjdk.org Tue May 6 18:21:13 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 6 May 2025 18:21:13 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Tue, 6 May 2025 07:43:57 GMT, Marc Chevalier wrote: >> support arbitrary nodes to be lowered into leaf runtime calls. A leaf runtime call which doesn't depend or change memory state can be inserted at arbitrary points in the graph. So, an arbitrary data node can be lowered into a runtime call once the place to insert it is known/chosen. > Overall, I see the weaknesses of my design, but I'm not sure which direction to take instead. I suggest to experiment with untangling `ModF`/`ModD` from `CallLeaf`, making them expensive nodes (to avoid commoning during GVN) , and still lower them into `CallLeaf`. (It doesn't have to be part of existing macro expansion. Depending on implementation considerations, earlier or later may be more appropriate. But it should be expanded before RA kicks in.) The hard part is probably related to picking a point in CFG to insert the call, but the control the node has may be not suitable for that (e.g., if inputs don't dominate control anymore). In that case, updating control input during loop opts may be an option. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2855510094 From kvn at openjdk.org Tue May 6 18:42:13 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 6 May 2025 18:42:13 GMT Subject: RFR: 8356259: Lift basic -Xlog:jit* logging to "info" level In-Reply-To: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> References: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> Message-ID: <6hO6sv_xTTfD8CuETfeCFvN0oURmjfX9PDIwzd4EnG4=.35d3674c-6e9e-444f-af5f-bc47586530b9@github.com> On Tue, 6 May 2025 09:52:24 GMT, Aleksey Shipilev wrote: > We have unified logging for JIT activity: -Xlog:jit+compilation, -Xlog:jit+inlining, etc. These serve as convenient replacements for -XX:+PrintCompilation, -XX:+PrintInlining, etc. And these replacements are useful, because UL can be forwarded to file, their format can be adjusted, and they can be handled asynchronously. > > However, all useful messages are on "debug" level, which is inconvenient and surprising. It is reasonable to expect some level of basic logging when supplying -Xlog:jit+compilation, e.g. "info" level. I believe we should lift at least some of the logging to "info" level for these. > > Additional testing: > - [x] Eyeballing `-Xlog:jit*` logs after the patch > - [ ] Linux x86_64 server fastdebug, `all` PrintInlining and PrintIntrinsics are diagnostic flags (while PrintCompilation is product). So mapping UL `Info` to product flag and `Debug` to diagnostic seems valid. Based on this, I agree with changes to `CT::print_ul()` but not others. 
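The mapping rule suggested in this review can be written down as a small sketch. The `FlagKind`, `LogLevel` and `level_for` names are hypothetical and are not HotSpot's unified-logging types; the sketch only encodes the stated rule that output replacing a product flag goes to info, while output replacing a diagnostic flag stays at debug.

```cpp
// Hypothetical flag categories, mirroring the product/diagnostic split
// mentioned in the review. Not HotSpot's real flag types.
enum class FlagKind { Product, Diagnostic };

// Hypothetical log levels, named after the unified-logging levels discussed.
enum class LogLevel { Info, Debug };

// Rule from the review: product-flag replacements log at info,
// diagnostic-flag replacements log at debug.
constexpr LogLevel level_for(FlagKind kind) {
  return kind == FlagKind::Product ? LogLevel::Info : LogLevel::Debug;
}

// PrintCompilation is described as a product flag, PrintInlining and
// PrintIntrinsics as diagnostic flags.
static_assert(level_for(FlagKind::Product) == LogLevel::Info,
              "PrintCompilation-style output");
static_assert(level_for(FlagKind::Diagnostic) == LogLevel::Debug,
              "PrintInlining-style output");
```

Under this rule only the `jit+compilation` output would move to info, which matches the reviewer's agreement with the `CT::print_ul()` change alone.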
------------- PR Review: https://git.openjdk.org/jdk/pull/25061#pullrequestreview-2819277571 From shade at openjdk.org Tue May 6 19:18:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 6 May 2025 19:18:54 GMT Subject: RFR: 8356259: Lift basic -Xlog:jit* logging to "info" level [v2] In-Reply-To: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> References: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> Message-ID: > We have unified logging for JIT activity: -Xlog:jit+compilation, -Xlog:jit+inlining, etc. These serve as convenient replacements for -XX:+PrintCompilation, -XX:+PrintInlining, etc. And these replacements are useful, because UL can be forwarded to file, their format can be adjusted, and they can be handled asynchronously. > > However, all useful messages are on "debug" level, which is inconvenient and surprising. It is reasonable to expect some level of basic logging when supplying -Xlog:jit+compilation, e.g. "info" level. I believe we should lift at least some of the logging to "info" level for these. 
> > Additional testing: > - [x] Eyeballing `-Xlog:jit*` logs after the patch > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Only do jit+compilation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25061/files - new: https://git.openjdk.org/jdk/pull/25061/files/2e1b9e64..2b8c9576 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25061&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25061&range=00-01 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25061/head:pull/25061 PR: https://git.openjdk.org/jdk/pull/25061 From shade at openjdk.org Tue May 6 19:18:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 6 May 2025 19:18:55 GMT Subject: RFR: 8356259: Lift basic -Xlog:jit* logging to "info" level [v2] In-Reply-To: <6hO6sv_xTTfD8CuETfeCFvN0oURmjfX9PDIwzd4EnG4=.35d3674c-6e9e-444f-af5f-bc47586530b9@github.com> References: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> <6hO6sv_xTTfD8CuETfeCFvN0oURmjfX9PDIwzd4EnG4=.35d3674c-6e9e-444f-af5f-bc47586530b9@github.com> Message-ID: On Tue, 6 May 2025 18:39:43 GMT, Vladimir Kozlov wrote: > PrintInlining and PrintIntrinsics are diagnostic flags (while PrintCompilation is product). So mapping UL `Info` to product flag and `Debug` to diagnostic seems valid. Based on this, I agree with changes to `CT::print_ul()` but not others. I am mostly interested in `PrintCompilation` myself, so that would be an acceptable compromise. However, I do believe that `PrintInlining` along with `TraceTypeProfile` are very useful to figure out performance anomalies in the field. Those really should not be diagnostic, and UL should really be "info" for them :) But we can have that discussion at some point later. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25061#issuecomment-2855647445 From duke at openjdk.org Tue May 6 21:45:34 2025 From: duke at openjdk.org (Mohamed Issa) Date: Tue, 6 May 2025 21:45:34 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v3] In-Reply-To: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. > > The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. > > For performance data collected with the built in **cbrt** micro-benchmark, see the table below. Each result is the mean of 8 individual runs. Overall, the intrinsic provides a performance uplift of 37%.
>
> | Benchmark | Throughput with baseline (op/s) | Throughput with intrinsic (op/s) | Speedup |
> | :----------------: | :----------------------------------: | :----------------------------------: | :---------: |
> | MathBench.cbrt | 152465 | 208537 | 1.37x |
>
> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes.
Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: Add new set of cbrt micro-benchmarks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24470/files - new: https://git.openjdk.org/jdk/pull/24470/files/3212c669..57412f0d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24470&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24470&range=01-02 Stats: 148 lines in 1 file changed: 148 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24470.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24470/head:pull/24470 PR: https://git.openjdk.org/jdk/pull/24470 From kvn at openjdk.org Tue May 6 22:58:14 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 6 May 2025 22:58:14 GMT Subject: RFR: 8356259: Lift basic -Xlog:jit* logging to "info" level [v2] In-Reply-To: References: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> Message-ID: On Tue, 6 May 2025 19:18:54 GMT, Aleksey Shipilev wrote: >> We have unified logging for JIT activity: -Xlog:jit+compilation, -Xlog:jit+inlining, etc. These serve as convenient replacements for -XX:+PrintCompilation, -XX:+PrintInlining, etc. And these replacements are useful, because UL can be forwarded to file, their format can be adjusted, and they can be handled asynchronously. >> >> However, all useful messages are on "debug" level, which is inconvenient and surprising. It is reasonable to expect some level of basic logging when supplying -Xlog:jit+compilation, e.g. "info" level. I believe we should lift at least some of the logging to "info" level for these. >> >> Additional testing: >> - [x] Eyeballing `-Xlog:jit*` logs after the patch >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Only do jit+compilation Trivial. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25061#pullrequestreview-2819895653 From vlivanov at openjdk.org Tue May 6 23:21:18 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 6 May 2025 23:21:18 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v14] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 11:19:54 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > build fixes for non-x86 targets Very nice! I made a cleanup pass over the code [1].
Feel free to incorporate it or let me know if you have any questions/concerns. Meanwhile, submitted it for testing. [1] https://github.com/iwanowww/jdk/commit/35aeb88d0d5667c9e4f699bb9b3b7169af96446a ------------- PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2819173067 From vlivanov at openjdk.org Tue May 6 23:21:19 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 6 May 2025 23:21:19 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v14] In-Reply-To: References: <2ioSQVtfXhnqvAXqiadwR1HuJsz3t9nytY0wRps-x68=.35220ade-0e70-41c6-9ebd-a271e7dcb2bb@github.com> Message-ID: On Tue, 6 May 2025 08:45:15 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/vm_version_x86.hpp line 707: >> >>> 705: // >>> 706: static bool supports_cpuid() { return _features != 0; } >>> 707: static bool supports_cmov() { return (_features & CPU_CMOV) != 0; } >> >> Since you touch this code anyway, I suggest to use this opportunity to automatically derive this code using `CPU_FEATURE_FLAGS` macro. (As an example [1].) >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.hpp#L147 > > Unlike AARCH64, there is not a 1:1 mapping b/w CPU_* features and the corresponding support checkers; some AVX512 checkers use multiple features. Skipping this for now for consistency. Sure, I'm fine with addressing it separately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075993391 From shade at openjdk.org Wed May 7 07:07:18 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 May 2025 07:07:18 GMT Subject: RFR: 8356259: Lift basic -Xlog:jit* logging to "info" level [v2] In-Reply-To: References: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> Message-ID: On Tue, 6 May 2025 19:18:54 GMT, Aleksey Shipilev wrote: >> We have unified logging for JIT activity: -Xlog:jit+compilation, -Xlog:jit+inlining, etc. 
These serve as convenient replacements for -XX:+PrintCompilation, -XX:+PrintInlining, etc. And these replacements are useful, because UL can be forwarded to file, their format can be adjusted, and they can be handled asynchronously. >> >> However, all useful messages are on "debug" level, which is inconvenient and surprising. It is reasonable to expect some level of basic logging when supplying -Xlog:jit+compilation, e.g. "info" level. I believe we should lift at least some of the logging to "info" level for these. >> >> Additional testing: >> - [x] Eyeballing `-Xlog:jit*` logs after the patch >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Only do jit+compilation OK, thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25061#issuecomment-2857360250 From shade at openjdk.org Wed May 7 07:47:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 May 2025 07:47:22 GMT Subject: Integrated: 8356259: Lift basic -Xlog:jit* logging to "info" level In-Reply-To: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> References: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> Message-ID: <761jqrKse3Lh7FxmHrUMnDPws8xEXOMB-o-Ry1HT6QI=.4c6bae97-8e59-4aff-aaa3-56dfac751eaa@github.com> On Tue, 6 May 2025 09:52:24 GMT, Aleksey Shipilev wrote: > We have unified logging for JIT activity: -Xlog:jit+compilation, -Xlog:jit+inlining, etc. These serve as convenient replacements for -XX:+PrintCompilation, -XX:+PrintInlining, etc. And these replacements are useful, because UL can be forwarded to file, their format can be adjusted, and they can be handled asynchronously. > > However, all useful messages are on "debug" level, which is inconvenient and surprising. It is reasonable to expect some level of basic logging when supplying -Xlog:jit+compilation, e.g. 
"info" level. I believe we should lift at least some of the logging to "info" level for these. > > Additional testing: > - [x] Eyeballing `-Xlog:jit*` logs after the patch > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. Changeset: 50895835 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/50895835e0c78f54a0b33db7f42f3769e2a1e652 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8356259: Lift basic -Xlog:jit* logging to "info" level Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/25061 From aph at openjdk.org Wed May 7 09:28:19 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 7 May 2025 09:28:19 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v3] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: On Tue, 6 May 2025 21:45:34 GMT, Mohamed Issa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. >> >> For performance data collected with the new built in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs. 
>>
>> | Input range(s) | Throughput with baseline (op/s) | Throughput with intrinsic (op/s) | Speedup |
>> | :-------------------------------------: | :----------------------------------: | :----------------------------------: | :---------: |
>> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x |
>> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x |
>>
>> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes. > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Add new set of cbrt micro-benchmarks src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 62: > 60: { > 61: 0, 3220193280 > 62: }; What is this constant? Its value is 0xbff0400000000000, which is -ve bit set, bias (top bit of exponent) clear, but one of the bits in the fraction is set. So its value is -0x1.04p+0. As well as the exponent it also sets the 1 bit, just below the 5 most significant bits of the fraction. I guess this in effect rounds up the value that is added in the final rounding. Is that right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2077214995 From gbarany at openjdk.org Wed May 7 11:27:36 2025 From: gbarany at openjdk.org (Gergö Barany) Date: Wed, 7 May 2025 11:27:36 GMT Subject: RFR: 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java Message-ID: Remove special cases in `nmethod::is_deopt_entry` and `nmethod::is_deopt_mh_entry`. Graal used to generate a different code pattern from C2 for deopt handlers. This was changed in https://github.com/oracle/graal/commit/099f57b58edb23ed2184c11badea24edf36f30d2 to align Graal's code generation with C2. The special cases are no longer needed.
------------- Commit messages: - 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java Changes: https://git.openjdk.org/jdk/pull/25088/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25088&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354443 Stats: 11 lines in 1 file changed: 0 ins; 9 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25088.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25088/head:pull/25088 PR: https://git.openjdk.org/jdk/pull/25088 From jbhateja at openjdk.org Wed May 7 11:40:05 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 May 2025 11:40:05 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v15] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback.
> > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: - Making _features_bitmap size configurable - cleanups & refactorings ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/650e3d61..cfc09d05 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=13-14 Stats: 192 lines in 9 files changed: 58 ins; 87 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From dnsimon at openjdk.org Wed May 7 13:25:15 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 7 May 2025 13:25:15 GMT Subject: RFR: 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:17:52 GMT, Gergö Barany wrote: > Remove special cases in `nmethod::is_deopt_entry` and `nmethod::is_deopt_mh_entry`. Graal used to generate a different code pattern from C2 for deopt handlers. This was changed in https://github.com/oracle/graal/commit/099f57b58edb23ed2184c11badea24edf36f30d2 to align Graal's code generation with C2. The special cases are no longer needed. LGTM ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25088#pullrequestreview-2821736057 From yzheng at openjdk.org Wed May 7 13:36:16 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 7 May 2025 13:36:16 GMT Subject: RFR: 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:17:52 GMT, Gergö
Barany wrote: > Remove special cases in `nmethod::is_deopt_entry` and `nmethod::is_deopt_mh_entry`. Graal used to generate a different code pattern from C2 for deopt handlers. This was changed in https://github.com/oracle/graal/commit/099f57b58edb23ed2184c11badea24edf36f30d2 to align Graal's code generation with C2. The special cases are no longer needed. LGTM ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/25088#pullrequestreview-2821790193 From gbarany at openjdk.org Wed May 7 14:45:13 2025 From: gbarany at openjdk.org (Gergö Barany) Date: Wed, 7 May 2025 14:45:13 GMT Subject: RFR: 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:17:52 GMT, Gergö Barany wrote: > Remove special cases in `nmethod::is_deopt_entry` and `nmethod::is_deopt_mh_entry`. Graal used to generate a different code pattern from C2 for deopt handlers. This was changed in https://github.com/oracle/graal/commit/099f57b58edb23ed2184c11badea24edf36f30d2 to align Graal's code generation with C2. The special cases are no longer needed. Thanks for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25088#issuecomment-2858866796 From duke at openjdk.org Wed May 7 14:45:13 2025 From: duke at openjdk.org (duke) Date: Wed, 7 May 2025 14:45:13 GMT Subject: RFR: 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:17:52 GMT, Gergö Barany wrote: > Remove special cases in `nmethod::is_deopt_entry` and `nmethod::is_deopt_mh_entry`. Graal used to generate a different code pattern from C2 for deopt handlers. This was changed in https://github.com/oracle/graal/commit/099f57b58edb23ed2184c11badea24edf36f30d2 to align Graal's code generation with C2. The special cases are no longer needed.
@gergo- Your change (at version 8028476c2e28e2c168676209260fa68194f74cf1) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25088#issuecomment-2858870106 From gbarany at openjdk.org Wed May 7 14:52:20 2025 From: gbarany at openjdk.org (Gergö Barany) Date: Wed, 7 May 2025 14:52:20 GMT Subject: Integrated: 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:17:52 GMT, Gergö Barany wrote: > Remove special cases in `nmethod::is_deopt_entry` and `nmethod::is_deopt_mh_entry`. Graal used to generate a different code pattern from C2 for deopt handlers. This was changed in https://github.com/oracle/graal/commit/099f57b58edb23ed2184c11badea24edf36f30d2 to align Graal's code generation with C2. The special cases are no longer needed. This pull request has now been integrated. Changeset: 90f0f1b8 Author: Gergö Barany Committer: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/90f0f1b88badbf1f72d7b9434621457aa47cde30 Stats: 11 lines in 1 file changed: 0 ins; 9 del; 2 mod 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java Reviewed-by: dnsimon, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/25088 From yzheng at openjdk.org Wed May 7 15:39:20 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 7 May 2025 15:39:20 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v15] In-Reply-To: References: Message-ID: <4qUlnS5IhZxUDg2w5C3aAo_saQ1IXSnbkmSNwpgzpes=.092d9c5d-836d-41d9-aa9b-e94c4520fea7@github.com> On Wed, 7 May 2025 11:40:05 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme.
Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: > > - Making _features_bitmap size configurable > - cleanups & refactorings JVMCI changes look good. 
Will run some Graal tests on this PR src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotJVMCIBackendFactory.java line 121: > 119: long featureIndex = bitIndex >>> featuresElementShiftCount; > 120: long featureBitMask = 1L << (bitIndex & featuresElementMask); > 121: assert featureIndex < featuresBitMapSize; `featuresBitMapSize` is size in bytes while `featureIndex` is index to long array ------------- PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2822266780 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2077922290 From vlivanov at openjdk.org Wed May 7 21:53:58 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 7 May 2025 21:53:58 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v15] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:40:05 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. 
>> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: > > - Making _features_bitmap size configurable > - cleanups & refactorings There are some SA-related failures. Fixed by [1]. Otherwise, testing results are good. [1] https://github.com/iwanowww/jdk/commit/9100ef190befbb1967f477532a0776c135a9b728 src/hotspot/cpu/x86/vm_version_x86.hpp line 458: > 456: > 457: private: > 458: uint64_t _features_bitmap[(MAX_CPU_FEATURES >> 6) + 1]; Suggestion: uint64_t _features_bitmap[(MAX_CPU_FEATURES / BitsPerLong) + 1]; src/hotspot/cpu/x86/vm_version_x86.hpp line 460: > 458: uint64_t _features_bitmap[(MAX_CPU_FEATURES >> 6) + 1]; > 459: > 460: STATIC_ASSERT(sizeof(_features_bitmap) * BitsPerByte > MAX_CPU_FEATURES); Suggestion: STATIC_ASSERT(sizeof(_features_bitmap) * BitsPerByte >= MAX_CPU_FEATURES); ------------- PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2822970103 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2078346536 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2078354983 From vlivanov at openjdk.org Wed May 7 21:53:59 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 7 May 2025 21:53:59 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v15] In-Reply-To: <4qUlnS5IhZxUDg2w5C3aAo_saQ1IXSnbkmSNwpgzpes=.092d9c5d-836d-41d9-aa9b-e94c4520fea7@github.com> References: <4qUlnS5IhZxUDg2w5C3aAo_saQ1IXSnbkmSNwpgzpes=.092d9c5d-836d-41d9-aa9b-e94c4520fea7@github.com> Message-ID: On Wed, 7 May 2025 15:28:09 GMT, Yudi Zheng wrote: >> Jatin Bhateja has updated the pull request incrementally with two additional 
commits since the last revision: >> >> - Making _features_bitmap size configurable >> - cleanups & refactorings > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotJVMCIBackendFactory.java line 121: > >> 119: long featureIndex = bitIndex >>> featuresElementShiftCount; >> 120: long featureBitMask = 1L << (bitIndex & featuresElementMask); >> 121: assert featureIndex < featuresBitMapSize; > > `featuresBitMapSize` is size in bytes while `featureIndex` is index to long array Good catch, Yudi. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2078544595 From jbhateja at openjdk.org Thu May 8 13:49:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 8 May 2025 13:49:22 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v16] In-Reply-To: References: Message-ID: <9Luwvte-huLN0cjCqBAdvitAE6ZwqPjmiLJSOEpFt04=.b9d7f325-0e85-44a9-ae18-2f770260c4f6@github.com> > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. 
State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Reveiw suggestions incorporated ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/cfc09d05..8acbd7a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=14-15 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From jbhateja at openjdk.org Thu May 8 14:44:43 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 8 May 2025 14:44:43 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v17] In-Reply-To: References: Message-ID: <8tz0nbg5nt0WR_9Y_Zd_G2I26Dl8D4a5wBd0wBbrRQY=.2c71f9e8-8aa7-4a04-88df-d2ef018d73a8@github.com> > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel? 
AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Code re-factoring from Vladimir ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/8acbd7a6..1a3bce93 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=15-16 Stats: 21 lines in 3 files changed: 7 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From yzheng at openjdk.org Thu May 8 14:49:42 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 8 May 2025 14:49:42 GMT Subject: RFR: 8353735: [JVMCI] Allow specifying storage kind of the callee save register [v2] In-Reply-To: References: Message-ID: <8jZWccxTMyrcHsQEiyaf6_TmGLBXIGdfW2bJWcVHMaU=.98eb7ab1-dc5c-4611-a2a9-4ca04d606836@github.com> > Windows x64 ABI considers the upper portions of YMM0-YMM15 and ZMM0-ZMM15 volatile, that is, destroyed on function calls. 
This PR allows `RegisterConfig` implementations to refine the storage kind of callee save register, such that JVMCI compiler can exploit this information to avoid saving full width of these registers. Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Update javadoc - Merge remote-tracking branch 'upstream/master' into JDK-8353735 - [JVMCI] Allow specifying storage kind of the callee save register ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24451/files - new: https://git.openjdk.org/jdk/pull/24451/files/339b72ef..fcdfd10d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24451&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24451&range=00-01 Stats: 315273 lines in 3080 files changed: 101272 ins; 201200 del; 12801 mod Patch: https://git.openjdk.org/jdk/pull/24451.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24451/head:pull/24451 PR: https://git.openjdk.org/jdk/pull/24451 From yzheng at openjdk.org Thu May 8 14:57:10 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 8 May 2025 14:57:10 GMT Subject: RFR: 8353735: [JVMCI] Allow specifying storage kind of the callee save register [v3] In-Reply-To: References: Message-ID: <_8_bdUwiZc5xZqStJm2XfneFUTdCEx4c_uDsKJcMkTc=.1df612b0-30c8-4ae3-8706-bd634dd9fbc4@github.com> > Windows x64 ABI considers the upper portions of YMM0-YMM15 and ZMM0-ZMM15 volatile, that is, destroyed on function calls. This PR allows `RegisterConfig` implementations to refine the storage kind of callee save register, such that JVMCI compiler can exploit this information to avoid saving full width of these registers. 
Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: Update javadoc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24451/files - new: https://git.openjdk.org/jdk/pull/24451/files/fcdfd10d..bc900518 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24451&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24451&range=01-02 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24451.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24451/head:pull/24451 PR: https://git.openjdk.org/jdk/pull/24451 From dnsimon at openjdk.org Thu May 8 14:57:11 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 8 May 2025 14:57:11 GMT Subject: RFR: 8353735: [JVMCI] Allow specifying storage kind of the callee save register [v3] In-Reply-To: <_8_bdUwiZc5xZqStJm2XfneFUTdCEx4c_uDsKJcMkTc=.1df612b0-30c8-4ae3-8706-bd634dd9fbc4@github.com> References: <_8_bdUwiZc5xZqStJm2XfneFUTdCEx4c_uDsKJcMkTc=.1df612b0-30c8-4ae3-8706-bd634dd9fbc4@github.com> Message-ID: On Thu, 8 May 2025 14:54:36 GMT, Yudi Zheng wrote: >> Windows x64 ABI considers the upper portions of YMM0-YMM15 and ZMM0-ZMM15 volatile, that is, destroyed on function calls. This PR allows `RegisterConfig` implementations to refine the storage kind of callee save register, such that JVMCI compiler can exploit this information to avoid saving full width of these registers. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Update javadoc Still good. ------------- Marked as reviewed by dnsimon (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24451#pullrequestreview-2825424244 From jbhateja at openjdk.org Thu May 8 19:21:31 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 8 May 2025 19:21:31 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v18] In-Reply-To: References: Message-ID: <0t720cpyX-RwVGVlm0b9gNbSjeMHWy5cnF-o4xSWRgU=.130e6474-3aa2-48a8-90d1-6f3a69c135ee@github.com> > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. 
> > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Addressing Yudi's comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/1a3bce93..c65f0777 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=16-17 Stats: 7 lines in 5 files changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From vlivanov at openjdk.org Thu May 8 19:23:59 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 8 May 2025 19:23:59 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v18] In-Reply-To: <0t720cpyX-RwVGVlm0b9gNbSjeMHWy5cnF-o4xSWRgU=.130e6474-3aa2-48a8-90d1-6f3a69c135ee@github.com> References: <0t720cpyX-RwVGVlm0b9gNbSjeMHWy5cnF-o4xSWRgU=.130e6474-3aa2-48a8-90d1-6f3a69c135ee@github.com> Message-ID: On Thu, 8 May 2025 19:21:31 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel? AVX10 (Version 1, or Intel? 
AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Addressing Yudi's comments Testing results (hs-tier1 - hs-tier4) are clean. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2826156052 From yzheng at openjdk.org Thu May 8 19:40:00 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 8 May 2025 19:40:00 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v18] In-Reply-To: <0t720cpyX-RwVGVlm0b9gNbSjeMHWy5cnF-o4xSWRgU=.130e6474-3aa2-48a8-90d1-6f3a69c135ee@github.com> References: <0t720cpyX-RwVGVlm0b9gNbSjeMHWy5cnF-o4xSWRgU=.130e6474-3aa2-48a8-90d1-6f3a69c135ee@github.com> Message-ID: On Thu, 8 May 2025 19:21:31 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. 
>> - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Addressing Yudi's comments CPU features in Graal remain the same after this PR. Passed all Graal compiler unit tests. ------------- Marked as reviewed by yzheng (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2826187636 From sviswanathan at openjdk.org Fri May 9 00:03:56 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 9 May 2025 00:03:56 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v9] In-Reply-To: References: Message-ID: On Sat, 3 May 2025 07:28:04 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/vm_version_x86.cpp line 464: >> >>> 462: __ movl(rcx, 0x18000000); // cpuid1 bits osxsave | avx >>> 463: __ andl(rcx, Address(rsi, 8)); // cpuid1 bits osxsave | avx >>> 464: __ jccb(Assembler::equal, done); // jump if AVX is not supported >> >> This does not have the same effect as before. Consider the input 0x10000000: the andl result will not be zero with this code, so the jump to done will not happen. Whereas prior to this change, the cmpl with 0x18000000 will fail for equality and so a jump to done will happen. This is the case for all the places where we are checking more than 1 set bit. > > Thanks @sviswa7 , sub-optimality was mainly around single-bit comparisons, where we could save redundant CMP after AND, and by flipping the predicate of subsequent flag-consuming JMP, multibits compares should remain unaltered. This and all the following places with multi-bit check still need to be fixed. If you walk through stock and new code in this PR when Address(rsi, 8) on line 468 has 0x10000000, you will observe that stock code will jump to done and new code will not jump to done. Let me know if I am missing something.
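To make the multi-bit concern above concrete, here is a minimal Java sketch of the two predicates being compared. It is not HotSpot code: the class and method names are hypothetical, and it only models the flag logic under the assumption that the stock sequence is AND followed by a compare against the mask (skip when not equal), while the changed sequence is AND followed by a jump when the result is zero.

```java
public class MultiBitCheck {
    // CPUID.1:ECX bits 27 (OSXSAVE) and 28 (AVX), as in the quoted snippet.
    static final int OSXSAVE_AVX_MASK = 0x18000000;

    // Models AND + CMP against the mask: the feature counts as present
    // only when every bit in the mask is set.
    static boolean allBitsSet(int ecx) {
        return (ecx & OSXSAVE_AVX_MASK) == OSXSAVE_AVX_MASK;
    }

    // Models AND + jump-if-zero: the feature counts as present as soon as
    // any one bit in the mask is set, which is weaker for multi-bit masks.
    static boolean anyBitSet(int ecx) {
        return (ecx & OSXSAVE_AVX_MASK) != 0;
    }

    public static void main(String[] args) {
        int avxOnly = 0x10000000; // AVX set, OSXSAVE clear
        System.out.println("all bits set: " + allBitsSet(avxOnly)); // false
        System.out.println("any bit set:  " + anyBitSet(avxOnly));  // true
    }
}
```

For a single-bit mask the two predicates agree, which is why dropping the compare is safe there; they diverge only when the mask has more than one bit set.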
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2080592979 From sviswanathan at openjdk.org Fri May 9 00:03:58 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 9 May 2025 00:03:58 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v18] In-Reply-To: <0t720cpyX-RwVGVlm0b9gNbSjeMHWy5cnF-o4xSWRgU=.130e6474-3aa2-48a8-90d1-6f3a69c135ee@github.com> References: <0t720cpyX-RwVGVlm0b9gNbSjeMHWy5cnF-o4xSWRgU=.130e6474-3aa2-48a8-90d1-6f3a69c135ee@github.com> Message-ID: On Thu, 8 May 2025 19:21:31 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. 
>> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Addressing Yudi's comments test/hotspot/jtreg/serviceability/sa/ClhsdbLongConstant.java line 108: > 106: checkLongValue("VM_Version::CPU_SHA ", > 107: longConstantOutput, > 108: 34L); Need to change the comment on line 94 as well. test/lib-test/jdk/test/whitebox/CPUInfoTest.java line 69: > 67: "f16c", "pku", "ospke", "cet_ibt", > 68: "cet_ss", "avx512_ifma", "serialize", "avx_ifma", > 69: "apx_f", "avx10_1", "avx10_2" A minor nit, in between spacing could match previous statement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2080650055 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2080648091 From yzheng at openjdk.org Fri May 9 08:42:06 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 9 May 2025 08:42:06 GMT Subject: RFR: 8353735: [JVMCI] Allow specifying storage kind of the callee save register [v3] In-Reply-To: <_8_bdUwiZc5xZqStJm2XfneFUTdCEx4c_uDsKJcMkTc=.1df612b0-30c8-4ae3-8706-bd634dd9fbc4@github.com> References: <_8_bdUwiZc5xZqStJm2XfneFUTdCEx4c_uDsKJcMkTc=.1df612b0-30c8-4ae3-8706-bd634dd9fbc4@github.com> Message-ID: On Thu, 8 May 2025 14:57:10 GMT, Yudi Zheng wrote: >> Windows x64 ABI considers the upper portions of YMM0-YMM15 and ZMM0-ZMM15 volatile, that is, destroyed on function calls. This PR allows `RegisterConfig` implementations to refine the storage kind of callee save register, such that JVMCI compiler can exploit this information to avoid saving full width of these registers. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Update javadoc Tier1-3 passed. Thanks for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24451#issuecomment-2865675256 From yzheng at openjdk.org Fri May 9 08:42:07 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 9 May 2025 08:42:07 GMT Subject: Integrated: 8353735: [JVMCI] Allow specifying storage kind of the callee save register In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 14:47:39 GMT, Yudi Zheng wrote: > Windows x64 ABI considers the upper portions of YMM0-YMM15 and ZMM0-ZMM15 volatile, that is, destroyed on function calls. This PR allows `RegisterConfig` implementations to refine the storage kind of callee save register, such that JVMCI compiler can exploit this information to avoid saving full width of these registers. This pull request has now been integrated. Changeset: 74e981e8 Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/74e981e85509ca072b2a45d529dab3a9883613a2 Stats: 11 lines in 1 file changed: 10 ins; 0 del; 1 mod 8353735: [JVMCI] Allow specifying storage kind of the callee save register Reviewed-by: dnsimon, cslucas ------------- PR: https://git.openjdk.org/jdk/pull/24451 From jbhateja at openjdk.org Fri May 9 15:17:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 May 2025 15:17:17 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v19] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel? AVX10 (Version 1, or Intel? 
AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: - Sandhya's review comments resoultion - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 - Addressing Yudi's comments - Code re-factoring from Vladimir - Reveiw suggestions incorporated - Making _features_bitmap size configurable - cleanups & refactorings - build fixes for non-x86 targets - Review comments resolutions - Updating comment - ... 
and 9 more: https://git.openjdk.org/jdk/compare/411a63ea...f583a521 ------------- Changes: https://git.openjdk.org/jdk/pull/24329/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=18 Stats: 520 lines in 15 files changed: 271 ins; 29 del; 220 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From mchevalier at openjdk.org Fri May 9 16:08:55 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 9 May 2025 16:08:55 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Tue, 6 May 2025 18:18:08 GMT, Vladimir Ivanov wrote: > making them expensive nodes (to avoid commoning during GVN) Good point! I still think I don't get everything. Let me try to sum up what I think I should do. For now, I don't want to mess with control, but I should prepare the field. Using general Call nodes for pure calls was pretty difficult: Call nodes have too much opinion, assumptions to easily work with for pure calls. But eventually, I want to change the nodes I'm using into a Call node, and more precisely a CallLeaf (I suspect once I'm done doing all I can do with pure calls, so in macro expansion, it's fine). To be able to do this transformation, I need to know control at this point. My goal is to start with control-less nodes, but find the late control during loop optimization, control-pin them at this point (because that's when the information is available) with both control input and output (needed for the expansion in CallLeaf), and continuing with control-pinned nodes. For now, I'm happy with the control I get from parsing. So, under my nodes, I need 2 outputs: control and data (everywhere now, and at least after control-pinning in the follow-up). 
I should then make ModFloating/ModD/ModF sub-classes of `MultNode` (I guess I can make ModFloating a direct sub-class of `MultNode`). And I can introduce new node types for native math calls that would behave similarly wrt elimination (and pinning in the future), and would also expand into `CallLeaf`. A weirdness of these nodes is that they would be CFG or not depending on whether they are already pinned, and not depending on their type, but I'm not aware of a fundamental issue about that, as long as the change doesn't happen in the middle of a phase where it's relevant. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2867105355 From sviswanathan at openjdk.org Fri May 9 22:55:58 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 9 May 2025 22:55:58 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v19] In-Reply-To: References: Message-ID: <8y5JLR_7BMUJXmNNzPRusDpRWnJHtIPxZodVqQHrmmI=.ca53a482-8572-499f-af9f-6c255cf02896@github.com> On Fri, 9 May 2025 15:17:17 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2.
In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: > > - Sandhya's review comments resoultion > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 > - Addressing Yudi's comments > - Code re-factoring from Vladimir > - Reveiw suggestions incorporated > - Making _features_bitmap size configurable > - cleanups & refactorings > - build fixes for non-x86 targets > - Review comments resolutions > - Updating comment > - ... and 9 more: https://git.openjdk.org/jdk/compare/411a63ea...f583a521 Rest of the PR looks good to me. src/hotspot/cpu/x86/vm_version_x86.cpp line 494: > 492: if (use_evex) { > 493: // check _cpuid_info.sef_cpuid7_ebx.bits.avx512f > 494: // OR check _cpuid_info.std_cpuid24_ebx.bits.avx10 This comment needs to be corrected: // OR check _cpuid_info.sefsl1_cpuid7_edx.bits.avx10 src/hotspot/cpu/x86/vm_version_x86.cpp line 1052: > 1050: if (is_intel()) { // Intel cpus specific settings > 1051: if (is_knights_family()) { > 1052: _features.clear_feature(CPU_VZEROUPPER); Should we be also clearing the CPU_AVX10_1 and CPU_AVX10_2 here? 
------------- PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2829142420 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2082148591 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2082570611 From jbhateja at openjdk.org Fri May 9 23:36:16 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 May 2025 23:36:16 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v20] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback.
> > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/f583a521..b4654fa4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=18-19 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From sviswanathan at openjdk.org Fri May 9 23:36:16 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 9 May 2025 23:36:16 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v20] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 23:33:42 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling.
>> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2829900271 From jbhateja at openjdk.org Fri May 9 23:36:16 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 May 2025 23:36:16 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10] In-Reply-To: <3t1R35B9bafRtfvqfE7D2dAeLrjaDukXlDUGb-3VtaA=.46d64318-e9fb-4bf3-8a68-8dba2c2b7b26@github.com> References: <3t1R35B9bafRtfvqfE7D2dAeLrjaDukXlDUGb-3VtaA=.46d64318-e9fb-4bf3-8a68-8dba2c2b7b26@github.com> Message-ID: On Sat, 3 May 2025 08:13:11 GMT, Vladimir Ivanov wrote: >>> Ok, thanks! I wasn't sure you finished the pass. >>> >>> I'm still seeing dynamic memory allocation which IMO unnecessarily complicates the implementation. Bitmap size is fixed and well-known at compile time. It enables `VM_Feature` class to embed the array of proper size inline. And it eliminates all the problems related to undesired sharing of backed array. (Also, `pre_initialize()` is not needed as well.) >> >> Bitmap size depends on the maximum feature enum value, I made it dynamic to keep it flexible. 
Do you want the feature vector size to be made constant and manually bump it when we exhaust the limit? > >> Bitmap size depends on the maximum feature enum value, I made it dynamic to keep it flexible. Do you want the feature vector size to be made constant and manually bump it when we exhaust the limit? > > Yes, please. (The limit may be precise - number of elements in Feature_Flag enum - but the logic which computes the size of backing array can automatically round it and bump the size once the actual limit is reached.) > >> pre_initialize was put in place because codeCache_init() precedes VM_Version_init() > > I wanted to say that the sole purpose of `pre_initialize` is to allocate memory. Once it goes away, there's no reason to keep it. Thanks @iwanowww , @sviswa7 , @mur47x111 , @merykitty for your reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2868092244 From jbhateja at openjdk.org Fri May 9 23:36:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 May 2025 23:36:17 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v19] In-Reply-To: <8y5JLR_7BMUJXmNNzPRusDpRWnJHtIPxZodVqQHrmmI=.ca53a482-8572-499f-af9f-6c255cf02896@github.com> References: <8y5JLR_7BMUJXmNNzPRusDpRWnJHtIPxZodVqQHrmmI=.ca53a482-8572-499f-af9f-6c255cf02896@github.com> Message-ID: On Fri, 9 May 2025 22:23:41 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: >> >> - Sandhya's review comments resoultion >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 >> - Addressing Yudi's comments >> - Code re-factoring from Vladimir >> - Reveiw suggestions incorporated >> - Making _features_bitmap size configurable >> - cleanups & refactorings >> - build fixes for non-x86 targets >> - Review comments resolutions >> - Updating comment >> - ... 
and 9 more: https://git.openjdk.org/jdk/compare/411a63ea...f583a521 > > src/hotspot/cpu/x86/vm_version_x86.cpp line 1052: > >> 1050: if (is_intel()) { // Intel cpus specific settings >> 1051: if (is_knights_family()) { >> 1052: _features.clear_feature(CPU_VZEROUPPER); > > Should we be also clearing the CPU_AVX10_1 and CPU_AVX10_2 here? I agree; it may help validate KNL on Diamond Rapids :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2082628062 From jbhateja at openjdk.org Fri May 9 23:36:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 May 2025 23:36:17 GMT Subject: Integrated: 8352675: Support Intel AVX10 converged vector ISA feature detection In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 13:57:22 GMT, Jatin Bhateja wrote: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers.
> > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html This pull request has now been integrated. Changeset: 3b336a9d Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/3b336a9da091c4df4373d2b845b60d2a7a4e3b1d Stats: 522 lines in 15 files changed: 273 ins; 29 del; 220 mod 8352675: Support Intel AVX10 converged vector ISA feature detection Reviewed-by: sviswanathan, vlivanov, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/24329 From vlivanov at openjdk.org Sat May 10 03:18:03 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 10 May 2025 03:18:03 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Fri, 9 May 2025 16:06:13 GMT, Marc Chevalier wrote: > My goal is to start with control-less nodes, but find the late control during loop optimization, control-pin them at this point (because that's when the information is available) with both control input and output (needed for the expansion in CallLeaf), and continuing with control-pinned nodes. If you combine lowering with pinning, you could replace a data node with a CFG node (CallLeaf in your case) at the point in CFG you choose. A single CFG node is enough to insert a CFG-only node, but you need to ensure the graph stays schedulable after the insertion. If you want to start with pinned node, the simplest way would be to make `CallPure` a subclass of `CallLeaf`, require it to be CFG-only (no memory in/out, no IO, etc) and populate only control in/out when inserting it into the graph during parsing. > For now, I'm happy with the control I get from parsing. Keep in mind that it assumes the node is pinned in CFG from the very beginning. 
Once the node starts in data-only mode, the control input it gained during parsing may end up too early for node's inputs to be schedulable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2868277578 From qamai at openjdk.org Sat May 10 05:26:52 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 10 May 2025 05:26:52 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt, etc. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing.
> > Thanks, > Marc I think a very simple approach you can take is having `CallPureNode` as a pure data node. It does not have to have anything to do with `CallNode` (no lowering into a `CallNode`, no subclass from `CallNode`) and it can have its mach implementation like this:

    instruct pureCall1F(xmm0 dst, xmm0 src) %{
      match(Set dst (CallPure src));
      effect(CALL);
      format %{
        __ call(/*something*/);
      %}
    %}

------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2868400653 From duke at openjdk.org Mon May 12 08:57:36 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 12 May 2025 08:57:36 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v3] In-Reply-To: References: Message-ID: > By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Eliminating some instructions from generate_kyber12To16_avx512() + fixing a comment.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24953/files - new: https://git.openjdk.org/jdk/pull/24953/files/c5c6449f..43455de2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24953&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24953&range=01-02 Stats: 75 lines in 1 file changed: 31 ins; 32 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24953.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24953/head:pull/24953 PR: https://git.openjdk.org/jdk/pull/24953 From duke at openjdk.org Mon May 12 09:05:10 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 12 May 2025 09:05:10 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v4] In-Reply-To: References: Message-ID: > By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Restoring copyright notice on ML_KEM.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24953/files - new: https://git.openjdk.org/jdk/pull/24953/files/43455de2..215b346f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24953&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24953&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24953.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24953/head:pull/24953 PR: https://git.openjdk.org/jdk/pull/24953 From shade at openjdk.org Mon May 12 14:07:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 May 2025 14:07:02 GMT Subject: RFR: 8356783: CompilerTask hot_method is redundant Message-ID: This gave me some grief when implementing [JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269). From the initializations, it looks to me that `CompilerTask::hot_method()` is either `method()` or `nullptr`. 
In both cases, we do nothing special. So tracking `hot_method` is redundant, and can be purged. This improves performance a little, since it avoids extra handle-izing across compiler code, and of course it simplifies coding as well. Additional testing: - [x] Linux x86_64 server fastdebug, `compiler/` - [ ] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/25185/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25185&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356783 Stats: 62 lines in 8 files changed: 0 ins; 47 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/25185.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25185/head:pull/25185 PR: https://git.openjdk.org/jdk/pull/25185 From kvn at openjdk.org Mon May 12 17:04:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 17:04:51 GMT Subject: RFR: 8356783: CompilerTask hot_method is redundant In-Reply-To: References: Message-ID: On Mon, 12 May 2025 14:02:43 GMT, Aleksey Shipilev wrote: > This gave me some grief when implementing [JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269). From the initializations, it looks to me that `CompilerTask::hot_method()` is either `method()` or `nullptr`. In both cases, we do nothing special. So tracking `hot_method` is redundant, and can be purged. This improves performance a little, since it avoids extra handle-izing across compiler code, and of course it simplifies coding as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [ ] Linux x86_64 server fastdebug, `all` There was time when we compiled caller instead of method which triggers compilation (`StackWalkCompPolicy`). It was removed in JDK 13 [JDK-8216360](https://bugs.openjdk.org/browse/JDK-8216360) ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25185#pullrequestreview-2833911401 From cslucas at openjdk.org Mon May 12 19:27:53 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 12 May 2025 19:27:53 GMT Subject: RFR: 8356783: CompilerTask hot_method is redundant In-Reply-To: References: Message-ID: On Mon, 12 May 2025 14:02:43 GMT, Aleksey Shipilev wrote: > This gave me some grief when implementing [JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269). From the initializations, it looks to me that `CompilerTask::hot_method()` is either `method()` or `nullptr`. In both cases, we do nothing special. So tracking `hot_method` is redundant, and can be purged. This improves performance a little, since it avoids extra handle-izing across compiler code, and of course it simplifies coding as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [ ] Linux x86_64 server fastdebug, `all` LGTM ------------- Marked as reviewed by cslucas (Committer). PR Review: https://git.openjdk.org/jdk/pull/25185#pullrequestreview-2834258711 From dnsimon at openjdk.org Mon May 12 20:17:02 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 12 May 2025 20:17:02 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true Message-ID: By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal the initialization is sufficiently fast to not impact startup noticeably. The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: 1. Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. 
For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). 2. Stop the VM before any application code can be executed. This is just good hygiene. This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. This is done for both libgraal and jargraal so that the behavior is uniform. Since jargraal is now a development configuration, VM startup costs are not critical. ------------- Commit messages: - only fail-fast for a missing JVMCI compiler on a HotSpot JIT thread - default EagerJVMCI to true if UseJVMCICompiler is true Changes: https://git.openjdk.org/jdk/pull/25121/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25121&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356447 Stats: 34 lines in 6 files changed: 31 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25121.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25121/head:pull/25121 PR: https://git.openjdk.org/jdk/pull/25121 From kvn at openjdk.org Mon May 12 20:39:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 20:39:51 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true In-Reply-To: References: Message-ID: On Thu, 8 May 2025 14:44:55 GMT, Doug Simon wrote: > By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal the initialization is sufficiently fast to not impact startup noticeably. > > The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: > 1. 
Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). > 2. Stop the VM before any application code can be executed. This is just good hygiene. > > This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. > This is done for both libgraal and jargraal so that the behavior is uniform. Since jargraal is now a development configuration, VM startup costs are not critical. src/hotspot/share/jvmci/jvmci_globals.cpp line 91: > 89: if (FLAG_IS_DEFAULT(EagerJVMCI) && !EagerJVMCI) { > 90: FLAG_SET_DEFAULT(EagerJVMCI, true); > 91: } The default value is `false` - I don't think you need check it. You can use `FLAG_SET_ERGO_IF_DEFAULT(EagerJVMCI, true);` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25121#discussion_r2085425314 From vlivanov at openjdk.org Mon May 12 21:04:58 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 12 May 2025 21:04:58 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Sat, 10 May 2025 05:24:02 GMT, Quan Anh Mai wrote: > I think a very simple approach you can take is having CallPureNode as a pure data node It's not as simple as it seems. In order to work reliably it requires full control of the code being called, so without extra work it is appropriate for generated stubs only. If you want to call some native code VM doesn't control, then either all caller-saved registers should be preserved across the call (which may be prohibitively expensive) or it should be made explicit there's a call taking place so all ABI effects are taken into account. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2874057369 From qamai at openjdk.org Tue May 13 03:14:55 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 May 2025 03:14:55 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: <7e0IhYYv_1dDlLgmUM8rKj5bjDx3lIhY2PRt-fC-rTs=.35437a80-80c7-4332-9339-a6f047b73289@github.com> On Mon, 12 May 2025 21:01:34 GMT, Vladimir Ivanov wrote: >> I think a very simple approach you can take is having `CallPureNode` as a pure data node. It does not have to have anything to do with `CallNode` (no lowering into a `CallNode`, no subclass from `CallNode`) and it can have its mach implementation like this: >> >> instruct pureCall1F(xmm0 dst, xmm0 src) %{ >> match(Set dst (CallPure src)); >> effect(CALL); >> format %{ >> __ call(/*something*/); >> %} >> %} > >> I think a very simple approach you can take is having CallPureNode as a pure data node > > It's not as simple as it seems. In order to work reliably it requires full control of the code being called, so without extra work it is appropriate for generated stubs only. If you want to call some native code VM doesn't control, then either all caller-saved registers should be preserved across the call (which may be prohibitively expensive) or it should be made explicit there's a call taking place so all ABI effects are taken into account. @iwanowww I believe `effect(CALL)` marks that a call is taking place and the register allocator will know how to save the registers accordingly. 
Note that on arm, long division is implemented as a call: https://github.com/openjdk/jdk/blob/adebfa7ffda6383f5793278ced14a193066c5f6a/src/hotspot/cpu/arm/arm.ad#L5962 And `SharedRuntime::ldiv` is implemented in C++: https://github.com/openjdk/jdk/blob/adebfa7ffda6383f5793278ced14a193066c5f6a/src/hotspot/share/runtime/sharedRuntime.cpp#L272 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2874936879 From dnsimon at openjdk.org Tue May 13 06:52:27 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 13 May 2025 06:52:27 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true [v2] In-Reply-To: References: Message-ID: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> > By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal the initialization is sufficiently fast to not impact startup noticeably. > > The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: > 1. Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). > 2. Stop the VM before any application code can be executed. This is just good hygiene. > > This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. > This is done for both libgraal and jargraal so that the behavior is uniform. 
Since jargraal is now a development configuration, VM startup costs are not critical. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: use FLAG_SET_ERGO_IF_DEFAULT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25121/files - new: https://git.openjdk.org/jdk/pull/25121/files/42c351b5..ad4be5dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25121&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25121&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25121.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25121/head:pull/25121 PR: https://git.openjdk.org/jdk/pull/25121 From shade at openjdk.org Tue May 13 08:33:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 May 2025 08:33:52 GMT Subject: RFR: 8356783: CompilerTask hot_method is redundant In-Reply-To: References: Message-ID: On Mon, 12 May 2025 14:02:43 GMT, Aleksey Shipilev wrote: > This gave me some grief when implementing [JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269). From the initializations, it looks to me that `CompilerTask::hot_method()` is either `method()` or `nullptr`. In both cases, we do nothing special. So tracking `hot_method` is redundant, and can be purged. This improves performance a little, since it avoids extra handle-izing across compiler code, and of course it simplifies coding as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` Thanks! Testing is green here. I'll wait a bit more if anyone else wants to review, and then I'll integrate to continue with [JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25185#issuecomment-2875537533 From yzheng at openjdk.org Tue May 13 12:39:55 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 13 May 2025 12:39:55 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true [v2] In-Reply-To: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> References: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> Message-ID: On Tue, 13 May 2025 06:52:27 GMT, Doug Simon wrote: >> By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal the initialization is sufficiently fast to not impact startup noticeably. >> >> The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: >> 1. Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). >> 2. Stop the VM before any application code can be executed. This is just good hygiene. >> >> This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. >> This is done for both libgraal and jargraal so that the behavior is uniform. Since jargraal is now a development configuration, VM startup costs are not critical. 
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > use FLAG_SET_ERGO_IF_DEFAULT LGTM ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/25121#pullrequestreview-2836595615 From shade at openjdk.org Tue May 13 13:20:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 May 2025 13:20:07 GMT Subject: RFR: 8356783: CompilerTask hot_method is redundant In-Reply-To: References: Message-ID: On Mon, 12 May 2025 14:02:43 GMT, Aleksey Shipilev wrote: > This gave me some grief when implementing [JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269). From the initializations, it looks to me that `CompilerTask::hot_method()` is either `method()` or `nullptr`. In both cases, we do nothing special. So tracking `hot_method` is redundant, and can be purged. This improves performance a little, since it avoids extra handle-izing across compiler code, and of course it simplifies coding as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` Here goes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25185#issuecomment-2876483328 From shade at openjdk.org Tue May 13 13:20:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 May 2025 13:20:07 GMT Subject: Integrated: 8356783: CompilerTask hot_method is redundant In-Reply-To: References: Message-ID: <7GteLkEIZDn7y_TejXwlsYTUiVcwJNkQ8ul61fQgZaM=.10db59b8-485c-43b2-8846-eca08355e70a@github.com> On Mon, 12 May 2025 14:02:43 GMT, Aleksey Shipilev wrote: > This gave me some grief when implementing [JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269). From the initializations, it looks to me that `CompilerTask::hot_method()` is either `method()` or `nullptr`. In both cases, we do nothing special. So tracking `hot_method` is redundant, and can be purged. 
This improves performance a little, since it avoids extra handle-izing across compiler code, and of course it simplifies coding as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. Changeset: 48d2acb3 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/48d2acb3860f742eb1c06b89f8a7208d0d7a01e7 Stats: 62 lines in 8 files changed: 0 ins; 47 del; 15 mod 8356783: CompilerTask hot_method is redundant Reviewed-by: kvn, cslucas ------------- PR: https://git.openjdk.org/jdk/pull/25185 From kvn at openjdk.org Tue May 13 15:34:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 13 May 2025 15:34:52 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true [v2] In-Reply-To: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> References: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> Message-ID: On Tue, 13 May 2025 06:52:27 GMT, Doug Simon wrote: >> By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal, the initialization is sufficiently fast to not impact startup. >> >> The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: >> 1. Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. 
In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). >> 2. Stop the VM before any application code can be executed. This is just good hygiene. >> >> This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. >> This is done for both libgraal and jargraal so that the behavior is uniform. Since jargraal is now a development configuration, VM startup costs are not critical. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > use FLAG_SET_ERGO_IF_DEFAULT Marked as reviewed by kvn (Reviewer). @dougxc please remind me. Is it true that with current libgraal no Java code is executed when it is initialized? Or you still have calls into core library? ------------- PR Review: https://git.openjdk.org/jdk/pull/25121#pullrequestreview-2837264646 PR Comment: https://git.openjdk.org/jdk/pull/25121#issuecomment-2876984838 From never at openjdk.org Tue May 13 15:39:51 2025 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 13 May 2025 15:39:51 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true [v2] In-Reply-To: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> References: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> Message-ID: On Tue, 13 May 2025 06:52:27 GMT, Doug Simon wrote: >> By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal, the initialization is sufficiently fast to not impact startup. 
>> >> The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: >> 1. Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). >> 2. Stop the VM before any application code can be executed. This is just good hygiene. >> >> This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. >> This is done for both libgraal and jargraal so that the behavior is uniform. Since jargraal is now a development configuration, VM startup costs are not critical. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > use FLAG_SET_ERGO_IF_DEFAULT Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25121#pullrequestreview-2837280275 From dnsimon at openjdk.org Tue May 13 15:53:53 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 13 May 2025 15:53:53 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true [v2] In-Reply-To: References: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> Message-ID: <5ryTduYlJ4b6MFzxFmjZaXl8Y7LhX5fG2TIPWXKs2dk=.c6840419-1b67-47b5-953c-437e36cf1cc0@github.com> On Tue, 13 May 2025 15:30:03 GMT, Vladimir Kozlov wrote: > @dougxc please remind me. Is it true that with current libgraal no Java code is executed when it is initialized? Or you still have calls into core library? 
There are still some calls to `CompilerToVM.lookupType` during libgraal initialization but I think all the types it looks up will already be resolved so will not require Java code execution. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25121#issuecomment-2877051478 From dnsimon at openjdk.org Tue May 13 16:02:00 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 13 May 2025 16:02:00 GMT Subject: Integrated: 8356447: Change default for EagerJVMCI to true In-Reply-To: References: Message-ID: On Thu, 8 May 2025 14:44:55 GMT, Doug Simon wrote: > By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal, the initialization is sufficiently fast to not impact startup. > > The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: > 1. Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). > 2. Stop the VM before any application code can be executed. This is just good hygiene. > > This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. > This is done for both libgraal and jargraal so that the behavior is uniform. Since jargraal is now a development configuration, VM startup costs are not critical. This pull request has now been integrated. 
Changeset: 08b2df80 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/08b2df80c68e182fbf6b1fc94e991c02b23040ec Stats: 32 lines in 6 files changed: 29 ins; 0 del; 3 mod 8356447: Change default for EagerJVMCI to true Reviewed-by: yzheng, kvn, never ------------- PR: https://git.openjdk.org/jdk/pull/25121 From dnsimon at openjdk.org Tue May 13 16:01:59 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 13 May 2025 16:01:59 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true [v2] In-Reply-To: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> References: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> Message-ID: On Tue, 13 May 2025 06:52:27 GMT, Doug Simon wrote: >> By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal, the initialization is sufficiently fast to not impact startup. >> >> The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: >> 1. Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). >> 2. Stop the VM before any application code can be executed. This is just good hygiene. >> >> This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. >> This is done for both libgraal and jargraal so that the behavior is uniform. 
Since jargraal is now a development configuration, VM startup costs are not critical. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > use FLAG_SET_ERGO_IF_DEFAULT Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25121#issuecomment-2877108155 From duke at openjdk.org Tue May 13 22:38:53 2025 From: duke at openjdk.org (Mohamed Issa) Date: Tue, 13 May 2025 22:38:53 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v3] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: On Wed, 7 May 2025 09:25:30 GMT, Andrew Haley wrote: >> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: >> >> Add new set of cbrt micro-benchmarks > > src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 62: > >> 60: { >> 61: 0, 3220193280 >> 62: }; > > What is this constant? > > Its value is 0xbff0400000000000, which is -ve bit set, bias (top bit of exponent) clear, but one of the bits in the fraction is set. So its value is -0x1.04p+0. As well as the exponent it also sets the 1 bit, just below the 5 most significant bits of the fraction. I guess this in effect rounds up the value that is added in the final rounding. > > Is that right? The idea is mainly that the _EXP_MSK2_ constant operates on the input to match up with its corresponding entries in the lookup tables: _rcp_table_, _cbrt_table_, and _D_table_. The key part starts with computing the difference (_r = x - x'_) shown in line 260 below.
```c++
__ subsd(xmm1, xmm3);
```
Here _x_ is essentially the input fraction with all bits while _x'_ is the input fraction with _EXP_MSK2_ applied. This is then multiplied (_r = (x - x') * rcp_table(x')_) with the corresponding lookup table entry (_-1 / 1.b1 b2 b3 b4 b5 b6_ where _b6=1_) as shown in line 264 below.
```c++
__ mulsd(xmm1, xmm4);
```
This value then gets used by subsequent steps that involve entries from _cbrt_table_ and _D_table_. It won't necessarily round the final result up though as those effects will depend on what the input is. However, the polynomial coefficients will have a bigger impact on rounding. For a summary of the approximations, please refer to the algorithm description comment block near the beginning of the source file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2087726049 From sviswanathan at openjdk.org Wed May 14 00:41:56 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 14 May 2025 00:41:56 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v4] In-Reply-To: References: Message-ID: <-L1FHPpbVOvHTxMFUPMGIY9g8UFAFmJDgNRkoFONKnI=.ddef5354-e00f-4c2a-80c3-b48325fe51d2@github.com> On Mon, 12 May 2025 09:05:10 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Restoring copyright notice on ML_KEM.java Only reviewed three intrinsics so far, more review to do. src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 693: > 691: // a (short[256]) = c_rarg1 > 692: // b (short[256]) = c_rarg2 > 693: // kyberConsts (short[40]) = c_rarg3 kyberConsts is not one of the arguments passed in. src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 696: > 694: address generate_kyberAddPoly_2_avx512(StubGenerator *stubgen, > 695: MacroAssembler *_masm) { > 696: The Java code for "implKyberAddPoly(short[] result, short[] a, short[] b)" does BarrettReduction but the intrinsic code here does not. Is that intentional and how is the reduction handled?
src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 742:
> 740: // b (short[256]) = c_rarg2
> 741: // c (short[256]) = c_rarg3
> 742: // kyberConsts (short[40]) = c_rarg4
kyberConsts is not one of the arguments passed in.

src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 799:
> 797: // parsedLength (int) = c_rarg3
> 798: address generate_kyber12To16_avx512(StubGenerator *stubgen,
> 799: MacroAssembler *_masm) {
If AVX512_VBMI and AVX512_VBMI2 is available, it looks to me that the loop body of this algorithm can be implemented using more efficient instructions in simple 5 steps:

Step 1:
Load 0-47, 48-95, 96-143, 144-191 condensed bytes into xmm0, xmm1, xmm2, xmm3 respectively using masked load.

Step 2:
Use vpermb to arrange xmm0 such that bytes 1, 4, 7, ... are duplicated
xmm0 before b47, b46, ..., b0 where each b is a byte
xmm0 after b47 b46 b46 b45, ......., b5 b4 b4 b3 b2 b1 b1 b0
Repeat this for xmm1, xmm2, xmm3

Step 3:
Use vpshldvw to shift every word (16 bits) in the xmm0 appropriately with variable shift
Shift word 31 by 4, word 30 by 0, ... word 3 by 4, word 2 by 0, word 1 by 4, word 0 by 0
Repeat this for xmm1, xmm2, xmm3

Step 4:
Use vpand to "and" each word element in xmm0 by 0xfff.
Repeat this for xmm1, xmm2, xmm3

Step 5:
Store xmm0 into parsed
Store xmm1 into parsed + 64
Store xmm2 into parsed + 128
Store xmm3 into parsed + 192

If you think there is not sufficient time, we could look into it after the merge of this PR as well.
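For readers following the five steps above, a scalar Java model of what happens to one 3-byte group may help. This is only a sketch under stated assumptions: the names `Unpack12Sketch` and `unpack` are made up for illustration and do not come from the PR, and the right shift on odd words is one reading of the data layout, not a statement about how `vpshldvw` would actually be used in the stub.

```java
public class Unpack12Sketch {
    // Hypothetical helper (not from the PR): expands packed 12-bit values
    // (3 condensed bytes -> 2 values) into 16-bit shorts, one group at a time.
    public static short[] unpack(byte[] condensed) {
        if (condensed.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        short[] parsed = new short[condensed.length / 3 * 2];
        for (int i = 0, j = 0; i < condensed.length; i += 3, j += 2) {
            int b0 = condensed[i] & 0xFF;
            int b1 = condensed[i + 1] & 0xFF;
            int b2 = condensed[i + 2] & 0xFF;
            // Step 2 analogue: the middle byte b1 is used twice, so that each
            // 16-bit word contains all 12 bits of exactly one value.
            int word0 = (b1 << 8) | b0;
            int word1 = (b2 << 8) | b1;
            // Step 3 analogue: even words are not shifted, odd words by 4.
            // Step 4 analogue: keep only the low 12 bits of each word.
            parsed[j] = (short) (word0 & 0xFFF);
            parsed[j + 1] = (short) ((word1 >>> 4) & 0xFFF);
        }
        return parsed;
    }

    public static void main(String[] args) {
        short[] out = unpack(new byte[] {(byte) 0xAB, (byte) 0xCD, (byte) 0xEF});
        System.out.printf("%03x %03x%n", out[0], out[1]); // prints "dab efc"
    }
}
```

In the vector version each of the four registers would handle 48 condensed bytes (32 values) per iteration, which is where the 64-byte store stride in step 5 comes from.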
------------- PR Review: https://git.openjdk.org/jdk/pull/24953#pullrequestreview-2837616051 PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2087361991 PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2087377640 PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2087331798 PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2087834072 From duke at openjdk.org Wed May 14 11:43:58 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 14 May 2025 11:43:58 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v4] In-Reply-To: <-L1FHPpbVOvHTxMFUPMGIY9g8UFAFmJDgNRkoFONKnI=.ddef5354-e00f-4c2a-80c3-b48325fe51d2@github.com> References: <-L1FHPpbVOvHTxMFUPMGIY9g8UFAFmJDgNRkoFONKnI=.ddef5354-e00f-4c2a-80c3-b48325fe51d2@github.com> Message-ID: On Tue, 13 May 2025 17:53:50 GMT, Sandhya Viswanathan wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Restoring copyright notice on ML_KEM.java > > src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 693: > >> 691: // a (short[256]) = c_rarg1 >> 692: // b (short[256]) = c_rarg2 >> 693: // kyberConsts (short[40]) = c_rarg3 > > kyberConsts is not one of the arguments passed in. Fixed. > src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 696: > >> 694: address generate_kyberAddPoly_2_avx512(StubGenerator *stubgen, >> 695: MacroAssembler *_masm) { >> 696: > > The Java code for "implKyberAddPoly(short[] result, short[] a, short[] b)" does BarrettReduction but the intrinsic code here does not. Is that intentional and how is the reduction handled? Actually, the Java version is the one that is too cautious. There is Barrett reduction after at most 4 consecutive uses of mlKemAddPoly(), so doing the reduction in implKyberAddPoly() is not necessary. Thanks for discovering this! 
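To make the deferred-reduction argument concrete, here is a small Java sketch of an add without reduction followed by a single Barrett reduction. The constants are the usual Kyber reference parameters (q = 3329, multiplier 20159, shift 26) and, like the class and method names, are assumptions for illustration; the actual `ML_KEM.java` code may differ. The point is only that the running sum has to stay inside the 16-bit range until the next reduction.

```java
public class BarrettSketch {
    static final int Q = 3329;
    static final int MULTIPLIER = 20159; // assumed: round(2^26 / Q)
    static final int SHIFT = 26;

    // Returns a value congruent to a modulo Q with small absolute value.
    public static short barrettReduce(short a) {
        int t = (a * MULTIPLIER + (1 << (SHIFT - 1))) >> SHIFT; // about round(a / Q)
        return (short) (a - t * Q);
    }

    // Coefficient-wise add with no reduction, as in the intrinsic discussed above.
    public static short[] addPoly(short[] a, short[] b) {
        short[] result = new short[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = (short) (a[i] + b[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: inputs are already in [0, Q), so four consecutive adds
        // give at most 5 * 3328 = 16640, which still fits in a short.
        short[] p = {3328, 100, 0};
        short[] acc = p;
        for (int i = 0; i < 4; i++) {
            acc = addPoly(acc, p);
        }
        for (short c : acc) {
            System.out.println(c + " -> " + barrettReduce(c));
        }
    }
}
```

Whether four adds are always safe depends on the bounds of the actual inputs, which this sketch does not try to establish.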
> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 742: > >> 740: // b (short[256]) = c_rarg2 >> 741: // c (short[256]) = c_rarg3 >> 742: // kyberConsts (short[40]) = c_rarg4 > > kyberConsts is not one of the arguments passed in. Fixed. > src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 799: > >> 797: // parsedLength (int) = c_rarg3 >> 798: address generate_kyber12To16_avx512(StubGenerator *stubgen, >> 799: MacroAssembler *_masm) { > > If AVX512_VBMI and AVX512_VBMI2 is available, it looks to me that the loop body of this algorithm can be implemented using more efficient instructions in simple 5 steps: > > Step 1: > Load 0-47, 48-95, 96-143, 144-191 condensed bytes into xmm0, xmm1, xmm2, xmm3 respectively using masked load. > > Step 2: > Use vpermb to arrange xmm0 such that bytes 1, 4, 7, ... are duplicated > xmm0 before b47, b46, ..., b0 where each b is a byte > xmm0 after b47 b46 b46 b45, ......., b5 b4 b4 b3 b2 b1 b1 b0 > Repeat this for xmm1, xmm2, xmm3 > > Step 3: > Use vpshldvw to shift every word (16 bits) in the xmm0 appropriately with variable shift > Shift word 31 by 4, word 30 by 0, ... word 3 by 4, word 2 by 0, word 1 by 4, word 0 by 0 > Repeat this for xmm1, xmm2, xmm3 > > Step 4: > Use vpand to "and" each word element in xmm0 by 0xfff. > Repeat this for xmm1, xmm2, xmm3 > > Step 5: > Store xmm0 into parsed > Store xmm1 into parsed + 64 > Store xmm2 into parsed +128 > Store xmm3 into parsed + 192 > > If you think there is not sufficient time, we could look into it after the merge of this PR as well. Yes, that way we can speed this up a little (well, in itself it might be significant), but with the current intrinsics, the contribution of this function to the overall running time is about 1.5%, so it would not matter that much, while on the other hand not all AVX-512 capable processors have vbmi. So I would rather not do it in this PR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2088738946 PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2088738841 PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2088738704 PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2088738615 From duke at openjdk.org Wed May 14 11:49:11 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 14 May 2025 11:49:11 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v5] In-Reply-To: References: Message-ID: > By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Responding to comments by Sandhya. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24953/files - new: https://git.openjdk.org/jdk/pull/24953/files/215b346f..32571f39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24953&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24953&range=03-04 Stats: 4 lines in 2 files changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24953.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24953/head:pull/24953 PR: https://git.openjdk.org/jdk/pull/24953 From yzheng at openjdk.org Wed May 14 13:24:02 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 14 May 2025 13:24:02 GMT Subject: RFR: 8356971: [JVMCI] Export VM_Version::supports_avx512_simd_sort to JVMCI compiler Message-ID: HotSpot selects between AVX512 and AVX2 implementations of array sort/partition stubs based on the return value of VM_Version::supports_avx512_simd_sort. The AVX2 version supports fewer element types than the AVX512 version and may fail at runtime if unsupported types are encountered. 
This capability information should be exposed to the JVMCI compiler to properly guard against incorrect intrinsification. This is especially important because VM_Version::supports_avx512_simd_sort includes a special exclusion rule for AMD Zen4, due to performance considerations. ------------- Commit messages: - 8356971: [JVMCI] Export VM_Version::supports_avx512_simd_sort to JVMCI compiler. Changes: https://git.openjdk.org/jdk/pull/25225/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25225&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356971 Stats: 4 lines in 3 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25225/head:pull/25225 PR: https://git.openjdk.org/jdk/pull/25225 From dnsimon at openjdk.org Wed May 14 13:42:52 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 14 May 2025 13:42:52 GMT Subject: RFR: 8356971: [JVMCI] Export VM_Version::supports_avx512_simd_sort to JVMCI compiler In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:16:26 GMT, Yudi Zheng wrote: > HotSpot selects between AVX512 and AVX2 implementations of array sort/partition stubs based on the return value of VM_Version::supports_avx512_simd_sort. The AVX2 version supports fewer element types than the AVX512 version and may fail at runtime if unsupported types are encountered. This capability information should be exposed to the JVMCI compiler to properly guard against incorrect intrinsification. This is especially important because VM_Version::supports_avx512_simd_sort includes a special exclusion rule for AMD Zen4, due to performance considerations. Marked as reviewed by dnsimon (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25225#pullrequestreview-2840252727 From sviswanathan at openjdk.org Wed May 14 16:03:52 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 14 May 2025 16:03:52 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v4] In-Reply-To: References: <-L1FHPpbVOvHTxMFUPMGIY9g8UFAFmJDgNRkoFONKnI=.ddef5354-e00f-4c2a-80c3-b48325fe51d2@github.com> Message-ID: On Wed, 14 May 2025 11:41:30 GMT, Ferenc Rakoczi wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 696: >> >>> 694: address generate_kyberAddPoly_2_avx512(StubGenerator *stubgen, >>> 695: MacroAssembler *_masm) { >>> 696: >> >> The Java code for "implKyberAddPoly(short[] result, short[] a, short[] b)" does BarrettReduction but the intrinsic code here does not. Is that intentional and how is the reduction handled? > > Actually, the Java version is the one that is too cautious. There is Barrett reduction after at most 4 consecutive uses of mlKemAddPoly(), so doing the reduction in implKyberAddPoly() is not necessary. Thanks for discovering this! Thanks. I have another question, is there a reason that the Java versions of AddPoly (both for 2 and 3 input) return 1, whereas the corresponding intrinsics return 0? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2089278218 From duke at openjdk.org Wed May 14 16:30:54 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 14 May 2025 16:30:54 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v4] In-Reply-To: References: <-L1FHPpbVOvHTxMFUPMGIY9g8UFAFmJDgNRkoFONKnI=.ddef5354-e00f-4c2a-80c3-b48325fe51d2@github.com> Message-ID: On Wed, 14 May 2025 16:00:55 GMT, Sandhya Viswanathan wrote: >> Actually, the Java version is the one that is too cautious. There is Barrett reduction after at most 4 consecutive uses of mlKemAddPoly(), so doing the reduction in implKyberAddPoly() is not necessary. Thanks for discovering this! > > Thanks. 
I have another question, is there a reason that the Java versions of AddPoly (both for 2 and 3 input) return 1, whereas the corresponding intrinsics return 0? I use that for debugging. E.g. it is fairly easy to change the Java code to call both the intrinsic and Java version and compare the results. I don't see any harm in leaving that in the production version, since it is always ignored. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2089322079 From yzheng at openjdk.org Wed May 14 19:50:55 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 14 May 2025 19:50:55 GMT Subject: RFR: 8356971: [JVMCI] Export VM_Version::supports_avx512_simd_sort to JVMCI compiler In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:16:26 GMT, Yudi Zheng wrote: > HotSpot selects between AVX512 and AVX2 implementations of array sort/partition stubs based on the return value of VM_Version::supports_avx512_simd_sort. The AVX2 version supports fewer element types than the AVX512 version and may fail at runtime if unsupported types are encountered. This capability information should be exposed to the JVMCI compiler to properly guard against incorrect intrinsification. This is especially important because VM_Version::supports_avx512_simd_sort includes a special exclusion rule for AMD Zen4, due to performance considerations. Thanks for the review! Passed tier1-3. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25225#issuecomment-2881367102 From yzheng at openjdk.org Wed May 14 19:50:55 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 14 May 2025 19:50:55 GMT Subject: Integrated: 8356971: [JVMCI] Export VM_Version::supports_avx512_simd_sort to JVMCI compiler In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:16:26 GMT, Yudi Zheng wrote: > HotSpot selects between AVX512 and AVX2 implementations of array sort/partition stubs based on the return value of VM_Version::supports_avx512_simd_sort. 
The AVX2 version supports fewer element types than the AVX512 version and may fail at runtime if unsupported types are encountered. This capability information should be exposed to the JVMCI compiler to properly guard against incorrect intrinsification. This is especially important because VM_Version::supports_avx512_simd_sort includes a special exclusion rule for AMD Zen4, due to performance considerations. This pull request has now been integrated. Changeset: 948ade8e Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/948ade8e7003a41683600428c8e3155c7ed798db Stats: 4 lines in 3 files changed: 4 ins; 0 del; 0 mod 8356971: [JVMCI] Export VM_Version::supports_avx512_simd_sort to JVMCI compiler Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/25225 From sviswanathan at openjdk.org Thu May 15 00:38:54 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 15 May 2025 00:38:54 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v5] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 11:49:11 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Responding to comments by Sandhya. Another minor comment. Rest of the PR looks good to me. src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 893: > 891: // > 892: // coeffs (short[256]) = c_rarg0 > 893: // kyberConsts (short[40]) = c_rarg1 kyberConsts is not an input parameter to implKyberBarrettReduce. 
------------- PR Review: https://git.openjdk.org/jdk/pull/24953#pullrequestreview-2840763895 PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2089284332 From tschatzl at openjdk.org Thu May 15 08:18:47 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 15 May 2025 08:18:47 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v38] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 54 commits: - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * ayang review: remove sweep_epoch - Merge branch 'master' into card-table-as-dcq-merge - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * ayang review (part 2 - yield duration changes) - * ayang review (part 1) - * indentation fix - * remove support for 32 bit x86 in the barrier generation code, following latest changes from @shade - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * fixes after merge related to 32 bit x86 removal - ... and 44 more: https://git.openjdk.org/jdk/compare/5e50a584...1def83af ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=37 Stats: 7088 lines in 111 files changed: 2568 ins; 3599 del; 921 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From duke at openjdk.org Thu May 15 13:33:42 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 15 May 2025 13:33:42 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v6] In-Reply-To: References: Message-ID: > By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Response to review comment + loading constants with broadcast op. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24953/files - new: https://git.openjdk.org/jdk/pull/24953/files/32571f39..e4f3264e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24953&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24953&range=04-05 Stats: 107 lines in 1 file changed: 39 ins; 39 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/24953.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24953/head:pull/24953 PR: https://git.openjdk.org/jdk/pull/24953 From duke at openjdk.org Thu May 15 13:48:56 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 15 May 2025 13:48:56 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v5] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 16:04:31 GMT, Sandhya Viswanathan wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Responding to comments by Sandhya. > > src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 893: > >> 891: // >> 892: // coeffs (short[256]) = c_rarg0 >> 893: // kyberConsts (short[40]) = c_rarg1 > > kyberConsts is not an input parameter to implKyberBarrettReduce. Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2091216578 From duke at openjdk.org Thu May 15 14:06:53 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 15 May 2025 14:06:53 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v5] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 00:36:26 GMT, Sandhya Viswanathan wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Responding to comments by Sandhya. > > Another minor comment. Rest of the PR looks good to me. @sviswa7, thanks a lot for the review! 
If you agree with my changes to load the constants using broadcasting instructions instead of full AVX register loads, would you be so kind as to approve the PR and sponsor my integration? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24953#issuecomment-2883937966 From dnsimon at openjdk.org Thu May 15 21:54:17 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 15 May 2025 21:54:17 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used Message-ID: The `EnableJVMCI` flag currently serves 2 purposes: * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. This PR changes nothing about the first point. On the second point, to use the `jdk.internal.vm.ci` module, it must now be explicitly added with `--add-modules=jdk.internal.vm.ci`, which will also set `EnableJVMCI` as a side-effect. The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on an archive of the root module set created in a separate JVM execution. If the root module set is different than what's in the archive at runtime, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved when creating the archive, it must not be resolved in the runtime using the archive. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci` for libgraal to have the startup advantages of AOTClassLinking. 
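The root-module-set constraint in the description above can be sketched as a small model. This is only an illustration under stated assumptions: the class `RootModuleSketch` and its methods are hypothetical names, not HotSpot code, and the real comparison of archived versus runtime module sets involves more state than plain set equality.

```java
import java.util.Set;
import java.util.TreeSet;

public class RootModuleSketch {
    static final String JVMCI_MODULE = "jdk.internal.vm.ci";

    // Hypothetical model: the root module set is the default roots plus
    // every module named by --add-modules.
    public static Set<String> rootModules(Set<String> defaultRoots, Set<String> addModules) {
        Set<String> roots = new TreeSet<>(defaultRoots);
        roots.addAll(addModules);
        return roots;
    }

    // As described above: naming the module via --add-modules also sets
    // EnableJVMCI as a side effect.
    public static boolean enableJVMCI(boolean flagGiven, Set<String> addModules) {
        return flagGiven || addModules.contains(JVMCI_MODULE);
    }

    // Assumption: the archive is treated as usable only when the root
    // module sets of the two runs are identical.
    public static boolean aotClassLinkingUsable(Set<String> archivedRoots, Set<String> runtimeRoots) {
        return archivedRoots.equals(runtimeRoots);
    }

    public static void main(String[] args) {
        Set<String> defaults = Set.of("java.base");
        Set<String> archived = rootModules(defaults, Set.of());
        Set<String> plainRun = rootModules(defaults, Set.of());
        Set<String> jvmciRun = rootModules(defaults, Set.of(JVMCI_MODULE));

        System.out.println("same roots usable: " + aotClassLinkingUsable(archived, plainRun));
        System.out.println("extra JVMCI root usable: " + aotClassLinkingUsable(archived, jvmciRun));
        System.out.println("add-modules implies EnableJVMCI: " + enableJVMCI(false, Set.of(JVMCI_MODULE)));
    }
}
```

In this model, a run that resolves `jdk.internal.vm.ci` while the archive was created without it no longer matches the archive, which is the situation the description aims to avoid for libgraal.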
------------- Commit messages: - added comment in check_vm_args_consistency - --add-modules=jdk.internal.vm.ci implies -XX:+EnableJVMCI Changes: https://git.openjdk.org/jdk/pull/25240/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25240&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345826 Stats: 63 lines in 10 files changed: 45 ins; 5 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/25240.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25240/head:pull/25240 PR: https://git.openjdk.org/jdk/pull/25240 From never at openjdk.org Thu May 15 21:54:18 2025 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 15 May 2025 21:54:18 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used In-Reply-To: References: Message-ID: On Wed, 14 May 2025 22:00:30 GMT, Doug Simon wrote: > The `EnableJVMCI` flag currently serves 2 purposes: > * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). > * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. > > This PR changes nothing about the first point. > > On the second point, to use the `jdk.internal.vm.ci` module, it must now be explicitly added with `--add-modules=jdk.internal.vm.ci`, which will also set `EnableJVMCI` as a side-effect. > > The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on an archive of the root module set created in a separate JVM execution. If the root module set is different than what's in the archive at runtime, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved when creating the archive, it must not be resolved in the runtime using the archive. 
As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci` for libgraal to have the startup advantages of AOTClassLinking. I found your explanation quite confusing, but the bug title is actually the most clear description of the fix. Basically libjvmci doesn't require the existence of jdk.internal.vm.ci on the HotSpot side since it has effectively compiled that into itself. So we are decoupling the ability to use JVMCI from the presence of the JVMCI module. A short comment along these lines in at least your changes in check_vm_args_consistency would be helpful I think. I do find it confusing that we are explicitly passing `--add-modules=jdk.internal.vm.ci` in a bunch of the tests. Is that now necessary or are you just exercising the alternate ways of enabling JVMCI? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25240#issuecomment-2884370108 From dnsimon at openjdk.org Thu May 15 21:54:18 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 15 May 2025 21:54:18 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used In-Reply-To: References: Message-ID: On Thu, 15 May 2025 16:10:12 GMT, Tom Rodriguez wrote: > I found your explanation quite confusing, but the bug title is actually the most clear description of the fix. Basically libjvmci doesn't require the existence of jdk.internal.vm.ci on the HotSpot side since it has effectively compiled that into itself. So we are decoupling the ability to use JVMCI from the presence of the JVMCI module. A short comment along these lines in at least your changes in check_vm_args_consistency would be helpful I think. I added the requested comment and tried to clarify the PR description. Let me know if clarification is needed. > I do find it confusing that we are explicitly passing `--add-modules=jdk.internal.vm.ci` in a bunch of the tests. Is that now necessary or are you just exercising the alternate ways of enabling JVMCI? 
Without that option, the module will be missing and without the fail-fast check in `check_vm_args_consistency` you would get an error such as: Uncaught exception at src/hotspot/share/jvmci/jvmciRuntime.cpp:1433 java.lang.NoClassDefFoundError: jdk/vm/ci/code/Architecture # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (jvmciRuntime.cpp:1636), pid=1979, tid=9731 # fatal error: Fatal JVMCI exception (see JVMCI Events for stack trace): Uncaught exception at src/hotspot/share/jvmci/jvmciRuntime.cpp:1433 # ------------- PR Comment: https://git.openjdk.org/jdk/pull/25240#issuecomment-2885119528 From vlivanov at openjdk.org Thu May 15 21:58:54 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 15 May 2025 21:58:54 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: <4vbXpgvmXv6Ba1fEkMKIRpUnXZ-QVdAZ7rgicqxVhpM=.7dda802c-9b8a-459d-9bd7-7a83d9fc1744@github.com> On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. 
To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. > > Thanks, > Marc Interesting! I wasn't aware ADLC already features such support. Thanks for the pointers. It does look attractive, especially for platform-specific use cases. But there are some pitfalls which makes it hard to use on its own. In particular, data nodes are aggressively commoned and freely flow in the graph. Unless it is taken into account during GVN and code motion, the final schedule may end up far from optimal. (In other words, it's highly beneficial to match only expensive nodes in such a way.) Moreover, some optimizations are highly sensitive to the presence of calls. (Think of the consequences of a call scheduled inside a heavily vectorized loop.) Macro-expansion also suffers from some of those issues, but still IMO an explicit `Call` node is a more appropriate solution to the problem. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2885142373 From kvn at openjdk.org Thu May 15 22:19:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 15 May 2025 22:19:52 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used In-Reply-To: References: Message-ID: On Thu, 15 May 2025 21:42:06 GMT, Doug Simon wrote: > Basically libjvmci doesn't require the existence of jdk.internal.vm.ci on the HotSpot side since it has effectively compiled that into itself. So we are decoupling the ability to use JVMCI from the presence of the JVMCI module. That should be in PR and RFE (JBS) Descriptions! 
This was my main question about the filed RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25240#issuecomment-2885172971 From kvn at openjdk.org Thu May 15 22:26:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 15 May 2025 22:26:55 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used In-Reply-To: References: Message-ID: <5b9XLWfFY9pJ-y1fQ7FkuLSrvpFbe4hOyGOxdjPMxKw=.2c969cbd-e488-4db8-af81-ed2053d00b5d@github.com> On Wed, 14 May 2025 22:00:30 GMT, Doug Simon wrote: > The `EnableJVMCI` flag currently serves 2 purposes: > * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). > * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. > > This PR changes nothing about the first point. > > On the second point, to use the `jdk.internal.vm.ci` module, it must now be explicitly added with `--add-modules=jdk.internal.vm.ci`, which will also set `EnableJVMCI` as a side-effect. > > The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on an archive of the root module set created in a separate JVM execution. If the root module set is different than what's in the archive at runtime, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved when creating the archive, it must not be resolved in the runtime using the archive. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci` for libgraal to have the startup advantages of AOTClassLinking. src/hotspot/share/runtime/arguments.cpp line 1808: > 1806: // is no other representation of the jdk.internal.vm.ci module > 1807: // so it needs to be added to the root module set.
> 1808: if (ClassLoader::is_module_observable("jdk.internal.vm.ci") && !UseJVMCINativeLibrary && !_jvmci_module_added) { In which case is `ClassLoader::is_module_observable("jdk.internal.vm.ci")` == `true` when `_jvmci_module_added` == `false`? I assume this code is executed after the command line has been checked for `--add-modules=jdk.internal.vm.ci`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2092040772 From sviswanathan at openjdk.org Fri May 16 00:32:51 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 16 May 2025 00:32:51 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v5] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 00:36:26 GMT, Sandhya Viswanathan wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Responding to comments by Sandhya. > > Another minor comment. Rest of the PR looks good to me. > @sviswa7, thanks a lot for the review! If you agree with my changes to load the constants using broadcasting instructions instead of full AVX register loads, would you be so kind as to approve the PR and sponsor my integration? The broadcast instructions look good. I only have one query on montMul above that I have been wondering about. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24953#issuecomment-2885339535 From sviswanathan at openjdk.org Fri May 16 00:32:53 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 16 May 2025 00:32:53 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v6] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 13:33:42 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled.
> > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Response to review comment + loading constants with broadcast op. src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 250: > 248: static void montmul(int outputRegs[], int inputRegs1[], int inputRegs2[], > 249: int scratchRegs1[], int scratchRegs2[], MacroAssembler *_masm) { > 250: for (int i = 0; i < 4; i++) { In the intrinsic for montMul we are treating it as if MONT_R_BITS is 16 and MONT_Q_INV_MOD_R is 0xF301, whereas in the Java code MONT_R_BITS is 20 and MONT_Q_INV_MOD_R is 0x8F301. Are these equivalent? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2092137164 From dnsimon at openjdk.org Fri May 16 06:55:52 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 May 2025 06:55:52 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used In-Reply-To: <5b9XLWfFY9pJ-y1fQ7FkuLSrvpFbe4hOyGOxdjPMxKw=.2c969cbd-e488-4db8-af81-ed2053d00b5d@github.com> References: <5b9XLWfFY9pJ-y1fQ7FkuLSrvpFbe4hOyGOxdjPMxKw=.2c969cbd-e488-4db8-af81-ed2053d00b5d@github.com> Message-ID: On Thu, 15 May 2025 22:24:19 GMT, Vladimir Kozlov wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point. >> >> On the second point, to use the `jdk.internal.vm.ci` module, it must now be explicitly added with `--add-modules=jdk.internal.vm.ci`, which will also set `EnableJVMCI` as a side-effect. >> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`.
This flag relies on an archive of the root module set created in a separate JVM execution. If the root module set is different than what's in the archive at runtime, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved when creating the archive, it must not be resolved in the runtime using the archive. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci` for libgraal to have the startup advantages of AOTClassLinking. > > src/hotspot/share/runtime/arguments.cpp line 1808: > >> 1806: // is no other representation of the jdk.internal.vm.ci module >> 1807: // so it needs to be added to the root module set. >> 1808: if (ClassLoader::is_module_observable("jdk.internal.vm.ci") && !UseJVMCINativeLibrary && !_jvmci_module_added) { > > In which case `ClassLoader::is_module_observable("jdk.internal.vm.ci")` == `true` when `_jvmci_module_added` == `false`? > I assume this code is executed after check command line for `--add-modules=jdk.internal.vm.ci`. The documentation for `is_module_observable` is: // Determines if the named module is present in the // modules jimage file or in the exploded modules directory. That is, is the module present on disk. On the other hand, `_jvmci_module_added` is a test of whether an `--add-modules` option has been seen whose value included `jdk.internal.vm.ci`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2092445492 From dnsimon at openjdk.org Fri May 16 07:03:57 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 May 2025 07:03:57 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used In-Reply-To: References: Message-ID: On Thu, 15 May 2025 22:17:42 GMT, Vladimir Kozlov wrote: > That should be in PR and RFE (JBS) Descriptions! This was my main question about filed REF. I've updated both the PR and JBS issue descriptions. 
Let me know if either still need improvement. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25240#issuecomment-2885829282 From alanb at openjdk.org Fri May 16 08:24:52 2025 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 16 May 2025 08:24:52 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used In-Reply-To: References: Message-ID: On Wed, 14 May 2025 22:00:30 GMT, Doug Simon wrote: > The `EnableJVMCI` flag currently serves 2 purposes: > * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). > * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. > > This PR changes nothing about the first point. > > On the second point, to use the `jdk.internal.vm.ci` module, it must now be explicitly added with `--add-modules=jdk.internal.vm.ci`, which will also set `EnableJVMCI` as a side-effect. > > The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on an archive of the root module set created in a separate JVM execution. If the root module set is different than what's in the archive at runtime, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved when creating the archive, it must not be resolved in the runtime using the archive. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci` for libgraal to have the startup advantages of AOTClassLinking. 
src/hotspot/share/runtime/arguments.cpp line 1811: > 1809: jio_fprintf(defaultStream::error_stream(), > 1810: "'+EnableJVMCI' requires '--add-modules=jdk.internal.vm.ci' when UseJVMCINativeLibrary is false\n"); > 1811: return false; There's something a bit uncomfortable about an error message naming a JDK internal module to specify to --add-modules. If I understand correctly, +EnableJVMCI and libgraal is all good, the set of modules in the training run is the same as the production run. However, in the no libgraal scenario, and a mismatch between training and production runs (is that right)? then AOT is disabled. Is it really terrible to disable the AOTClassLinking optimizations in that scenario? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2092577889 From dnsimon at openjdk.org Fri May 16 08:36:55 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 May 2025 08:36:55 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used In-Reply-To: References: Message-ID: On Fri, 16 May 2025 08:22:20 GMT, Alan Bateman wrote: > the set of modules in the training run is the same as the production run I don't think that's true. That is, I don't think +EnableJVMCI is used in the training run is it @iklam ? > Is it really terrible to disable the AOTClassLinking optimizations in that scenario? Depends if you care as much about VM startup when using libgraal as when using C2. Is there a reason why only one of these should have AOTClassLinking startup benefits? There's also the issue of all the AOTClassLinking tests having to be disabled/ignored/problem listed in the libgraal mach5 tiers. This is what initially motivated @iklam and I to come up with this solution. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2092598097 From alanb at openjdk.org Fri May 16 09:01:52 2025 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 16 May 2025 09:01:52 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used In-Reply-To: References: Message-ID: On Fri, 16 May 2025 08:34:33 GMT, Doug Simon wrote: > Depends if you care as much about VM startup when using libgraal as when using C2. Is there a reason why only one of these should have AOTClassLinking startup benefits? I should have been clearer, my question/comment was about the no-libgraal case, not the libgraal case. With the proposed change, I think you are looking to print an error. I'm wondering why it can't continue to add jdk.internal.vm.ci to the set of root modules. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2092640951 From dnsimon at openjdk.org Fri May 16 09:32:57 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 May 2025 09:32:57 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used In-Reply-To: References: Message-ID: On Fri, 16 May 2025 08:59:12 GMT, Alan Bateman wrote: >>> the set of modules in the training run is the same as the production run >> >> I don't think that's true. That is, I don't think +EnableJVMCI is used in the training run is it @iklam ? >> >>> Is it really terrible to disable the AOTClassLinking optimizations in that scenario? >> >> Depends if you care as much about VM startup when using libgraal as when using C2. Is there a reason why only one of these should have AOTClassLinking startup benefits? >> >> There's also the issue of all the AOTClassLinking tests having to be disabled/ignored/problem listed in the libgraal mach5 tiers. This is what initially motivated @iklam and I to come up with this solution. 
> >> Depends if you care as much about VM startup when using libgraal as when using C2. Is there a reason why only one of these should have AOTClassLinking startup benefits? > > I should have been clearer, my question/comment was about the no-libgraal case, not the libgraal case. With the proposed change, I think you are looking to print an error. I'm wondering why it can't continue to add jdk.internal.vm.ci to the set of root modules. Ok, that's a good suggestion. I'll explore further. And I now understand your first comment about "+EnableJVMCI and libgraal is all good" is in the context of having applied this PR. In that context, the set of modules in the training run is indeed the same as the production run. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2092699986 From dnsimon at openjdk.org Fri May 16 13:16:30 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 May 2025 13:16:30 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v2] In-Reply-To: References: Message-ID: <5f7JDzMyIpKD6FAvCN5kYPJYD1mPUcAHUa43Kh74h40=.9d9fcdbe-68f5-46e1-9609-e3320bba6f77@github.com> > The `EnableJVMCI` flag currently serves 2 purposes: > * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). > * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. > > This PR changes nothing about the first point. > > On the second point, to use the `jdk.internal.vm.ci` module, it must now be explicitly added with `--add-modules=jdk.internal.vm.ci`, which will also set `EnableJVMCI` as a side-effect. > > The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on an archive of the root module set created in a separate JVM execution. 
If the root module set is different than what's in the archive at runtime, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved when creating the archive, it must not be resolved in the runtime using the archive. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci` for libgraal to have the startup advantages of AOTClassLinking. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: resolve jdk.internal.vm.ci if +EnableJVMCI and -UseJVMCINativeLibrary ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25240/files - new: https://git.openjdk.org/jdk/pull/25240/files/0e8773e1..34360331 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25240&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25240&range=00-01 Stats: 38 lines in 9 files changed: 5 ins; 15 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/25240.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25240/head:pull/25240 PR: https://git.openjdk.org/jdk/pull/25240 From dnsimon at openjdk.org Fri May 16 13:16:30 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 May 2025 13:16:30 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v2] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 09:30:42 GMT, Doug Simon wrote: >>> Depends if you care as much about VM startup when using libgraal as when using C2. Is there a reason why only one of these should have AOTClassLinking startup benefits? >> >> I should have been clearer, my question/comment was about the no-libgraal case, not the libgraal case. With the proposed change, I think you are looking to print an error. I'm wondering why it can't continue to add jdk.internal.vm.ci to the set of root modules. > > Ok, that's a good suggestion. I'll explore further. 
> > And I now understand your first comment about "+EnableJVMCI and libgraal is all good" is in the context of having applied this PR. In that context, the set of modules in the training run is indeed the same as the production run. I've pushed a commit that implements your suggestion and reduces the size of the overall change nicely. More importantly, I think it's a better design - thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2093025755 From iklam at openjdk.org Fri May 16 13:52:03 2025 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 16 May 2025 13:52:03 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v2] In-Reply-To: References: Message-ID: <-uhFTQkTUeNHKS5yBLkapWVfcGwDBAgS8B_rS2DvWsg=.e0a7243c-9853-4855-a652-2558941bfd41@github.com> On Fri, 16 May 2025 13:13:09 GMT, Doug Simon wrote: >> Ok, that's a good suggestion. I'll explore further. >> >> And I now understand your first comment about "+EnableJVMCI and libgraal is all good" is in the context of having applied this PR. In that context, the set of modules in the training run is indeed the same as the production run. > > I've pushed a commit that implements your suggestion and reduces the size of the overall change nicely. More importantly, I think it's a better design - thanks! I like this latest version! 
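The rule named in the v2 commit message ("resolve jdk.internal.vm.ci if +EnableJVMCI and -UseJVMCINativeLibrary") can be written out as a truth table. This is a minimal sketch, not the code in `arguments.cpp`: the class and method names are hypothetical, and it deliberately leaves out the `ClassLoader::is_module_observable` check and any other conditions the real implementation may apply.

```java
public class JvmciModuleRule {

    // Hypothetical predicate: does jdk.internal.vm.ci end up in the root module set?
    public static boolean jvmciInRootModules(boolean enableJVMCI,
                                             boolean useJVMCINativeLibrary,
                                             boolean namedInAddModules) {
        if (namedInAddModules) {
            // An explicit --add-modules=jdk.internal.vm.ci always names the module as a root.
            return true;
        }
        // Otherwise the module is added implicitly only when JVMCI is enabled
        // and libgraal (UseJVMCINativeLibrary) is not in use.
        return enableJVMCI && !useJVMCINativeLibrary;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean enable : values) {
            for (boolean libgraal : values) {
                for (boolean explicit : values) {
                    System.out.printf("EnableJVMCI=%b UseJVMCINativeLibrary=%b add-modules=%b -> root=%b%n",
                            enable, libgraal, explicit,
                            jvmciInRootModules(enable, libgraal, explicit));
                }
            }
        }
    }
}
```

Under these assumptions, the libgraal configuration without an explicit `--add-modules` leaves the root module set unchanged, so it matches a training run that never resolved the module.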
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2093092231 From iklam at openjdk.org Fri May 16 13:52:05 2025 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 16 May 2025 13:52:05 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v2] In-Reply-To: <5f7JDzMyIpKD6FAvCN5kYPJYD1mPUcAHUa43Kh74h40=.9d9fcdbe-68f5-46e1-9609-e3320bba6f77@github.com> References: <5f7JDzMyIpKD6FAvCN5kYPJYD1mPUcAHUa43Kh74h40=.9d9fcdbe-68f5-46e1-9609-e3320bba6f77@github.com> Message-ID: <9WobEXbqfiR1CrUzBbqo6G4sIBpMkDPduREkTuLLl3k=.59ee81b0-4d11-4bfd-8f50-d63d1f6ff35e@github.com> On Fri, 16 May 2025 13:16:30 GMT, Doug Simon wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point. >> >> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `--add-modules=jdk.internal.vm.ci` must be specified. >> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. >> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. 
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > resolve jdk.internal.vm.ci if +EnableJVMCI and -UseJVMCINativeLibrary src/hotspot/share/runtime/arguments.cpp line 1814: > 1812: } > 1813: PropertyList_unique_add(&_system_properties, "jdk.internal.vm.ci.enabled", "true", > 1814: AddProperty, UnwriteableProperty, InternalProperty); What's the purpose of the `jdk.internal.vm.ci.enabled` property? Should it be enabled only if the `jdk.internal.vm.ci` module is added? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2093091052 From dnsimon at openjdk.org Fri May 16 14:30:55 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 May 2025 14:30:55 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v2] In-Reply-To: <9WobEXbqfiR1CrUzBbqo6G4sIBpMkDPduREkTuLLl3k=.59ee81b0-4d11-4bfd-8f50-d63d1f6ff35e@github.com> References: <5f7JDzMyIpKD6FAvCN5kYPJYD1mPUcAHUa43Kh74h40=.9d9fcdbe-68f5-46e1-9609-e3320bba6f77@github.com> <9WobEXbqfiR1CrUzBbqo6G4sIBpMkDPduREkTuLLl3k=.59ee81b0-4d11-4bfd-8f50-d63d1f6ff35e@github.com> Message-ID: On Fri, 16 May 2025 13:48:45 GMT, Ioi Lam wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> resolve jdk.internal.vm.ci if +EnableJVMCI and -UseJVMCINativeLibrary > > src/hotspot/share/runtime/arguments.cpp line 1814: > >> 1812: } >> 1813: PropertyList_unique_add(&_system_properties, "jdk.internal.vm.ci.enabled", "true", >> 1814: AddProperty, UnwriteableProperty, InternalProperty); > > What's the purpose of the `jdk.internal.vm.ci.enabled` property? Should it be enabled only if the `jdk.internal.vm.ci` module is added? 
It exists to [check](https://github.com/search?q=repo%3Aopenjdk%2Fjdk%20checkJVMCIEnabled&type=code) in various Java-level entry points that the JVMCI VM support has been enabled so a nicer error message can be thrown that provides a possible corrective action. However, since it's now impossible to load the JVMCI module without enabling the JVMCI VM support, this is no longer of any use. I'll remove it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2093162767 From dnsimon at openjdk.org Fri May 16 14:41:31 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 May 2025 14:41:31 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v3] In-Reply-To: References: Message-ID: <_q4IPJYDVzhjP8W0KqTeFGzTB4vE-QmfAWAsLWd8m5M=.f0bd3074-ab48-425f-b6d4-e765f6d1f8f0@github.com> > The `EnableJVMCI` flag currently serves 2 purposes: > * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). > * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. > > This PR changes nothing about the first point. > > On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `--add-modules=jdk.internal.vm.ci` must be specified. > If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. > > The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. 
As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. Doug Simon has updated the pull request incrementally with three additional commits since the last revision: - fixed comment - removed use of jdk.internal.vm.ci.enabled property - fix TestHotSpotJVMCIRuntime ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25240/files - new: https://git.openjdk.org/jdk/pull/25240/files/34360331..3cdef586 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25240&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25240&range=01-02 Stats: 20 lines in 5 files changed: 2 ins; 18 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25240.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25240/head:pull/25240 PR: https://git.openjdk.org/jdk/pull/25240 From iklam at openjdk.org Fri May 16 15:18:51 2025 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 16 May 2025 15:18:51 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v3] In-Reply-To: <-uhFTQkTUeNHKS5yBLkapWVfcGwDBAgS8B_rS2DvWsg=.e0a7243c-9853-4855-a652-2558941bfd41@github.com> References: <-uhFTQkTUeNHKS5yBLkapWVfcGwDBAgS8B_rS2DvWsg=.e0a7243c-9853-4855-a652-2558941bfd41@github.com> Message-ID: On Fri, 16 May 2025 13:49:26 GMT, Ioi Lam wrote: >> I've pushed a commit that implements your suggestion and reduces the size of the overall change nicely. More importantly, I think it's a better design - thanks! > > I like this latest version! 
I ran a recent build of Oracle JDK 25 that has libjvmcicompiler.so (not including your changes): $ ./bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT -XX:+PrintFlagsFinal --version | \ egrep '(EnableJVMCI)|(UseJVMCICompiler)|(UseJVMCINativeLibrary)' bool EnableJVMCI = true {JVMCI product} {default} bool EnableJVMCIProduct = true {JVMCI product} {command line} bool UseJVMCICompiler = true {JVMCI product} {default} bool UseJVMCINativeLibrary = true {JVMCI product} {default} $ ./bin/java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+PrintFlagsFinal --version | \ egrep '(EnableJVMCI)|(UseJVMCICompiler)|(UseJVMCINativeLibrary)' bool EnableJVMCI = true {JVMCI experimental} {command line} bool EnableJVMCIProduct = false {JVMCI experimental} {default} bool UseJVMCICompiler = false {JVMCI experimental} {default} bool UseJVMCINativeLibrary = true {JVMCI experimental} {default} So if you specify only `-XX:+EnableJVMCI` on the command line, `UseJVMCINativeLibrary` will be true. As a result, with your latest version, the `jdk.internal.vm.ci` module is not added. If you have an app that wants to use the jdk.internal.vm.ci API, you must specify both `-XX:+EnableJVMCI` and `--add-modules=jdk.internal.vm.ci`. Is this intentional? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2093246382 From dnsimon at openjdk.org Fri May 16 15:28:53 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 May 2025 15:28:53 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v3] In-Reply-To: References: <-uhFTQkTUeNHKS5yBLkapWVfcGwDBAgS8B_rS2DvWsg=.e0a7243c-9853-4855-a652-2558941bfd41@github.com> Message-ID: On Fri, 16 May 2025 15:16:15 GMT, Ioi Lam wrote: > If you have an app that wants to use the jdk.internal.vm.ci API, you must specify both -XX:+EnableJVMCI and --add-modules=jdk.internal.vm.ci.
You should only have to specify `--add-modules=jdk.internal.vm.ci` and that now sets `+EnableJVMCI`. If you also want libgraal to be used as the JIT (instead of C2), then you need to add `-XX:+UseGraalJIT`. For Truffle on Oracle JDK, this means: * `--add-modules=jdk.internal.vm.ci`: Use C2 for JIT ("hosted") compilation and libgraal for Truffle ("guest") compilation * `--add-modules=jdk.internal.vm.ci -XX:+UseGraalJIT`: Use libgraal for both JIT and Truffle compilation ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2093262720 From iklam at openjdk.org Fri May 16 15:42:54 2025 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 16 May 2025 15:42:54 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v3] In-Reply-To: <_q4IPJYDVzhjP8W0KqTeFGzTB4vE-QmfAWAsLWd8m5M=.f0bd3074-ab48-425f-b6d4-e765f6d1f8f0@github.com> References: <_q4IPJYDVzhjP8W0KqTeFGzTB4vE-QmfAWAsLWd8m5M=.f0bd3074-ab48-425f-b6d4-e765f6d1f8f0@github.com> Message-ID: On Fri, 16 May 2025 14:41:31 GMT, Doug Simon wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point. >> >> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `--add-modules=jdk.internal.vm.ci` must be specified. >> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. >> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run.
If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. > > Doug Simon has updated the pull request incrementally with three additional commits since the last revision: > > - fixed comment > - removed use of jdk.internal.vm.ci.enabled property > - fix TestHotSpotJVMCIRuntime Marked as reviewed by iklam (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25240#pullrequestreview-2847006395 From iklam at openjdk.org Fri May 16 15:42:56 2025 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 16 May 2025 15:42:56 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v3] In-Reply-To: References: <-uhFTQkTUeNHKS5yBLkapWVfcGwDBAgS8B_rS2DvWsg=.e0a7243c-9853-4855-a652-2558941bfd41@github.com> Message-ID: <2I_zoUvs4eHXqSJAdRTmEwTcSr58O1eFo90vfacZuz8=.907e37b4-4c81-4dbc-884c-0e05b0ea3024@github.com> On Fri, 16 May 2025 15:26:31 GMT, Doug Simon wrote: >> I ran a recent build of Oracle JDK 25 that has libjvmcicompiler.so (not including your changes): >> >> >> $ ./bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT -XX:+PrintFlagsFinal --version | \ >> egrep '(EnableJVMCI)|(UseJVMCICompiler)|(UseJVMCINativeLibrary)' >> bool EnableJVMCI = true {JVMCI product} {default} >> bool EnableJVMCIProduct = true {JVMCI product} {command line} >> bool UseJVMCICompiler = true {JVMCI product} {default} >> bool UseJVMCINativeLibrary = true {JVMCI product} {default} >> $ ./bin/java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+PrintFlagsFinal --version | \ >> egrep '(EnableJVMCI)|(UseJVMCICompiler)|(UseJVMCINativeLibrary)' >> bool EnableJVMCI = true {JVMCI experimental} 
{command line} >> bool EnableJVMCIProduct = false {JVMCI experimental} {default} >> bool UseJVMCICompiler = false {JVMCI experimental} {default} >> bool UseJVMCINativeLibrary = true {JVMCI experimental} {default} >> >> >> So If you specify only `-XX:+EnableJVMCI` in the command-line, `UseJVMCINativeLibrary` will be true. As a result, with your latest version, the `jdk.internal.vm.ci` module is not added. >> >> If you have an app that wants to use the jdk.internal.vm.ci API, you must specify both `-XX:+EnableJVMCI` and ` >> --add-modules=jdk.internal.vm.ci`. Is this intentional? > >> If you have an app that wants to use the jdk.internal.vm.ci API, you must specify both -XX:+EnableJVMCI and --add-modules=jdk.internal.vm.ci. > > You should only have to specify `--add-modules=jdk.internal.vm.ci` and that now sets `+EnableJVMCI`. If you also want libgraal to be used as the JIT (instead of C2), then you need to add `-XX:+UseGraalJIT`. > > For the Truffle on Oracle JDK, this means: > * `--add-modules=jdk.internal.vm.ci`: Use C2 for JIT ("hosted") compilation and libgraal for Truffle ("guest") compilation > * `--add-modules=jdk.internal.vm.ci -XX:+UseGraalJIT`: Use libgraal for both JIT and Truffle compilation Ah I forgot the setting on EnableJVMCI to true. Thanks for the explanation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2093283370 From dnsimon at openjdk.org Fri May 16 16:50:38 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 May 2025 16:50:38 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v4] In-Reply-To: References: Message-ID: > The `EnableJVMCI` flag currently serves 2 purposes: > * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). 
> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. > > This PR changes nothing about the first point. > > On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `--add-modules=jdk.internal.vm.ci` must be specified. > If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. > > The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: improved error message ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25240/files - new: https://git.openjdk.org/jdk/pull/25240/files/3cdef586..1fe56b41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25240&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25240&range=02-03 Stats: 6 lines in 3 files changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25240.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25240/head:pull/25240 PR: https://git.openjdk.org/jdk/pull/25240 From dnsimon at openjdk.org Fri May 16 16:50:38 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 May 2025 16:50:38 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v3] In-Reply-To: <_q4IPJYDVzhjP8W0KqTeFGzTB4vE-QmfAWAsLWd8m5M=.f0bd3074-ab48-425f-b6d4-e765f6d1f8f0@github.com> References: <_q4IPJYDVzhjP8W0KqTeFGzTB4vE-QmfAWAsLWd8m5M=.f0bd3074-ab48-425f-b6d4-e765f6d1f8f0@github.com> Message-ID: On Fri, 16 May 2025 14:41:31 GMT, Doug Simon wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point. >> >> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `--add-modules=jdk.internal.vm.ci` must be specified. >> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. >> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. 
This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. > > Doug Simon has updated the pull request incrementally with three additional commits since the last revision: > > - fixed comment > - removed use of jdk.internal.vm.ci.enabled property > - fix TestHotSpotJVMCIRuntime While testing this out on Graal, I discovered an interesting corner case. public class UseJVMCIModule { public static void main(String[] args) { jdk.vm.ci.runtime.JVMCI.getRuntime(); } } If the JVMCI module is indirectly added to the root module set, it results in an error: java --add-modules=jdk.graal.compiler --add-exports=jdk.internal.vm.ci/jdk.vm.ci.runtime=ALL-UNNAMED UseJVMCIModule.java Exception in thread "main" java.lang.InternalError: JVMCI is not enabled at jdk.internal.vm.ci/jdk.vm.ci.runtime.JVMCI.initializeRuntime(Native Method) at jdk.internal.vm.ci/jdk.vm.ci.runtime.JVMCI.getRuntime(JVMCI.java:64) at UseJVMCIModule.main(UseJVMCIModule.java:3) That is, if an app wants to use the JVMCI module, it needs to explicitly communicate this to the launcher. By the time the root module graph is being initialized in ModuleBootstrap, it's too late to set `EnableJVMCI`. I improved the error message to make this clear: Exception in thread "main" java.lang.InternalError: JVMCI is not enabled. Must specify '--add-modules=jdk.internal.vm.ci' or '-XX:+EnableJVMCI' to the java launcher. 
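For readers who want to see from Java whether the module ended up in the boot layer, here is a minimal sketch using the standard `ModuleLayer` API. The class name `BootLayerCheck` and the helper `isInBootLayer` are made up for illustration and are not part of the PR. Note that, per the corner case above, the module being present in the boot layer does not by itself mean JVMCI is enabled in the VM.

```java
import java.util.Optional;

// Hypothetical helper, not part of the JDK or of this PR.
class BootLayerCheck {

    // Returns true if a module with the given name was resolved into the boot layer.
    static boolean isInBootLayer(String moduleName) {
        Optional<Module> module = ModuleLayer.boot().findModule(moduleName);
        return module.isPresent();
    }

    public static void main(String[] args) {
        // The result depends on the launcher flags discussed in this thread.
        System.out.println("jdk.internal.vm.ci resolved: "
                + isInBootLayer("jdk.internal.vm.ci"));
    }
}
```

Running this with and without `--add-modules=jdk.internal.vm.ci` should show the difference in the resolved module set; it says nothing about the value of `EnableJVMCI`.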
------------- PR Comment: https://git.openjdk.org/jdk/pull/25240#issuecomment-2887219662 From never at openjdk.org Fri May 16 19:19:52 2025 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 16 May 2025 19:19:52 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v4] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 16:50:38 GMT, Doug Simon wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point. >> >> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `--add-modules=jdk.internal.vm.ci` must be specified. >> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. >> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > improved error message src/hotspot/share/jvmci/jvmciRuntime.hpp line 38: > 36: #endif // INCLUDE_G1GC > 37: > 38: #define JAVA_NOT_ENABLED_ERROR_MESSAGE "JVMCI is not enabled. 
Must specify '--add-modules=jdk.internal.vm.ci' or '-XX:+EnableJVMCI' to the java launcher." You meant `JVMCI_NOT_ENABLED_ERROR_MESSAGE` I assume? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2093564087 From never at openjdk.org Fri May 16 19:35:54 2025 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 16 May 2025 19:35:54 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v4] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 16:50:38 GMT, Doug Simon wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point. >> >> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `--add-modules=jdk.internal.vm.ci` must be specified. >> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. >> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. 
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > improved error message test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.hotspot.test/src/jdk/vm/ci/hotspot/test/TestHotSpotJVMCIRuntime.java line 173: > 171: "-XX:+UnlockExperimentalVMOptions", > 172: "-XX:+EnableJVMCI", > 173: "--add-modules=jdk.internal.vm.ci", I stared at this for a while to understand why passing this option was required. It's a bit confusing that explicitly passing `-XX:+EnableJVMCI` has different effects based on the value of UseJVMCINativeLibrary. I think that if `EnableJVMCI` is passed on the command line then it should add the module even if libgraal is in use. So something like: `if ((!UseJVMCINativeLibrary || FLAG_IS_CMDLINE(EnableJVMCI)) && ClassLoader::is_module_observable` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2093580679 From dnsimon at openjdk.org Fri May 16 19:46:51 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 May 2025 19:46:51 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v4] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 19:17:42 GMT, Tom Rodriguez wrote: > You meant `JVMCI_NOT_ENABLED_ERROR_MESSAGE` I assume? Ha!
Nice catch ;-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2093593061 From dnsimon at openjdk.org Fri May 16 19:56:52 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 May 2025 19:56:52 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v4] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 19:33:25 GMT, Tom Rodriguez wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> improved error message > > test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.hotspot.test/src/jdk/vm/ci/hotspot/test/TestHotSpotJVMCIRuntime.java line 173: > >> 171: "-XX:+UnlockExperimentalVMOptions", >> 172: "-XX:+EnableJVMCI", >> 173: "--add-modules=jdk.internal.vm.ci", > > I stared at this for a while to understand why passing this option was required. It's a bit confusing that explicitly passing `-XX:+EnableJVMCI` has different effects based on the value of UseJVMCINativeLibrary. I think that if `EnableJVMCI` is passed on the command line then it should add the module even if libgraal is in use. So something like: > `if ((!UseJVMCINativeLibrary || FLAG_IS_CMDLINE(EnableJVMCI)) && ClassLoader::is_module_observable` I was not aware that FLAG_IS_CMDLINE can be used for altering the semantics of a flag, but there seems to be at least one precedent for it with [UseCompactObjectHeaders](https://github.com/openjdk/jdk/blob/3dd34517000e4ce1a21619922c62c025f98aad44/src/hotspot/share/runtime/arguments.cpp#L3671). This is quite nice as now nothing needs to change for Truffle users in terms of enabling the Truffle optimized runtime (cc @chumer).
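The rule being discussed here can be written down as a small truth-table model. This is only a sketch in plain Java, assuming the condition takes the shape Tom suggested (module added when libgraal is not in use, or when `EnableJVMCI` was given explicitly on the command line). The class `JvmciModuleModel`, the method `addsJvmciModule` and its parameter names are hypothetical and do not correspond to actual HotSpot code.

```java
// Hypothetical model of the flag logic discussed above; not HotSpot code.
class JvmciModuleModel {

    // enableJVMCI:           final value of EnableJVMCI
    // enableJVMCIOnCmdline:  EnableJVMCI was set explicitly on the command line
    // useJVMCINativeLibrary: libgraal is in use
    // moduleObservable:      jdk.internal.vm.ci is observable in the runtime image
    static boolean addsJvmciModule(boolean enableJVMCI,
                                   boolean enableJVMCIOnCmdline,
                                   boolean useJVMCINativeLibrary,
                                   boolean moduleObservable) {
        if (!enableJVMCI || !moduleObservable) {
            return false;
        }
        // Without libgraal the module is always added; with libgraal it is
        // added only when the user asked for EnableJVMCI explicitly.
        return !useJVMCINativeLibrary || enableJVMCIOnCmdline;
    }

    public static void main(String[] args) {
        // EnableJVMCI implied by another flag, libgraal in use.
        System.out.println(addsJvmciModule(true, false, true, true));
        // -XX:+EnableJVMCI given explicitly, libgraal in use.
        System.out.println(addsJvmciModule(true, true, true, true));
        // EnableJVMCI true, libgraal not in use.
        System.out.println(addsJvmciModule(true, false, false, true));
    }
}
```

In this model, the first case is the one that keeps the root module set identical between an AOTClassLinking training run and the production run.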
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2093603624 From dnsimon at openjdk.org Fri May 16 21:06:39 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 May 2025 21:06:39 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v5] In-Reply-To: References: Message-ID: > The `EnableJVMCI` flag currently serves 2 purposes: > * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). > * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. > > This PR changes nothing about the first point. > > On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `--add-modules=jdk.internal.vm.ci` must be specified. > If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. > > The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. 
Doug Simon has updated the pull request incrementally with two additional commits since the last revision: - load the JVMCI module if +EnableJVMCI is set on the command line - JAVA_NOT_ENABLED_ERROR_MESSAGE -> JVMCI_NOT_ENABLED_ERROR_MESSAGE ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25240/files - new: https://git.openjdk.org/jdk/pull/25240/files/1fe56b41..d9223afb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25240&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25240&range=03-04 Stats: 15 lines in 6 files changed: 0 ins; 4 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/25240.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25240/head:pull/25240 PR: https://git.openjdk.org/jdk/pull/25240 From never at openjdk.org Fri May 16 21:20:52 2025 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 16 May 2025 21:20:52 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v5] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 21:06:39 GMT, Doug Simon wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point. >> >> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `--add-modules=jdk.internal.vm.ci` must be specified. >> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. >> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. 
If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - load the JVMCI module if +EnableJVMCI is set on the command line > - JAVA_NOT_ENABLED_ERROR_MESSAGE -> JVMCI_NOT_ENABLED_ERROR_MESSAGE new version seems nice and clean. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25240#pullrequestreview-2847639000 From dnsimon at openjdk.org Sat May 17 09:40:54 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 17 May 2025 09:40:54 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v3] In-Reply-To: References: <_q4IPJYDVzhjP8W0KqTeFGzTB4vE-QmfAWAsLWd8m5M=.f0bd3074-ab48-425f-b6d4-e765f6d1f8f0@github.com> Message-ID: On Fri, 16 May 2025 15:40:12 GMT, Ioi Lam wrote: >> Doug Simon has updated the pull request incrementally with three additional commits since the last revision: >> >> - fixed comment >> - removed use of jdk.internal.vm.ci.enabled property >> - fix TestHotSpotJVMCIRuntime > > Marked as reviewed by iklam (Reviewer). Any further feedback or concerns @iklam or @AlanBateman ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25240#issuecomment-2888253695 From iklam at openjdk.org Sun May 18 05:54:06 2025 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 18 May 2025 05:54:06 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v5] In-Reply-To: References: Message-ID: <41K4xsn27SKncEkQLqryRwgvoLwrlRTfobgQDGOO0Dg=.55866213-f3b4-44ab-9a1e-80a83733d06b@github.com> On Fri, 16 May 2025 21:06:39 GMT, Doug Simon wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point. >> >> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `-XX:+EnableJVMCI` must be explicitly specified on the command line (as opposed to being true as a result of [`-XX:+UseJVMCICompiler`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L88) or [`-XX:+EnableJVMCIProduct`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L64)). Alternatively, `--add-modules=jdk.internal.vm.ci` can be specified - it has the same semantics as `-XX:+EnableJVMCI`. >> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. >> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. 
If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - load the JVMCI module if +EnableJVMCI is set on the command line > - JAVA_NOT_ENABLED_ERROR_MESSAGE -> JVMCI_NOT_ENABLED_ERROR_MESSAGE Latest version looks good to me. ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25240#pullrequestreview-2848736472 From dnsimon at openjdk.org Sun May 18 19:14:06 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 18 May 2025 19:14:06 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v6] In-Reply-To: References: Message-ID: > The `EnableJVMCI` flag currently serves 2 purposes: > * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). > * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. > > This PR changes nothing about the first point. 
> > On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `-XX:+EnableJVMCI` must be explicitly specified on the command line (as opposed to being true as a result of [`-XX:+UseJVMCICompiler`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L88) or [`-XX:+EnableJVMCIProduct`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L64)). Alternatively, `--add-modules=jdk.internal.vm.ci` can be specified - it has the same semantics as `-XX:+EnableJVMCI`. > If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. > > The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: load the JVMCI module if +EnableJVMCI is set in the jimage ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25240/files - new: https://git.openjdk.org/jdk/pull/25240/files/d9223afb..196425f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25240&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25240&range=04-05 Stats: 9 lines in 2 files changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25240.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25240/head:pull/25240 PR: https://git.openjdk.org/jdk/pull/25240 From dnsimon at openjdk.org Sun May 18 19:19:52 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 18 May 2025 19:19:52 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v6] In-Reply-To: References: Message-ID: On Sun, 18 May 2025 19:14:06 GMT, Doug Simon wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point. >> >> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `-XX:+EnableJVMCI` must be explicitly specified to the launcher (as opposed to being true as a result of [`-XX:+UseJVMCICompiler`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L88) or [`-XX:+EnableJVMCIProduct`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L64)). 
Alternatively, `--add-modules=jdk.internal.vm.ci` can be specified - it has the same semantics as `-XX:+EnableJVMCI`. >> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. >> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > load the JVMCI module if +EnableJVMCI is set in the jimage In addition to loading the JVMCI module when `-XX:+EnableJVMCI` is on the command line, it should also be done when `-XX:+EnableJVMCI` is set by the jimage. The latter is how GraalVM sets some defaults and +EnableJVMCI is such a default. This ensures that the root module set is the same in training and production runs for AOTClassLinking on GraalVM. 
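
One way to observe the effect described above is to check whether `jdk.internal.vm.ci` ends up in the boot layer for a given set of launcher options. The following is only a minimal probe, not part of the PR; the class name `JvmciModuleProbe` and the helper `isResolved` are hypothetical, and it assumes that presence in the boot layer is a reasonable proxy for the module having been resolved at startup.

```java
public class JvmciModuleProbe {
    // Module name taken from the PR description.
    static final String JVMCI_MODULE = "jdk.internal.vm.ci";

    // Returns true if the named module is part of the boot layer,
    // i.e. it was resolved when the VM started.
    static boolean isResolved(String moduleName) {
        return ModuleLayer.boot().findModule(moduleName).isPresent();
    }

    public static void main(String[] args) {
        System.out.println(JVMCI_MODULE + " resolved in boot layer: "
                + isResolved(JVMCI_MODULE));
    }
}
```

Running the probe once without extra options and once with `--add-modules=jdk.internal.vm.ci` should show the difference on a JDK build that includes the module. The same comparison between a training run and a production run indicates whether the two agree on this module.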
------------- PR Comment: https://git.openjdk.org/jdk/pull/25240#issuecomment-2889161533 From mchevalier at openjdk.org Mon May 19 07:01:52 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 19 May 2025 07:01:52 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: <7e0IhYYv_1dDlLgmUM8rKj5bjDx3lIhY2PRt-fC-rTs=.35437a80-80c7-4332-9339-a6f047b73289@github.com> References: <7e0IhYYv_1dDlLgmUM8rKj5bjDx3lIhY2PRt-fC-rTs=.35437a80-80c7-4332-9339-a6f047b73289@github.com> Message-ID: On Tue, 13 May 2025 03:12:29 GMT, Quan Anh Mai wrote: >>> I think a very simple approach you can take is having CallPureNode as a pure data node >> >> It's not as simple as it seems. In order to work reliably it requires full control of the code being called, so without extra work it is appropriate for generated stubs only. If you want to call some native code VM doesn't control, then either all caller-saved registers should be preserved across the call (which may be prohibitively expensive) or it should be made explicit there's a call taking place so all ABI effects are taken into account. > > @iwanowww I believe `effect(CALL)` marks that a call is taking place and the register allocator will know how to save the registers accordingly. Note that on arm, long division is implemented as a call: > > https://github.com/openjdk/jdk/blob/adebfa7ffda6383f5793278ced14a193066c5f6a/src/hotspot/cpu/arm/arm.ad#L5962 > > And `SharedRuntime::ldiv` is implemented in C++: > > https://github.com/openjdk/jdk/blob/adebfa7ffda6383f5793278ced14a193066c5f6a/src/hotspot/share/runtime/sharedRuntime.cpp#L272 I like @merykitty's suggestion, but I don't understand how bad are the disadvantages of it. Commoning can be prevented as you mentioned above. As for scheduling, isn't it the same problem for many nodes? 
If we have something like

    var x = anObject.aField; // anObject known to be not null
    if (flag) { // flag independent of `anObject`
        // something with x
    } else {
        // [...] nothing with x
    }

I don't think there is any ordering between the if and the definition of `x`, and so we should push the latter under the if. And conversely, if the declaration is already in the branch in the original code, we should not let it float above. Or, in the case of a loop, we should rather put it outside as much as possible. But none of that seems enforced by edges: a memory node is not a CFG node, the nodes of the `if (flag)` might not use memory (so no memory edges)... The same would be true for an arithmetic node (like `AddI`, for instance), but we could argue those are cheap (even if in a loop, cheap becomes expensive), while a memory access is not that cheap. So, don't the problems we have with @merykitty's pure-call-as-pure-data-node suggestion already exist for other node kinds? And if we had trouble with the scheduling of pure calls, wouldn't we have this kind of issue already? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2889840427 From alanb at openjdk.org Mon May 19 07:21:53 2025 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 19 May 2025 07:21:53 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v6] In-Reply-To: References: Message-ID: On Sun, 18 May 2025 19:14:06 GMT, Doug Simon wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point.
>> >> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `-XX:+EnableJVMCI` must be explicitly specified to the launcher (as opposed to being true as a result of [`-XX:+UseJVMCICompiler`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L88) or [`-XX:+EnableJVMCIProduct`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L64)). Alternatively, `--add-modules=jdk.internal.vm.ci` can be specified - it has the same semantics as `-XX:+EnableJVMCI`. >> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. >> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. >> >> Graal adaption PR: https://github.com/oracle/graal/pull/11212 > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > load the JVMCI module if +EnableJVMCI is set in the jimage src/hotspot/share/runtime/arguments.cpp line 2264: > 2262: } > 2263: } > 2264: } This only works if jdk.internal.vm.ci is specified to --add-modules, it won't set _jvmci_module_added if jdk.internal.vm.ci is resolved because some other module require it. As the module is JDK internal and doesn't export an API then I assume it would be rare-to-never to require it, is that right? 
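
The question of whether any module requires `jdk.internal.vm.ci` can be checked against a particular runtime image. The sketch below scans the descriptors of all system modules and lists those that declare a `requires` on a given module. It is an illustration only: the class name `JvmciRequirers` and the method `requirersOf` are hypothetical, and it says nothing about modules supplied on the module path by an application.

```java
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.util.List;
import java.util.stream.Collectors;

public class JvmciRequirers {
    // Names of system modules whose descriptor requires the target module.
    static List<String> requirersOf(String target) {
        return ModuleFinder.ofSystem().findAll().stream()
                .map(ModuleReference::descriptor)
                .filter(d -> d.requires().stream()
                        .anyMatch(r -> r.name().equals(target)))
                .map(ModuleDescriptor::name)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("System modules requiring jdk.internal.vm.ci: "
                + requirersOf("jdk.internal.vm.ci"));
    }
}
```

An empty list on a given image would be consistent with the "rare-to-never" assumption for that image, although it cannot rule out third-party modules.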
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2095007839 From qamai at openjdk.org Mon May 19 07:24:01 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 19 May 2025 07:24:01 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: <4vbXpgvmXv6Ba1fEkMKIRpUnXZ-QVdAZ7rgicqxVhpM=.7dda802c-9b8a-459d-9bd7-7a83d9fc1744@github.com> References: <4vbXpgvmXv6Ba1fEkMKIRpUnXZ-QVdAZ7rgicqxVhpM=.7dda802c-9b8a-459d-9bd7-7a83d9fc1744@github.com> Message-ID: <_iOKkIEZDrhUNSnn4GshsjW79IzVkUyY31LozGq8fcI=.01ecf0ab-641a-427d-bb65-f657df4f49e4@github.com> On Thu, 15 May 2025 21:56:32 GMT, Vladimir Ivanov wrote: >> A first part toward a better support of pure functions. >> >> ## Pure Functions >> >> Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. >> >> ## Scope >> >> We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. >> >> ## Implementation Overview >> >> We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. 
They are surprisingly hard to unify with other floating point functions from an implementation point of view! >> >> IR framework and IGV needed a little bit of fixing. >> >> Thanks, >> Marc > > Interesting! I wasn't aware ADLC already features such support. Thanks for the pointers. > > It does look attractive, especially for platform-specific use cases. But there are some pitfalls which makes it hard to use on its own. In particular, data nodes are aggressively commoned and freely flow in the graph. Unless it is taken into account during GVN and code motion, the final schedule may end up far from optimal. (In other words, it's highly beneficial to match only expensive nodes in such a way.) Moreover, some optimizations are highly sensitive to the presence of calls. (Think of the consequences of a call scheduled inside a heavily vectorized loop.) > > Macro-expansion also suffers from some of those issues, but still IMO an explicit `Call` node is a more appropriate solution to the problem. Tbh I don't understand @iwanowww arguments. We have expensive data nodes such as `SqrtD` that have control inputs to prevent them floating too aggressively. Additionally, a `CallNode` is pinned AT its control input, while a data node is pinned UNDER its control input. It gives the scheduler much more freedom scheduling a data node to a better location compared to a call node. Ideally, what we want to do with expensive data nodes is to common them aggressively like any other data node. Then, during code motion, we can clone them if it is beneficial. 
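
The distinction drawn in the quoted description (integer division can throw, floating point operations return `NaN` or an infinity instead) can be seen directly at the Java level. This is a small illustration rather than a C2 test; the class and method names are hypothetical, and it assumes that `%` on doubles is what reaches C2 as the `ModD` node mentioned above.

```java
public class PureCallDemo {
    // Floating point remainder never throws, so if its result is unused
    // the operation could in principle be dropped without changing behavior.
    static double remainderUnused(double a, double b) {
        double ignored = a % b; // result deliberately unused
        return a;
    }

    static double remainder(double a, double b) {
        return a % b;
    }

    // Integer division is not pure: a zero divisor changes control flow.
    static boolean intDivisionThrows(int divisor) {
        try {
            int unused = 1 / divisor;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(remainder(1.0, 0.0));       // NaN, no exception
        System.out.println(intDivisionThrows(0));      // true
        System.out.println(remainderUnused(3.0, 0.0)); // 3.0
    }
}
```

Removing the unused remainder in `remainderUnused` is safe for any arguments, whereas removing the division in `intDivisionThrows` would silently drop the exception.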
------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2889891820 From alanb at openjdk.org Mon May 19 07:29:01 2025 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 19 May 2025 07:29:01 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v6] In-Reply-To: References: Message-ID: On Sun, 18 May 2025 19:14:06 GMT, Doug Simon wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point. >> >> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `-XX:+EnableJVMCI` must be explicitly specified to the launcher (as opposed to being true as a result of [`-XX:+UseJVMCICompiler`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L88) or [`-XX:+EnableJVMCIProduct`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L64)). Alternatively, `--add-modules=jdk.internal.vm.ci` can be specified - it has the same semantics as `-XX:+EnableJVMCI`. >> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. >> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. 
As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. >> >> Graal adaption PR: https://github.com/oracle/graal/pull/11212 > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > load the JVMCI module if +EnableJVMCI is set in the jimage src/hotspot/share/jvmci/jvmciRuntime.hpp line 38: > 36: #endif // INCLUDE_G1GC > 37: > 38: #define JVMCI_NOT_ENABLED_ERROR_MESSAGE "JVMCI is not enabled. Must specify '--add-modules=jdk.internal.vm.ci' or '-XX:+EnableJVMCI' to the java launcher." It's the exception message for an InternalError so maybe this is okay but in general I don't think you want to be asking users to have to name a JDK internal module. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2095016026 From dnsimon at openjdk.org Mon May 19 07:29:04 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 19 May 2025 07:29:04 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v6] In-Reply-To: References: Message-ID: <1jsJPDBKXdqW9cWDKx8B_86qlqoo1QR1Bu93DC6KWGI=.ad77b0c1-9984-405a-ad2b-6391006748a9@github.com> On Mon, 19 May 2025 07:23:54 GMT, Alan Bateman wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> load the JVMCI module if +EnableJVMCI is set in the jimage > > src/hotspot/share/jvmci/jvmciRuntime.hpp line 38: > >> 36: #endif // INCLUDE_G1GC >> 37: >> 38: #define JVMCI_NOT_ENABLED_ERROR_MESSAGE "JVMCI is not enabled. Must specify '--add-modules=jdk.internal.vm.ci' or '-XX:+EnableJVMCI' to the java launcher." 
> > It's the exception message for an InternalError so maybe this is okay but in general I don't think you want to be asking users to have to name a JDK internal module. I think anyone using a non-standard configuration for Graal would not be surprised about JVMCI and would actually appreciate the extra hint when they get things "wrong". > src/hotspot/share/runtime/arguments.cpp line 2264: > >> 2262: } >> 2263: } >> 2264: } > > This only works if jdk.internal.vm.ci is specified to --add-modules, it won't set _jvmci_module_added if jdk.internal.vm.ci is resolved because some other module require it. As the module is JDK internal and doesn't export an API then I assume it would be rare-to-never to require it, is that right? Correct. I realized this might be a bit confusing so improved the error message a little: https://github.com/openjdk/jdk/pull/25240#issuecomment-2887219662 But as you say, it should never be encountered in practice. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2095021472 PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2095012796 From dnsimon at openjdk.org Mon May 19 07:46:23 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 19 May 2025 07:46:23 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v7] In-Reply-To: References: Message-ID: > The `EnableJVMCI` flag currently serves 2 purposes: > * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). > * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. > > This PR changes nothing about the first point. 
> > On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `-XX:+EnableJVMCI` must be explicitly specified to the launcher (as opposed to being true as a result of [`-XX:+UseJVMCICompiler`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L88) or [`-XX:+EnableJVMCIProduct`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L64)). Alternatively, `--add-modules=jdk.internal.vm.ci` can be specified - it has the same semantics as `-XX:+EnableJVMCI`. > If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. > > The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. 
> > Graal adaption PR: https://github.com/oracle/graal/pull/11212 Doug Simon has updated the pull request incrementally with one additional commit since the last revision: swapped order of recommended options in error message ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25240/files - new: https://git.openjdk.org/jdk/pull/25240/files/196425f9..b74077f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25240&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25240&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25240.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25240/head:pull/25240 PR: https://git.openjdk.org/jdk/pull/25240 From fniephaus at openjdk.org Mon May 19 07:46:23 2025 From: fniephaus at openjdk.org (Fabio Niephaus) Date: Mon, 19 May 2025 07:46:23 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v6] In-Reply-To: <1jsJPDBKXdqW9cWDKx8B_86qlqoo1QR1Bu93DC6KWGI=.ad77b0c1-9984-405a-ad2b-6391006748a9@github.com> References: <1jsJPDBKXdqW9cWDKx8B_86qlqoo1QR1Bu93DC6KWGI=.ad77b0c1-9984-405a-ad2b-6391006748a9@github.com> Message-ID: On Mon, 19 May 2025 07:26:01 GMT, Doug Simon wrote: >> src/hotspot/share/jvmci/jvmciRuntime.hpp line 38: >> >>> 36: #endif // INCLUDE_G1GC >>> 37: >>> 38: #define JVMCI_NOT_ENABLED_ERROR_MESSAGE "JVMCI is not enabled. Must specify '--add-modules=jdk.internal.vm.ci' or '-XX:+EnableJVMCI' to the java launcher." >> >> It's the exception message for an InternalError so maybe this is okay but in general I don't think you want to be asking users to have to name a JDK internal module. > > I think anyone using a non-standard configuration for Graal would not be surprised about JVMCI and would actually appreciate the extra hint when they get things "wrong". Assuming we want users to use `-XX:+EnableJVMCI` over `--add-modules=jdk.internal.vm.ci` for said reason, maybe flip the order? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2095053343 From dnsimon at openjdk.org Mon May 19 07:46:23 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 19 May 2025 07:46:23 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v6] In-Reply-To: References: <1jsJPDBKXdqW9cWDKx8B_86qlqoo1QR1Bu93DC6KWGI=.ad77b0c1-9984-405a-ad2b-6391006748a9@github.com> Message-ID: On Mon, 19 May 2025 07:40:45 GMT, Fabio Niephaus wrote: >> I think anyone using a non-standard configuration for Graal would not be surprised about JVMCI and would actually appreciate the extra hint when they get things "wrong". > > Assuming we want users to use `-XX:+EnableJVMCI` over `--add-modules=jdk.internal.vm.ci` for said reason, maybe flip the order? Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25240#discussion_r2095057877 From yzheng at openjdk.org Mon May 19 15:19:15 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 19 May 2025 15:19:15 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs Message-ID: This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. 
It also drops AMD64/AArch64/RISCV64.flags and RegisterArray ------------- Commit messages: - address comments - Add JVMCI support for APX EGPRs Changes: https://git.openjdk.org/jdk/pull/23159/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23159&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334717 Stats: 546 lines in 18 files changed: 41 ins; 334 del; 171 mod Patch: https://git.openjdk.org/jdk/pull/23159.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23159/head:pull/23159 PR: https://git.openjdk.org/jdk/pull/23159 From yzheng at openjdk.org Mon May 19 15:19:15 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 19 May 2025 15:19:15 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs In-Reply-To: References: Message-ID: On Thu, 16 Jan 2025 16:01:32 GMT, Yudi Zheng wrote: > This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. It also drops AMD64/AArch64/RISCV64.flags and RegisterArray keep alive ------------- PR Comment: https://git.openjdk.org/jdk/pull/23159#issuecomment-2724418056 From dnsimon at openjdk.org Mon May 19 15:19:15 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 19 May 2025 15:19:15 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs In-Reply-To: References: Message-ID: On Thu, 16 Jan 2025 16:01:32 GMT, Yudi Zheng wrote: > This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. It also drops AMD64/AArch64/RISCV64.flags and RegisterArray src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/code/RegisterAttributes.java line 58: > 56: * element at index i holds the attributes of the register whose number is i. > 57: */ > 58: public static RegisterAttributes[] createMap(RegisterConfig registerConfig, List registers) { We should remove raw arrays as much as possible in JVMCI and replace them with immutable Lists: * @return an immutable list whose length is the max register number in {@code registers} plus 1. 
An * element at index i holds the attributes of the register whose number is i. */ public static List registers) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23159#discussion_r2033171362 From yzheng at openjdk.org Mon May 19 15:19:15 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 19 May 2025 15:19:15 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 13:15:29 GMT, Doug Simon wrote: >> This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. It also drops AMD64/AArch64/RISCV64.flags and RegisterArray > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/code/RegisterAttributes.java line 58: > >> 56: * element at index i holds the attributes of the register whose number is i. >> 57: */ >> 58: public static RegisterAttributes[] createMap(RegisterConfig registerConfig, List registers) { > > We should remove raw arrays as much as possible in JVMCI and replace them with immutable Lists: > > * @return an immutable list whose length is the max register number in {@code registers} plus 1. An > * element at index i holds the attributes of the register whose number is i. > */ > public static List registers) { I have audited all the .clone() on array objects and changed as much as possible. 
Let me know if there is still some opportunity ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23159#discussion_r2095952012 From dnsimon at openjdk.org Mon May 19 15:26:53 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 19 May 2025 15:26:53 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs In-Reply-To: References: Message-ID: <_MgCOg5EY1Sa1zo0ZwQ5Xr8rU3cU5CI5GHxCwIHGoSo=.d90889b8-f821-4695-85cb-1c8727638e7a@github.com> On Mon, 19 May 2025 15:16:27 GMT, Yudi Zheng wrote: >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/code/RegisterAttributes.java line 58: >> >>> 56: * element at index i holds the attributes of the register whose number is i. >>> 57: */ >>> 58: public static RegisterAttributes[] createMap(RegisterConfig registerConfig, List registers) { >> >> We should remove raw arrays as much as possible in JVMCI and replace them with immutable Lists: >> >> * @return an immutable list whose length is the max register number in {@code registers} plus 1. An >> * element at index i holds the attributes of the register whose number is i. >> */ >> public static List registers) { > > I have audited all the .clone() on array objects and changed as much as possible. Let me know if there is still some opportunity Looks good - thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23159#discussion_r2095973654 From dnsimon at openjdk.org Mon May 19 16:50:51 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 19 May 2025 16:50:51 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs In-Reply-To: References: Message-ID: <6U3nTT2mClWCu8SHNL9JmMfwaKITOkvSmzI-3GAr-WY=.d51c748f-199c-4254-8555-15ab31ce78fd@github.com> On Thu, 16 Jan 2025 16:01:32 GMT, Yudi Zheng wrote: > This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. It also drops AMD64/AArch64/RISCV64.flags and RegisterArray LGTM ------------- Marked as reviewed by dnsimon (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23159#pullrequestreview-2851438732 From never at openjdk.org Mon May 19 17:42:56 2025 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 19 May 2025 17:42:56 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v7] In-Reply-To: References: Message-ID: On Mon, 19 May 2025 07:46:23 GMT, Doug Simon wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point. >> >> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `-XX:+EnableJVMCI` must be explicitly specified to the launcher (as opposed to being true as a result of [`-XX:+UseJVMCICompiler`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L88) or [`-XX:+EnableJVMCIProduct`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L64)). Alternatively, `--add-modules=jdk.internal.vm.ci` can be specified - it has the same semantics as `-XX:+EnableJVMCI`. >> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. >> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. 
As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. >> >> Graal adaption PR: https://github.com/oracle/graal/pull/11212 > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > swapped order of recommended options in error message Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25240#pullrequestreview-2851570977 From dnsimon at openjdk.org Mon May 19 17:56:01 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 19 May 2025 17:56:01 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 Message-ID: As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: Error occurred during initialization of VM java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. Instead of exiting the VM, the failure should be silent (unless `-XX:+PrintCompilation` is enabled) as the VM can continue without libgraal, albeit in a crippled state. This PR implements this solution. Alternative solutions include: 1. 
Trying to adjust the values used with `ulimit -v` in the tests to accommodate the [virtual address reservations](https://github.com/oracle/graal/blob/69f10d3d658a6aeca3d5ce59c64af6a18336f14c/substratevm/src/com.oracle.svm.core.genscavenge/src/com/oracle/svm/core/genscavenge/AddressRangeCommittedMemoryProvider.java#L150) needed by libgraal. This is brittle as it assumes knowledge about how much address space is needed (which in turn depends on how many libgraal compiler threads are created). 2. Add a `@requires !vm.libgraal.jit` guard to the tests so they are not run when libgraal is in use. I think the solution in this PR is the most robust for the long term. ------------- Commit messages: - do not exit VM if libjvmci env creation fails Changes: https://git.openjdk.org/jdk/pull/25307/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357135 Stats: 29 lines in 3 files changed: 9 ins; 17 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25307.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25307/head:pull/25307 PR: https://git.openjdk.org/jdk/pull/25307 From vlivanov at openjdk.org Tue May 20 03:29:54 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 20 May 2025 03:29:54 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt, etc. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero throws. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases.
> > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. > > Thanks, > Marc I'm just pointing out that delaying the lowering decision until the matching phase neither makes scheduling easier nor makes implementation simpler. For loop opts it is important to know when loops contain calls and act accordingly (by trying to hoist relevant nodes out of loops and disabling some optimizations when the calls are still there). The difference between CFG nodes effectively pinned AT some point and non-CFG nodes with control dependency (effectively pushing them UNDER their control input) becomes insignificant once CFG nodes depend solely on control. In other words, once a call node doesn't consume/produce memory and I/O states, it becomes straightforward to move it around in CFG when desired (between its inputs and users). Speaking of scheduling, would default scheduling heuristics do a good job? The case of expensive nodes exemplifies the need for custom scheduling heuristics for such nodes. Implementation-wise, lowering during matching becomes platform-specific and requires each platform to introduce `effect(CALL)` AD instructions.
Moreover, each call shape (determined by arity and argument kinds) has to be explicitly handled with a dedicated AD instruction. And it doesn't benefit from existing support of call nodes every platform already has. > Ideally, what we want to do with expensive data nodes is to common them aggressively like any other data node. Then, during code motion, we can clone them if it is beneficial. The current implementation of expensive nodes can definitely be improved, but the nice property it has is that it only decreases the number of nodes through careful commoning during loop opts. Once cloning is allowed, there's a new problem to care about: the case of too many clones. A simple incremental improvement would be to teach `PhaseIdealLoop::process_expensive_nodes()` to push expensive nodes closer to their users if they are on less frequent code paths. Then it can be taught (how and when) to clone expensive nodes between multiple users. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2892797262 From yzheng at openjdk.org Tue May 20 06:14:09 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 20 May 2025 06:14:09 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs [v2] In-Reply-To: References: Message-ID: > This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. 
It also drops AMD64/AArch64/RISCV64.flags and RegisterArray Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: fix tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23159/files - new: https://git.openjdk.org/jdk/pull/23159/files/aabb8996..37e4d2a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23159&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23159&range=00-01 Stats: 15 lines in 3 files changed: 3 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/23159.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23159/head:pull/23159 PR: https://git.openjdk.org/jdk/pull/23159 From duke at openjdk.org Tue May 20 11:56:52 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Tue, 20 May 2025 11:56:52 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v6] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 00:28:18 GMT, Sandhya Viswanathan wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Response to review comment + loading constants with broadcast op. > > src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 250: > >> 248: static void montmul(int outputRegs[], int inputRegs1[], int inputRegs2[], >> 249: int scratchRegs1[], int scratchRegs2[], MacroAssembler *_masm) { >> 250: for (int i = 0; i < 4; i++) { > > In the intrinsic for montMul we are treating as if MONT_R_BITS is 16 and MONT_Q_INV_MOD_R is 0xF301 whereas in the Java code MONT_R_BITS is 20 and MONT_Q_INV_MOD_R is 0x8F301. Are these equivalent? As used in this case, they are equivalent. For z = montmul(a,b), z will be between -q and q and congruent to a * b * R^-1 mod q, where R > 2 * q, R is a power of 2, -R/2 * q <= a * b < R/2 * q. For the Java code, we use R = 2^20 and for the intrinsic, R = 2^16. In our computations, b is always c * R mod q, so the montmul() really computes a * c mod q.
In the Java code, we use 32-bit numbers for the computations, and we use R = 2^20 because that way the a * b numbers that occur during all computations stay in the required range (the inverse NTT computation is where they can grow the most), so we don't have to do Barrett reductions during that computation. For the intrinsics, we use R = 2^16, because this way we can do twice as much work in parallel, but we have to do Barrett reduction after levels 2 and 4 in the inverse NTT computation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2097757145 From dnsimon at openjdk.org Tue May 20 12:14:07 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 20 May 2025 12:14:07 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v2] In-Reply-To: References: Message-ID: > As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: > > > Error occurred during initialization of VM > java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) > > > This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. > Instead of exiting the VM, the failure should be silent (unless `-XX:+PrintCompilation` is enabled) as the VM can continue without libgraal, albeit in a crippled state. This PR implements this solution. > > Alternative solutions include: > 1. 
Trying to adjust the values used with `ulimit -v` in the tests to accommodate the [virtual address reservations](https://github.com/oracle/graal/blob/69f10d3d658a6aeca3d5ce59c64af6a18336f14c/substratevm/src/com.oracle.svm.core.genscavenge/src/com/oracle/svm/core/genscavenge/AddressRangeCommittedMemoryProvider.java#L150) needed by libgraal. This is brittle as it assumes knowledge about how much address space is needed (which in turn depends on how many libgraal compiler threads are created). > 2. Add a `@requires !vm.libgraal.jit` guard to the tests so they are not run when libgraal is in use. > > I think the solution in this PR is the most robust for the long term. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: consolidate JVMCI eager initialization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25307/files - new: https://git.openjdk.org/jdk/pull/25307/files/7eb259b9..32986d1a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=00-01 Stats: 41 lines in 5 files changed: 17 ins; 19 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25307.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25307/head:pull/25307 PR: https://git.openjdk.org/jdk/pull/25307 From yzheng at openjdk.org Tue May 20 12:27:53 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 20 May 2025 12:27:53 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v2] In-Reply-To: References: Message-ID: <3aLK-TCHFl8-YyAX6Ppjm458pXwA5jGq6qssypzvTw0=.8ad6de3e-63f4-4c1b-bac4-01c84549a7d7@github.com> On Tue, 20 May 2025 12:14:07 GMT, Doug Simon wrote: >> As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup.
If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: >> >> >> Error occurred during initialization of VM >> java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) >> >> >> This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. >> Instead of exiting the VM, the failure should be silent (unless `-XX:+PrintCompilation` is enabled) as the VM can continue without libgraal, albeit in a crippled state. This PR implements this solution. >> >> Alternative solutions include: >> 1. Trying to adjust the values used with `ulimit -v` in the tests to accommodate the [virtual address reservations](https://github.com/oracle/graal/blob/69f10d3d658a6aeca3d5ce59c64af6a18336f14c/substratevm/src/com.oracle.svm.core.genscavenge/src/com/oracle/svm/core/genscavenge/AddressRangeCommittedMemoryProvider.java#L150) needed by libgraal. This is brittle as it assumes knowledge about how much address space is needed (which in turn depends on how many libgraal compiler threads are created). >> 2. Add a `@requires !vm.libgraal.jit` guard to the tests so they are not run when libgraal is in use. >> >> I think the solution in this PR is the most robust for the long term. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > consolidate JVMCI eager initialization LGTM ------------- Marked as reviewed by yzheng (Committer).
PR Review: https://git.openjdk.org/jdk/pull/25307#pullrequestreview-2853970394 From dnsimon at openjdk.org Tue May 20 12:58:28 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 20 May 2025 12:58:28 GMT Subject: RFR: 8357370: Export supported GCs in JVMCI In-Reply-To: References: Message-ID: On Tue, 20 May 2025 12:52:02 GMT, Roman Kennke wrote: > I need a way to detect in JVMCI if Shenandoah GC is supported (that is, built-in) by HotSpot. I need it for Shenandoah, because some vendors don't build it, but for cleanliness the relevant preprocessor constants should be exported for all GCs. > > Testing: > - [x] build/test https://github.com/oracle/graal/pull/10904 src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 498: > 496: declare_preprocessor_constant("ASSERT", DEBUG_ONLY(1) NOT_DEBUG(0)) \ > 497: \ > 498: declare_preprocessor_constant("INCLUDE_SERIALGC", INCLUDE_SERIALGC) \ Probably best to make the formatting consistent with how it's done for the `JVM_ACC_*` constants below (i.e., no alignment of values). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25325#discussion_r2097893655 From rkennke at openjdk.org Tue May 20 12:58:27 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 20 May 2025 12:58:27 GMT Subject: RFR: 8357370: Export supported GCs in JVMCI Message-ID: I need a way to detect in JVMCI if Shenandoah GC is supported (that is, built-in) by HotSpot. I need it for Shenandoah, because some vendors don't build it, but for cleanliness the relevant preprocessor constants should be exported for all GCs. 
Testing: - [x] build/test https://github.com/oracle/graal/pull/10904 ------------- Commit messages: - 8357370: Export supported GCs in JVMCI Changes: https://git.openjdk.org/jdk/pull/25325/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25325&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357370 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25325/head:pull/25325 PR: https://git.openjdk.org/jdk/pull/25325 From rkennke at openjdk.org Tue May 20 13:12:06 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 20 May 2025 13:12:06 GMT Subject: RFR: 8357370: Export supported GCs in JVMCI [v2] In-Reply-To: References: Message-ID: > I need a way to detect in JVMCI if Shenandoah GC is supported (that is, built-in) by HotSpot. I need it for Shenandoah, because some vendors don't build it, but for cleanliness the relevant preprocessor constants should be exported for all GCs. 
> > Testing: > - [x] build/test https://github.com/oracle/graal/pull/10904 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Don't align values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25325/files - new: https://git.openjdk.org/jdk/pull/25325/files/7caef245..321a0940 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25325&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25325&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25325/head:pull/25325 PR: https://git.openjdk.org/jdk/pull/25325 From rkennke at openjdk.org Tue May 20 13:35:31 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 20 May 2025 13:35:31 GMT Subject: RFR: 8357370: Export supported GCs in JVMCI [v3] In-Reply-To: References: Message-ID: > I need a way to detect in JVMCI if Shenandoah GC is supported (that is, built-in) by HotSpot. I need it for Shenandoah, because some vendors don't build it, but for cleanliness the relevant preprocessor constants should be exported for all GCs. 
> > Testing: > - [x] build/test https://github.com/oracle/graal/pull/10904 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Align most trailing \s ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25325/files - new: https://git.openjdk.org/jdk/pull/25325/files/321a0940..16d82e7b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25325&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25325&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25325/head:pull/25325 PR: https://git.openjdk.org/jdk/pull/25325 From dnsimon at openjdk.org Tue May 20 13:35:31 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 20 May 2025 13:35:31 GMT Subject: RFR: 8357370: Export supported GCs in JVMCI [v3] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 13:32:08 GMT, Roman Kennke wrote: >> I need a way to detect in JVMCI if Shenandoah GC is supported (that is, built-in) by HotSpot. I need it for Shenandoah, because some vendors don't build it, but for cleanliness the relevant preprocessor constants should be exported for all GCs. >> >> Testing: >> - [x] build/test https://github.com/oracle/graal/pull/10904 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Align most trailing \s LGTM and trivial. ------------- Marked as reviewed by dnsimon (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25325#pullrequestreview-2854208149 From sviswanathan at openjdk.org Tue May 20 17:17:58 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 20 May 2025 17:17:58 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v6] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 11:51:49 GMT, Ferenc Rakoczi wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 250: >> >>> 248: static void montmul(int outputRegs[], int inputRegs1[], int inputRegs2[], >>> 249: int scratchRegs1[], int scratchRegs2[], MacroAssembler *_masm) { >>> 250: for (int i = 0; i < 4; i++) { >> >> In the intrinsic for montMul we are treating as if MONT_R_BITS is 16 and MONT_Q_INV_MOD_R is 0xF301 whereas in the Java code MONT_R_BITS is 20 and MONT_Q_INV_MOD_R is 0x8F301. Are these equivalent? > > As used in this case, they are equivalent. For z = montmul(a,b), z will be between -q and q and congruent to a * b * R^-1 mod q, where R > 2 * q, R is a power of 2, -R/2 * q <= a * b < R/2 * q. For the Java code, we use R = 2^20 and for the intrinsic, R = 2^16. In our computations, b is always c * R mod q, so the montmul() really computes a * c mod q. In the Java code, we use 32-bit numbers for the computations, and we use R = 2^20 because that way the a * b numbers that occur during all computations stay in the required range (the inverse NTT computation is where they can grow the most), so we don't have to do Barrett reductions during that computation. For the intrinsics, we use R = 2^16, because this way we can do twice as much work in parallel, but we have to do Barrett reduction after levels 2 and 4 in the inverse NTT computation. Thanks a lot for the explanation. It would be good to add it as a comment in the stubGenerator_x86_64_kyber.cpp.
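The equivalence argued in the quoted explanation can be checked with a small Java sketch. It assumes q = 3329 (the ML-KEM modulus) and uses the two constants named in the thread (0xF301 for R = 2^16, 0x8F301 for R = 2^20). The class `MontMulDemo` and the helpers `montMul`, `toMont` and `agree` are hypothetical names for illustration only; this is a plain scalar reduction using 64-bit intermediates, not the code in `ML_KEM.java` or in the AVX-512 stub.

```java
public class MontMulDemo {
    static final int Q = 3329; // ML-KEM modulus (assumed here)

    // Signed Montgomery multiplication: returns z with -Q < z < Q and
    // z congruent to a * b * R^-1 (mod Q), where R = 2^rBits and
    // qInvModR is Q^-1 mod R. Requires -R/2 * Q <= a * b < R/2 * Q.
    static int montMul(int a, int b, int rBits, int qInvModR) {
        long r = 1L << rBits;
        long t = (long) a * b;
        // m = t * Q^-1 mod R, taken as a signed value in [-R/2, R/2)
        long m = (t * qInvModR) & (r - 1);
        if (m >= r / 2) {
            m -= r;
        }
        // t - m * Q is an exact multiple of R, so the shift is an exact division
        return (int) ((t - m * Q) >> rBits);
    }

    // Converts c to Montgomery form: c * R mod Q
    static int toMont(int c, int rBits) {
        return (int) (((long) c << rBits) % Q);
    }

    // True if both choices of R yield a * c mod Q for 0 <= a, c < Q
    static boolean agree(int a, int c) {
        int z16 = montMul(a, toMont(c, 16), 16, 0xF301);
        int z20 = montMul(a, toMont(c, 20), 20, 0x8F301);
        int expected = (a * c) % Q;
        return Math.floorMod(z16, Q) == expected
            && Math.floorMod(z20, Q) == expected;
    }

    public static void main(String[] args) {
        boolean ok = true;
        for (int a = 0; a < Q; a += 7) {
            for (int c = 0; c < Q; c += 11) {
                ok &= agree(a, c);
            }
        }
        System.out.println(ok ? "both choices of R agree" : "mismatch");
    }
}
```

The sketch only exercises inputs already reduced to [0, q), where both values of R are comfortably in range. It does not model the growth of intermediate values during the inverse NTT, which is the reason given above for the extra Barrett reductions when R = 2^16.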
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24953#discussion_r2098491524 From sviswanathan at openjdk.org Tue May 20 17:35:55 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 20 May 2025 17:35:55 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v6] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 13:33:42 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Response to review comment + loading constants with broadcast op. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24953#pullrequestreview-2855056310 From duke at openjdk.org Tue May 20 17:49:14 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Tue, 20 May 2025 17:49:14 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v7] In-Reply-To: References: Message-ID: > By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Added some comments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24953/files - new: https://git.openjdk.org/jdk/pull/24953/files/e4f3264e..ea2152da Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24953&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24953&range=05-06 Stats: 14 lines in 1 file changed: 14 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24953.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24953/head:pull/24953 PR: https://git.openjdk.org/jdk/pull/24953 From sviswanathan at openjdk.org Tue May 20 17:52:55 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 20 May 2025 17:52:55 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v7] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 17:49:14 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Added some comments. Thanks for adding the comment. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24953#pullrequestreview-2855099857 From duke at openjdk.org Tue May 20 18:48:55 2025 From: duke at openjdk.org (duke) Date: Tue, 20 May 2025 18:48:55 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v7] In-Reply-To: References: Message-ID: <3ev08acOQdRUvWRfhksWfQER7TRnpd7gY5mA-OUb8_k=.5b3fe078-dbe9-41ec-b810-f7485280eba8@github.com> On Tue, 20 May 2025 17:49:14 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Added some comments. 
@ferakocz Your change (at version ea2152dab73080d2b4759526d220f19706d768b6) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24953#issuecomment-2895471350 From duke at openjdk.org Tue May 20 19:08:59 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Tue, 20 May 2025 19:08:59 GMT Subject: Integrated: 8351412: Add AVX-512 intrinsics for ML-KEM In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 18:49:52 GMT, Ferenc Rakoczi wrote: > By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. This pull request has now been integrated. Changeset: 972f2ebe Author: Ferenc Rakoczi Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/972f2ebe978280d22531a70116e79837632f6ebc Stats: 988 lines in 10 files changed: 977 ins; 2 del; 9 mod 8351412: Add AVX-512 intrinsics for ML-KEM Reviewed-by: sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/24953 From mullan at openjdk.org Tue May 20 19:13:58 2025 From: mullan at openjdk.org (Sean Mullan) Date: Tue, 20 May 2025 19:13:58 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v7] In-Reply-To: References: Message-ID: <0ZhaH_07oxLDZxz8wVEgbsbYWB50sjuLZxYwyM4ftno=.2adb899d-7768-481d-975b-8e0ee3e6f2c2@github.com> On Tue, 20 May 2025 17:49:14 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Added some comments. Please also write a release note as the performance improvement is significant. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24953#issuecomment-2895525488 From lmesnik at openjdk.org Tue May 20 23:53:59 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 20 May 2025 23:53:59 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v7] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 17:49:14 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Added some comments. I haven't found an answer to my question about testing. How is this fix tested? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24953#issuecomment-2896080458 From duke at openjdk.org Wed May 21 04:43:59 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 21 May 2025 04:43:59 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v7] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 23:51:15 GMT, Leonid Mesnik wrote: > I haven't found an answer to my question about testing. How is this fix tested? The change in the file test/jdk/sun/security/provider/acvp/Launcher.java in PR https://github.com/openjdk/jdk/pull/23860/files covers this as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24953#issuecomment-2896548094 From lmesnik at openjdk.org Wed May 21 05:02:58 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 21 May 2025 05:02:58 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v7] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 17:49:14 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled.
> > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Added some comments. Thanks for pointing to the test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24953#issuecomment-2896581694 From dnsimon at openjdk.org Wed May 21 08:56:59 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 08:56:59 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs [v2] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 06:14:09 GMT, Yudi Zheng wrote: >> This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. It also drops AMD64/AArch64/RISCV64.flags and RegisterArray > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > fix tests Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23159#pullrequestreview-2856879701 From yzheng at openjdk.org Wed May 21 08:56:59 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 21 May 2025 08:56:59 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs [v2] In-Reply-To: References: Message-ID: <8B0cGaejoT19Paf9ccpOje3O6DccoOuE2nm8G6o0gVY=.abd66730-d1c7-4819-9bae-ecdf78fa8e9b@github.com> On Tue, 20 May 2025 06:14:09 GMT, Yudi Zheng wrote: >> This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. It also drops AMD64/AArch64/RISCV64.flags and RegisterArray > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > fix tests thanks for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23159#issuecomment-2897150185 From yzheng at openjdk.org Wed May 21 08:56:59 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 21 May 2025 08:56:59 GMT Subject: Integrated: 8334717: Add JVMCI support for APX EGPRs In-Reply-To: References: Message-ID: On Thu, 16 Jan 2025 16:01:32 GMT, Yudi Zheng wrote: > This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. It also drops AMD64/AArch64/RISCV64.flags and RegisterArray This pull request has now been integrated. Changeset: 735c7899 Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/735c7899d124a4e0c9579ea7802c9475eaedda10 Stats: 561 lines in 21 files changed: 44 ins; 334 del; 183 mod 8334717: Add JVMCI support for APX EGPRs Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/23159 From rkennke at openjdk.org Wed May 21 11:14:59 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 21 May 2025 11:14:59 GMT Subject: RFR: 8357370: Export supported GCs in JVMCI [v3] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 13:35:31 GMT, Roman Kennke wrote: >> I need a way to detect in JVMCI if Shenandoah GC is supported (that is, built-in) by HotSpot. I need it for Shenandoah, because some vendors don't build it, but for cleanliness the relevant preprocessor constants should be exported for all GCs. >> >> Testing: >> - [x] build/test https://github.com/oracle/graal/pull/10904 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Align most trailing \s Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25325#issuecomment-2897553017 From rkennke at openjdk.org Wed May 21 11:15:00 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 21 May 2025 11:15:00 GMT Subject: Integrated: 8357370: Export supported GCs in JVMCI In-Reply-To: References: Message-ID: <3NC5H23jy_ZVGiP7FHXskbyotZZcJlEvU7O3idP0zk8=.b52e774d-fdf4-4ba9-86a6-dd158ffc9ead@github.com> On Tue, 20 May 2025 12:52:02 GMT, Roman Kennke wrote: > I need a way to detect in JVMCI if Shenandoah GC is supported (that is, built-in) by HotSpot. I need it for Shenandoah, because some vendors don't build it, but for cleanliness the relevant preprocessor constants should be exported for all GCs. > > Testing: > - [x] build/test https://github.com/oracle/graal/pull/10904 This pull request has now been integrated. Changeset: 2c126f19 Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/2c126f1954435a5b4d6cdc367b7b5e8c91cfae63 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod 8357370: Export supported GCs in JVMCI Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/25325 From yzheng at openjdk.org Wed May 21 15:06:16 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 21 May 2025 15:06:16 GMT Subject: RFR: 8357424: [JVMCI] Avoid incrementing decompilation count for hosted compiled nmethod Message-ID: Hosted Truffle compilations are installed on the OptimizedCallTarget#profiledPERoot method. Any deoptimization contributes to its decompile count, which can easily exceed the PerMethodRecompilationCutoff threshold, permanently preventing highest tier compilation on this method. This PR exempts hosted compilations from this cutoff by ensuring their decompile count is not incremented for hosted compiled nmethods. 
------------- Commit messages: - [JVMCI] Avoid incrementing decompilation count for hosted compiled nmethod Changes: https://git.openjdk.org/jdk/pull/25356/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25356&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357424 Stats: 45 lines in 4 files changed: 37 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25356.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25356/head:pull/25356 PR: https://git.openjdk.org/jdk/pull/25356 From yzheng at openjdk.org Wed May 21 15:10:30 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 21 May 2025 15:10:30 GMT Subject: RFR: 8357424: [JVMCI] Avoid incrementing decompilation count for hosted compiled nmethod [v2] In-Reply-To: References: Message-ID: > Hosted Truffle compilations are installed on the OptimizedCallTarget#profiledPERoot method. Any deoptimization contributes to its decompile count, which can easily exceed the PerMethodRecompilationCutoff threshold, permanently preventing highest tier compilation on this method. This PR exempts hosted compilations from this cutoff by ensuring their decompile count is not incremented for hosted compiled nmethods. 
Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: update copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25356/files - new: https://git.openjdk.org/jdk/pull/25356/files/8fcd7104..ef4a4c98 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25356&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25356&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25356.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25356/head:pull/25356 PR: https://git.openjdk.org/jdk/pull/25356 From iklam at openjdk.org Wed May 21 15:18:00 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 21 May 2025 15:18:00 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v7] In-Reply-To: References: Message-ID: On Mon, 19 May 2025 07:46:23 GMT, Doug Simon wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point. >> >> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `-XX:+EnableJVMCI` must be explicitly specified to the launcher (as opposed to being true as a result of [`-XX:+UseJVMCICompiler`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L88) or [`-XX:+EnableJVMCIProduct`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L64)). 
Alternatively, `--add-modules=jdk.internal.vm.ci` can be specified - it has the same semantics as `-XX:+EnableJVMCI`. >> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. >> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. >> >> Graal adaption PR: https://github.com/oracle/graal/pull/11212 > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > swapped order of recommended options in error message Marked as reviewed by iklam (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25240#pullrequestreview-2858145831 From never at openjdk.org Wed May 21 15:57:55 2025 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 21 May 2025 15:57:55 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v2] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 12:14:07 GMT, Doug Simon wrote: >> As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: >> >> >> Error occurred during initialization of VM >> java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) 
>>
>>
>> This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`.
>> Instead of exiting the VM, the failure should be silent (unless `-XX:+PrintCompilation` is enabled) as the VM can continue without libgraal, albeit in a crippled state. This PR implements this solution.
>>
>> Alternative solutions include:
>> 1. Trying to adjust the values used with `ulimit -v` in the tests to accommodate the [virtual address reservations](https://github.com/oracle/graal/blob/69f10d3d658a6aeca3d5ce59c64af6a18336f14c/substratevm/src/com.oracle.svm.core.genscavenge/src/com/oracle/svm/core/genscavenge/AddressRangeCommittedMemoryProvider.java#L150) needed by libgraal. This is brittle as it assumes knowledge about how much address space is needed (which in turn depends on how many libgraal compiler threads are created).
>> 2. Add a `@requires !vm.libgraal.jit` guard to the tests so they are not run when libgraal is in use.
>>
>> I think the solution in this PR is the most robust for the long term.
>
> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
>
>   consolidate JVMCI eager initialization

Silently disabling the top level JIT seems like a bad default behaviour for customers.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25307#issuecomment-2898465575

From dnsimon at openjdk.org Wed May 21 16:15:52 2025
From: dnsimon at openjdk.org (Doug Simon)
Date: Wed, 21 May 2025 16:15:52 GMT
Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 21 May 2025 15:54:58 GMT, Tom Rodriguez wrote:

> Silently disabling the top level JIT seems like a bad default behaviour for customers.
This does not disable the JIT, just suppresses a specific type of error (i.e., reserving virtual address space for the SVM heap) when trying to initialize libgraal at startup. Importantly, the error of badly specified libgraal options still causes a VM exit. What alternative solution would you prefer? One of the other 2 proposals in the PR description? Or something else? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25307#issuecomment-2898519536 From kvn at openjdk.org Wed May 21 17:19:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 21 May 2025 17:19:57 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v7] In-Reply-To: References: Message-ID: On Mon, 19 May 2025 07:46:23 GMT, Doug Simon wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point. >> >> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `-XX:+EnableJVMCI` must be explicitly specified to the launcher (as opposed to being true as a result of [`-XX:+UseJVMCICompiler`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L88) or [`-XX:+EnableJVMCIProduct`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L64)). Alternatively, `--add-modules=jdk.internal.vm.ci` can be specified - it has the same semantics as `-XX:+EnableJVMCI`. >> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. 
>> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. >> >> Graal adaption PR: https://github.com/oracle/graal/pull/11212 > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > swapped order of recommended options in error message Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25240#pullrequestreview-2858568094 From never at openjdk.org Wed May 21 17:55:55 2025 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 21 May 2025 17:55:55 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v2] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 12:14:07 GMT, Doug Simon wrote: >> As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: >> >> >> Error occurred during initialization of VM >> java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) >> >> >> This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. 
>> Instead of exiting the VM, the failure should be silent (unless `-XX:+PrintCompilation` is enabled) as the VM can continue without libgraal, albeit in a crippled state. This PR implements this solution.
>>
>> Alternative solutions include:
>> 1. Trying to adjust the values used with `ulimit -v` in the tests to accommodate the [virtual address reservations](https://github.com/oracle/graal/blob/69f10d3d658a6aeca3d5ce59c64af6a18336f14c/substratevm/src/com.oracle.svm.core.genscavenge/src/com/oracle/svm/core/genscavenge/AddressRangeCommittedMemoryProvider.java#L150) needed by libgraal. This is brittle as it assumes knowledge about how much address space is needed (which in turn depends on how many libgraal compiler threads are created).
>> 2. Add a `@requires !vm.libgraal.jit` guard to the tests so they are not run when libgraal is in use.
>>
>> I think the solution in this PR is the most robust for the long term.
>
> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
>
>   consolidate JVMCI eager initialization

After this executes we have a running JVM without a working libgraal, right? It might be rare in a user environment but it's very confusing behaviour for an end user. Might this not occur in a virtualized environment? I agree it would be very hard to make libgraal robust in the face of such a limited virtual address space so I think disabling the tests for libgraal would be easiest. Or both of those tests could probably just run with -Xint to avoid this completely.
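
As a small illustration of the `-Xint` suggestion, the sketch below shows one way a Java program could confirm that it was launched in interpreter-only mode. It is not taken from the tests under discussion: the class and helper names are hypothetical, and it only inspects the JVM arguments reported by the standard `RuntimeMXBean`. The `java.vm.info` property is printed for reference because on HotSpot it typically describes the execution mode, but its exact text is not something to rely on.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper; not part of the JDK tests mentioned in this thread. */
public class InterpreterOnlyCheck {
    /** Returns true if -Xint appears among the given JVM arguments. */
    public static boolean hasXint(List<String> jvmArgs) {
        return jvmArgs.contains("-Xint");
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to main).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("-Xint requested: " + hasXint(jvmArgs));
        // Printed for reference only; the wording is VM-specific.
        System.out.println("java.vm.info: " + System.getProperty("java.vm.info"));
    }
}
```

Running it once as `java -Xint InterpreterOnlyCheck.java` and once without the flag should show the difference in the reported arguments.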
------------- PR Comment: https://git.openjdk.org/jdk/pull/25307#issuecomment-2898777546 From dnsimon at openjdk.org Wed May 21 19:24:00 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 19:24:00 GMT Subject: RFR: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used [v7] In-Reply-To: References: Message-ID: <5HZC3_I8BfmE7cq4-2CvEkUiwayB2nMX3uF7EXV2Csw=.e4230abe-b732-444c-b391-935aac6b7891@github.com> On Mon, 19 May 2025 07:46:23 GMT, Doug Simon wrote: >> The `EnableJVMCI` flag currently serves 2 purposes: >> * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). >> * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. >> >> This PR changes nothing about the first point. >> >> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `-XX:+EnableJVMCI` must be explicitly specified to the launcher (as opposed to being true as a result of [`-XX:+UseJVMCICompiler`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L88) or [`-XX:+EnableJVMCIProduct`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L64)). Alternatively, `--add-modules=jdk.internal.vm.ci` can be specified - it has the same semantics as `-XX:+EnableJVMCI`. >> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set. >> >> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. 
If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking. >> >> Graal adaption PR: https://github.com/oracle/graal/pull/11212 > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > swapped order of recommended options in error message Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25240#issuecomment-2898984593 From dnsimon at openjdk.org Wed May 21 19:24:01 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 19:24:01 GMT Subject: Integrated: 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used In-Reply-To: References: Message-ID: On Wed, 14 May 2025 22:00:30 GMT, Doug Simon wrote: > The `EnableJVMCI` flag currently serves 2 purposes: > * Guards VM code ([example](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/sharedRuntime.cpp#L652)). > * [Adds](https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/runtime/arguments.cpp#L1804) `jdk.internal.vm.ci` to the root module set. > > This PR changes nothing about the first point. 
>
> On the second point, to use the `jdk.internal.vm.ci` module when libgraal is enabled, `-XX:+EnableJVMCI` must be explicitly specified to the launcher (as opposed to being true as a result of [`-XX:+UseJVMCICompiler`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L88) or [`-XX:+EnableJVMCIProduct`](https://github.com/openjdk/jdk/blob/76570c627db527f856f2394fb9ead02939eca621/src/hotspot/share/jvmci/jvmci_globals.cpp#L64)). Alternatively, `--add-modules=jdk.internal.vm.ci` can be specified - it has the same semantics as `-XX:+EnableJVMCI`.
> If libgraal is not enabled, +EnableJVMCI will continue to add `jdk.internal.vm.ci` to the root module set.
>
> The primary motivation is to make use of libgraal compatible with `-XX:+AOTClassLinking`. This flag relies on the root module set archive created in a training run. If the root module set is different in the production run, the AOTClassLinking [optimizations](https://bugs.openjdk.org/browse/JDK-8342279) are disabled. As `jdk.internal.vm.ci` is not resolved in the training run, it must not be resolved in the production run. As such, `-XX:+EnableJVMCI` must not cause resolution of `jdk.internal.vm.ci`, otherwise libgraal will not have the startup advantages of AOTClassLinking.
>
> Graal adaption PR: https://github.com/oracle/graal/pull/11212

This pull request has now been integrated.
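
Whether `jdk.internal.vm.ci` ended up in the resolved module graph for a given set of launcher flags can be checked from Java with the standard `ModuleLayer` API. The following is a minimal sketch, not part of the change itself; the class name and the `isResolved` helper are made up for illustration.

```java
/** Hypothetical helper for inspecting the boot layer; not part of the PR. */
public class JvmciModuleCheck {
    static final String JVMCI_MODULE = "jdk.internal.vm.ci";

    /** Returns true if a module with the given name is present in the layer. */
    public static boolean isResolved(ModuleLayer layer, String moduleName) {
        return layer.findModule(moduleName).isPresent();
    }

    public static void main(String[] args) {
        ModuleLayer boot = ModuleLayer.boot();
        System.out.println(JVMCI_MODULE + " resolved: " + isResolved(boot, JVMCI_MODULE));
    }
}
```

On a JDK that ships the module, comparing a plain run with one started with `--add-modules=jdk.internal.vm.ci` should show the module absent in the first case and present in the second.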
Changeset: 81536830 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/81536830ed096005c4f09ab446238ce50989cea9 Stats: 54 lines in 8 files changed: 31 ins; 15 del; 8 mod 8345826: Do not automatically resolve jdk.internal.vm.ci when libgraal is used Reviewed-by: iklam, never, kvn ------------- PR: https://git.openjdk.org/jdk/pull/25240 From dnsimon at openjdk.org Wed May 21 20:41:35 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 20:41:35 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v3] In-Reply-To: References: Message-ID: > As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: > > > Error occurred during initialization of VM > java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) > > > This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. > Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. Doug Simon has updated the pull request incrementally with three additional commits since the last revision: - tests that use 'ulimit -v' should run with -Xint - Revert "do not exit VM if libjvmci env creation fails" This reverts commit 7eb259b92553669065db57d230476cf465a67d02. - Revert "consolidate JVMCI eager initialization" This reverts commit 32986d1a2b741ee8c9090cefbecc148bb8fbd7e4. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25307/files - new: https://git.openjdk.org/jdk/pull/25307/files/32986d1a..1a79617e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=01-02 Stats: 55 lines in 9 files changed: 30 ins; 18 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/25307.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25307/head:pull/25307 PR: https://git.openjdk.org/jdk/pull/25307 From dnsimon at openjdk.org Wed May 21 20:41:35 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 20:41:35 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v2] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 17:53:13 GMT, Tom Rodriguez wrote: > Or both of those tests could probably just run with -Xint to avoid this completely. I've reverted to this solution - thanks for the suggestion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25307#issuecomment-2899176436 From dnsimon at openjdk.org Wed May 21 20:46:04 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 20:46:04 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v4] In-Reply-To: References: Message-ID: > As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: > > > Error occurred during initialization of VM > java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) 
> > > This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. > Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: added comments justifying use of -Xint ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25307/files - new: https://git.openjdk.org/jdk/pull/25307/files/1a79617e..b0d45b1b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=02-03 Stats: 7 lines in 2 files changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25307.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25307/head:pull/25307 PR: https://git.openjdk.org/jdk/pull/25307 From dnsimon at openjdk.org Wed May 21 20:46:05 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 20:46:05 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v3] In-Reply-To: References: Message-ID: <4TOJwaT4xDVYnzB1co2JKSILNBV5lwBUduMZHRtquSU=.754489ed-035f-427b-8903-f5edcd0309cd@github.com> On Wed, 21 May 2025 20:41:35 GMT, Doug Simon wrote: >> As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: >> >> >> Error occurred during initialization of VM >> java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) 
>> >> >> This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. >> Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. > > Doug Simon has updated the pull request incrementally with three additional commits since the last revision: > > - tests that use 'ulimit -v' should run with -Xint > - Revert "do not exit VM if libjvmci env creation fails" > > This reverts commit 7eb259b92553669065db57d230476cf465a67d02. > - Revert "consolidate JVMCI eager initialization" > > This reverts commit 32986d1a2b741ee8c9090cefbecc148bb8fbd7e4. Tested locally with a build that includes libgraal. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25307#issuecomment-2899184608 From dnsimon at openjdk.org Wed May 21 20:59:33 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 20:59:33 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v5] In-Reply-To: References: Message-ID: > As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: > > > Error occurred during initialization of VM > java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) > > > This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. 
> Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: removed trailing space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25307/files - new: https://git.openjdk.org/jdk/pull/25307/files/b0d45b1b..3201a5d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25307.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25307/head:pull/25307 PR: https://git.openjdk.org/jdk/pull/25307 From dnsimon at openjdk.org Wed May 21 21:07:23 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 21:07:23 GMT Subject: RFR: 8357506: [JVMCI] Consolidate eager JVMCI initialization code Message-ID: <4rwSBU4yySv789oCHmEoFgVsOqj7ZJ65owldXBviF-s=.9b15ee7e-32a0-4a54-b39c-bf1a440b1dd1@github.com> While working on [JDK-8357135](https://bugs.openjdk.org/browse/JDK-8357135), I was reminded that some of the code implementing eager JVMCI compiler initialization (i.e. `-XX:+EagerJVMCI`) is in helper methods such as `JVMCIRuntime::call_getCompiler` that sound general purpose but are only used for eager JVMCI compiler initialization. This PR inlines `JVMCIRuntime::call_getCompiler` and renames `JVMCI::initialize_compiler` to `initialize_compiler_in_create_vm` to make its single use case clearer. 
------------- Commit messages: - consolidate JVMCI eager initialization Changes: https://git.openjdk.org/jdk/pull/25369/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25369&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357506 Stats: 25 lines in 6 files changed: 5 ins; 14 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25369.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25369/head:pull/25369 PR: https://git.openjdk.org/jdk/pull/25369 From dnsimon at openjdk.org Wed May 21 21:15:52 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 21:15:52 GMT Subject: RFR: 8357424: [JVMCI] Avoid incrementing decompilation count for hosted compiled nmethod [v2] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 15:10:30 GMT, Yudi Zheng wrote: >> Hosted Truffle compilations are installed on the OptimizedCallTarget#profiledPERoot method. Any deoptimization contributes to its decompile count, which can easily exceed the PerMethodRecompilationCutoff threshold, permanently preventing highest tier compilation on this method. This PR exempts hosted compilations from this cutoff by ensuring their decompile count is not incremented for hosted compiled nmethods. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > update copyright src/hotspot/share/jvmci/jvmciRuntime.hpp line 49: > 47: friend class JVMCIVMStructs; > 48: > 49: // Is HotSpotNmethod.name non-null? If so, the value is This comment needs to be moved inside the bitfield struct above `_has_name`. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25356#discussion_r2101179755

From dnsimon at openjdk.org Wed May 21 21:22:56 2025
From: dnsimon at openjdk.org (Doug Simon)
Date: Wed, 21 May 2025 21:22:56 GMT
Subject: RFR: 8357424: [JVMCI] Avoid incrementing decompilation count for hosted compiled nmethod [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 21 May 2025 15:10:30 GMT, Yudi Zheng wrote:

>> Hosted Truffle compilations are installed on the OptimizedCallTarget#profiledPERoot method. Any deoptimization contributes to its decompile count, which can easily exceed the PerMethodRecompilationCutoff threshold, permanently preventing highest tier compilation on this method. This PR exempts hosted compilations from this cutoff by ensuring their decompile count is not incremented for hosted compiled nmethods.
>
> Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision:
>
>   update copyright

src/hotspot/share/code/nmethod.cpp line 2061:

> 2059: #if INCLUDE_JVMCI
> 2060:   if (jvmci_nmethod_data() != nullptr && !jvmci_nmethod_data()->is_default()) {
> 2061:     // Hosted compilations are not subject to the recompilation cutoff

Suggestion:

    // Non-default (i.e., non-CompileBroker) compilations are not subject to the recompilation cutoff

"hosted compilations" can be confusing (even though I see we unfortunately already use it elsewhere in JVMCI)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25356#discussion_r2101189921

From yzheng at openjdk.org Thu May 22 07:50:05 2025
From: yzheng at openjdk.org (Yudi Zheng)
Date: Thu, 22 May 2025 07:50:05 GMT
Subject: RFR: 8357424: [JVMCI] Avoid incrementing decompilation count for hosted compiled nmethod [v3]
In-Reply-To:
References:
Message-ID: <1ZsklUTLqTRFvGHmsHFj45Mh_rvdmoxecbJ-hpkFqms=.0ff8dc74-2fd8-4c6f-97fb-e24b33cbd6d0@github.com>

> Hosted Truffle compilations are installed on the OptimizedCallTarget#profiledPERoot method.
Any deoptimization contributes to its decompile count, which can easily exceed the PerMethodRecompilationCutoff threshold, permanently preventing highest tier compilation on this method. This PR exempts hosted compilations from this cutoff by ensuring their decompile count is not incremented for hosted compiled nmethods. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25356/files - new: https://git.openjdk.org/jdk/pull/25356/files/ef4a4c98..e66c16d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25356&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25356&range=01-02 Stats: 7 lines in 2 files changed: 4 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25356.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25356/head:pull/25356 PR: https://git.openjdk.org/jdk/pull/25356 From yzheng at openjdk.org Thu May 22 08:04:35 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 22 May 2025 08:04:35 GMT Subject: RFR: 8357424: [JVMCI] Avoid incrementing decompilation count for hosted compiled nmethod [v4] In-Reply-To: References: Message-ID: > Hosted Truffle compilations are installed on the OptimizedCallTarget#profiledPERoot method. Any deoptimization contributes to its decompile count, which can easily exceed the PerMethodRecompilationCutoff threshold, permanently preventing highest tier compilation on this method. This PR exempts hosted compilations from this cutoff by ensuring their decompile count is not incremented for hosted compiled nmethods. 
Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25356/files - new: https://git.openjdk.org/jdk/pull/25356/files/e66c16d1..b72213ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25356&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25356&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25356.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25356/head:pull/25356 PR: https://git.openjdk.org/jdk/pull/25356 From dnsimon at openjdk.org Thu May 22 08:35:51 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 22 May 2025 08:35:51 GMT Subject: RFR: 8357424: [JVMCI] Avoid incrementing decompilation count for hosted compiled nmethod [v4] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 08:04:35 GMT, Yudi Zheng wrote: >> Hosted Truffle compilations are installed on the OptimizedCallTarget#profiledPERoot method. Any deoptimization contributes to its decompile count, which can easily exceed the PerMethodRecompilationCutoff threshold, permanently preventing highest tier compilation on this method. This PR exempts hosted compilations from this cutoff by ensuring their decompile count is not incremented for hosted compiled nmethods. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comments Looks good. ------------- Marked as reviewed by dnsimon (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25356#pullrequestreview-2860294240 From never at openjdk.org Thu May 22 15:34:00 2025 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 22 May 2025 15:34:00 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v5] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 20:59:33 GMT, Doug Simon wrote: >> As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: >> >> >> Error occurred during initialization of VM >> java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) >> >> >> This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. >> Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > removed trailing space This seems reasonable to me. ------------- Marked as reviewed by never (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25307#pullrequestreview-2861697506 From dnsimon at openjdk.org Thu May 22 17:03:59 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 22 May 2025 17:03:59 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v5] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 20:59:33 GMT, Doug Simon wrote: >> As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: >> >> >> Error occurred during initialization of VM >> java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) >> >> >> This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. >> Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > removed trailing space Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25307#issuecomment-2901962121 From dnsimon at openjdk.org Thu May 22 17:04:00 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 22 May 2025 17:04:00 GMT Subject: Integrated: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 In-Reply-To: References: Message-ID: On Mon, 19 May 2025 17:50:21 GMT, Doug Simon wrote: > As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. 
If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: > > > Error occurred during initialization of VM > java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) > > > This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. > Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. This pull request has now been integrated. Changeset: 1258af42 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/1258af42bec92a2797897cb6126b60b582a29d76 Stats: 7 lines in 2 files changed: 7 ins; 0 del; 0 mod 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 Reviewed-by: never, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/25307 From dnsimon at openjdk.org Thu May 22 17:24:08 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 22 May 2025 17:24:08 GMT Subject: RFR: 8357506: [JVMCI] Consolidate eager JVMCI initialization code [v2] In-Reply-To: <4rwSBU4yySv789oCHmEoFgVsOqj7ZJ65owldXBviF-s=.9b15ee7e-32a0-4a54-b39c-bf1a440b1dd1@github.com> References: <4rwSBU4yySv789oCHmEoFgVsOqj7ZJ65owldXBviF-s=.9b15ee7e-32a0-4a54-b39c-bf1a440b1dd1@github.com> Message-ID: > While working on [JDK-8357135](https://bugs.openjdk.org/browse/JDK-8357135), I was reminded that some of the code implementing eager JVMCI compiler initialization (i.e. `-XX:+EagerJVMCI`) is in helper methods such as `JVMCIRuntime::call_getCompiler` that sound general purpose but are only used for eager JVMCI compiler initialization. 
This PR inlines `JVMCIRuntime::call_getCompiler` and renames `JVMCI::initialize_compiler` to `initialize_compiler_in_create_vm` to make its single use case clearer. Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: consolidate JVMCI eager initialization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25369/files - new: https://git.openjdk.org/jdk/pull/25369/files/c069487d..3b4bf20e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25369&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25369&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25369.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25369/head:pull/25369 PR: https://git.openjdk.org/jdk/pull/25369 From dnsimon at openjdk.org Thu May 22 18:03:31 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 22 May 2025 18:03:31 GMT Subject: RFR: 8357581: [JVMCI] Add ProfilingInfo.getDecompileCount Message-ID: Graal is adding enhanced logic to detect deoptimization cycles and needs to be able to query a method's decompilation counter (i.e. `MethodData::_compiler_counters._nof_decompiles`). This PR adds the `HotSpotProfilingInfo` interface so that such HotSpot-specific profiling info can be accessed. The change looks bigger in the GitHub review UI than it really is. I have simply renamed the pre-existing `HotSpotProfilingInfo` private class as `HotSpotProfilingInfoImpl` and repurposed the `HotSpotProfilingInfo` name for the *new* public interface. 
------------- Commit messages: - added HotSpotProfilingInfo Changes: https://git.openjdk.org/jdk/pull/25397/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25397&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357581 Stats: 235 lines in 5 files changed: 17 ins; 194 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/25397.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25397/head:pull/25397 PR: https://git.openjdk.org/jdk/pull/25397 From never at openjdk.org Thu May 22 18:32:53 2025 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 22 May 2025 18:32:53 GMT Subject: RFR: 8357581: [JVMCI] Add ProfilingInfo.getDecompileCount In-Reply-To: References: Message-ID: On Thu, 22 May 2025 17:12:34 GMT, Doug Simon wrote: > Graal is adding enhanced logic to detect deoptimization cycles and needs to be able to query a method's decompilation counter (i.e. `MethodData::_compiler_counters._nof_decompiles`). > This PR adds the `HotSpotProfilingInfo` interface so that such HotSpot-specific profiling info can be accessed. > The change looks bigger in the GitHub review UI than it really is. I have simply renamed the pre-existing `HotSpotProfilingInfo` private class as `HotSpotProfilingInfoImpl` and repurposed the `HotSpotProfilingInfo` name for the *new* public interface. Looks good. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25397#pullrequestreview-2862216006 From kvn at openjdk.org Thu May 22 22:01:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 22 May 2025 22:01:57 GMT Subject: RFR: 8357581: [JVMCI] Add HotSpotProfilingInfo In-Reply-To: References: Message-ID: On Thu, 22 May 2025 17:12:34 GMT, Doug Simon wrote: > Graal is adding enhanced logic to detect deoptimization cycles and needs to be able to query a method's decompilation counter (i.e. `MethodData::_compiler_counters._nof_decompiles`). 
> This PR adds the `HotSpotProfilingInfo` interface so that such HotSpot-specific profiling info can be accessed. > The change looks bigger in the GitHub review UI than it really is. I have simply renamed the pre-existing `HotSpotProfilingInfo` private class as `HotSpotProfilingInfoImpl` and repurposed the `HotSpotProfilingInfo` name for the *new* public interface. Just one cosmetic comment about the copyright year. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotProfilingInfo.java line 2: > 1: /* > 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. Please keep both years: 2012, 2025. Even though you changed the content, the file is still present. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotProfilingInfoImpl.java line 2: > 1: /* > 2: * Copyright (c) 2012, 2025, Oracle and/or its affiliates. All rights reserved. This one is fine since you copied it from another file. ------------- PR Review: https://git.openjdk.org/jdk/pull/25397#pullrequestreview-2862651511 PR Review Comment: https://git.openjdk.org/jdk/pull/25397#discussion_r2103452147 PR Review Comment: https://git.openjdk.org/jdk/pull/25397#discussion_r2103452864 From kvn at openjdk.org Thu May 22 22:05:01 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 22 May 2025 22:05:01 GMT Subject: RFR: 8357506: [JVMCI] Consolidate eager JVMCI initialization code [v2] In-Reply-To: References: <4rwSBU4yySv789oCHmEoFgVsOqj7ZJ65owldXBviF-s=.9b15ee7e-32a0-4a54-b39c-bf1a440b1dd1@github.com> Message-ID: On Thu, 22 May 2025 17:24:08 GMT, Doug Simon wrote: >> While working on [JDK-8357135](https://bugs.openjdk.org/browse/JDK-8357135), I was reminded that some of the code implementing eager JVMCI compiler initialization (i.e. `-XX:+EagerJVMCI`) is in helper methods such as `JVMCIRuntime::call_getCompiler` that sound general purpose but are only used for eager JVMCI compiler initialization.
This PR inlines `JVMCIRuntime::call_getCompiler` and renames `JVMCI::initialize_compiler` to `initialize_compiler_in_create_vm` to make its single use case clearer. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > consolidate JVMCI eager initialization Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25369#pullrequestreview-2862661991 From cslucas at openjdk.org Thu May 22 22:38:12 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 22 May 2025 22:38:12 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" Message-ID: Please review this refactor to transform the reasons for making an nmethod not entrant from `const char*` into enum values. Tested on Linux x64 with JTREG tier1-3 in fastdebug and release mode. ------------- Commit messages: - Refactor nmethod make_not_entrant reason Changes: https://git.openjdk.org/jdk/pull/25338/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25338&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357396 Stats: 60 lines in 15 files changed: 26 ins; 4 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/25338.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25338/head:pull/25338 PR: https://git.openjdk.org/jdk/pull/25338 From dnsimon at openjdk.org Fri May 23 06:19:42 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 23 May 2025 06:19:42 GMT Subject: RFR: 8357581: [JVMCI] Add HotSpotProfilingInfo [v2] In-Reply-To: References: Message-ID: > Graal is adding enhanced logic to detect deoptimization cycles and needs to be able to query a method's decompilation counter (i.e. `MethodData::_compiler_counters._nof_decompiles`). 
> This PR adds the `HotSpotProfilingInfo` interface so that such HotSpot-specific profiling info can be accessed. > The change looks bigger in the GitHub review UI than it really is. I have simply renamed the pre-existing `HotSpotProfilingInfo` private class as `HotSpotProfilingInfoImpl` and repurposed the `HotSpotProfilingInfo` name for the *new* public interface. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: fix copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25397/files - new: https://git.openjdk.org/jdk/pull/25397/files/d95475b0..12a9a059 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25397&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25397&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25397.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25397/head:pull/25397 PR: https://git.openjdk.org/jdk/pull/25397 From dnsimon at openjdk.org Fri May 23 06:19:42 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 23 May 2025 06:19:42 GMT Subject: RFR: 8357581: [JVMCI] Add HotSpotProfilingInfo [v2] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 21:56:06 GMT, Vladimir Kozlov wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> fix copyright > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotProfilingInfo.java line 2: > >> 1: /* >> 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. > > Please, keep 2 years: 2012, 2025. Even if you changed content the file is still present. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25397#discussion_r2103880075 From yzheng at openjdk.org Fri May 23 06:35:56 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 23 May 2025 06:35:56 GMT Subject: RFR: 8357506: [JVMCI] Consolidate eager JVMCI initialization code [v2] In-Reply-To: References: <4rwSBU4yySv789oCHmEoFgVsOqj7ZJ65owldXBviF-s=.9b15ee7e-32a0-4a54-b39c-bf1a440b1dd1@github.com> Message-ID: On Thu, 22 May 2025 17:24:08 GMT, Doug Simon wrote: >> While working on [JDK-8357135](https://bugs.openjdk.org/browse/JDK-8357135), I was reminded that some of the code implementing eager JVMCI compiler initialization (i.e. `-XX:+EagerJVMCI`) is in helper methods such as `JVMCIRuntime::call_getCompiler` that sound general purpose but are only used for eager JVMCI compiler initialization. This PR inlines `JVMCIRuntime::call_getCompiler` and renames `JVMCI::initialize_compiler` to `initialize_compiler_in_create_vm` to make its single use case clearer. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > consolidate JVMCI eager initialization LGTM ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/25369#pullrequestreview-2863339517 From dnsimon at openjdk.org Fri May 23 06:35:57 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 23 May 2025 06:35:57 GMT Subject: RFR: 8357506: [JVMCI] Consolidate eager JVMCI initialization code [v2] In-Reply-To: References: <4rwSBU4yySv789oCHmEoFgVsOqj7ZJ65owldXBviF-s=.9b15ee7e-32a0-4a54-b39c-bf1a440b1dd1@github.com> Message-ID: On Thu, 22 May 2025 17:24:08 GMT, Doug Simon wrote: >> While working on [JDK-8357135](https://bugs.openjdk.org/browse/JDK-8357135), I was reminded that some of the code implementing eager JVMCI compiler initialization (i.e. 
`-XX:+EagerJVMCI`) is in helper methods such as `JVMCIRuntime::call_getCompiler` that sound general purpose but are only used for eager JVMCI compiler initialization. This PR inlines `JVMCIRuntime::call_getCompiler` and renames `JVMCI::initialize_compiler` to `initialize_compiler_in_create_vm` to make its single use case clearer. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > consolidate JVMCI eager initialization Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25369#issuecomment-2903410300 From dnsimon at openjdk.org Fri May 23 06:35:57 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 23 May 2025 06:35:57 GMT Subject: Integrated: 8357506: [JVMCI] Consolidate eager JVMCI initialization code In-Reply-To: <4rwSBU4yySv789oCHmEoFgVsOqj7ZJ65owldXBviF-s=.9b15ee7e-32a0-4a54-b39c-bf1a440b1dd1@github.com> References: <4rwSBU4yySv789oCHmEoFgVsOqj7ZJ65owldXBviF-s=.9b15ee7e-32a0-4a54-b39c-bf1a440b1dd1@github.com> Message-ID: On Wed, 21 May 2025 20:58:23 GMT, Doug Simon wrote: > While working on [JDK-8357135](https://bugs.openjdk.org/browse/JDK-8357135), I was reminded that some of the code implementing eager JVMCI compiler initialization (i.e. `-XX:+EagerJVMCI`) is in helper methods such as `JVMCIRuntime::call_getCompiler` that sound general purpose but are only used for eager JVMCI compiler initialization. This PR inlines `JVMCIRuntime::call_getCompiler` and renames `JVMCI::initialize_compiler` to `initialize_compiler_in_create_vm` to make its single use case clearer. This pull request has now been integrated. 
Changeset: d6e4c5f6 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/d6e4c5f65932114b5c6f455db6cfaa220607ce18 Stats: 25 lines in 6 files changed: 5 ins; 14 del; 6 mod 8357506: [JVMCI] Consolidate eager JVMCI initialization code Reviewed-by: kvn, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/25369 From mhaessig at openjdk.org Fri May 23 07:43:51 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 23 May 2025 07:43:51 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" In-Reply-To: References: Message-ID: <2INMtrpMVMQg0FbhXDN51Snx7cg9jweK4ym464FMHes=.e6362690-ed9c-4395-ae75-a8e754a20fc6@github.com> On Tue, 20 May 2025 22:08:18 GMT, Cesar Soares Lucas wrote: > Please review this refactor to transform the reasons for making an nmethod not entrant from `const char*` into enum values. > > Tested on Linux x64 with JTREG tier1-3 in fastdebug and release mode. Thank you for working on this. I agree that an enum is the better option here. However, a scoped enum might be more appropriate. For one, that is the guidance in the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#enum). Secondly, I would argue that this enum should have some `as_string`-like function so the reason is still printed in plain text instead of as an int. That spares me going into the source and counting down the enum when I'm debugging a deopt. Once the code does not rely on implicit conversion to print an int, we do not care about the underlying type.
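The suggestion above (a scoped enum paired with an `as_string`-like function) can be sketched roughly as follows. This is only an illustration under stated assumptions: `NotEntrantReason`, its enumerators, and `not_entrant_message` are hypothetical names chosen here, not the enumerators or helpers of the actual PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical reason codes for illustration; the PR defines its own list.
enum class NotEntrantReason {
  deoptimized,
  jvmci_invalidate,
  class_unloading
};

// Returns a readable name so that log output never shows a bare integer.
inline const char* as_string(NotEntrantReason reason) {
  switch (reason) {
    case NotEntrantReason::deoptimized:      return "deoptimized";
    case NotEntrantReason::jvmci_invalidate: return "JVMCI invalidate";
    case NotEntrantReason::class_unloading:  return "class unloading";
  }
  // Only reached if a value outside the enumerator list is passed in.
  assert(false && "unhandled NotEntrantReason");
  return "unknown";
}

// Hypothetical helper: builds the kind of text a compilation log line might carry.
inline std::string not_entrant_message(NotEntrantReason reason) {
  return std::string("made not entrant: ") + as_string(reason);
}
```

Because the enum is scoped, `NotEntrantReason::deoptimized` does not convert implicitly to `int`, so any printing code has to go through `as_string` and the underlying integer type stops mattering to callers.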
------------- PR Review: https://git.openjdk.org/jdk/pull/25338#pullrequestreview-2863515112 From mchevalier at openjdk.org Fri May 23 07:53:52 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 23 May 2025 07:53:52 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" In-Reply-To: References: Message-ID: On Tue, 20 May 2025 22:08:18 GMT, Cesar Soares Lucas wrote: > Please review this refactor to transform the reasons for making an nmethod not entrant from `const char*` into enum values. > > Tested on Linux x64 with JTREG tier1-3 in fastdebug and release mode. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 1388: > 1386: C2V_VMENTRY(void, invalidateHotSpotNmethod, (JNIEnv* env, jobject, jobject hs_nmethod, jboolean deoptimize)) > 1387: JVMCIObject nmethod_mirror = JVMCIENV->wrap(hs_nmethod); > 1388: JVMCIENV->invalidate_nmethod_mirror(nmethod_mirror, deoptimize, nmethod::NMethodChangeReason::JVMCI_invalidate_nmethod, JVMCI_CHECK); Agree with @mhaessig's comment. If further discussion turns out to prefer the current `enum` over an `enum class`, then this `NMethodChangeReason::` is useless. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25338#discussion_r2104027069 From shade at openjdk.org Fri May 23 09:01:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 23 May 2025 09:01:56 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" In-Reply-To: References: Message-ID: On Tue, 20 May 2025 22:08:18 GMT, Cesar Soares Lucas wrote: > Please review this refactor to transform the reasons for making an nmethod not entrant from `const char*` into enum values. > > Tested on Linux x64 with JTREG tier1-3 in fastdebug and release mode. I added this argument to `make_not_entrant` recently in [JDK-8351640](https://bugs.openjdk.org/browse/JDK-8351640) -- mostly to print it in `PrintCompilation` logs. 
Putting enum might be fine, but it _has to_ maintain the same level of human readability. Do not just print `made not entrant: 42`. ------------- Changes requested by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25338#pullrequestreview-2863731885 From kvn at openjdk.org Fri May 23 15:57:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 May 2025 15:57:53 GMT Subject: RFR: 8357581: [JVMCI] Add HotSpotProfilingInfo [v2] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 06:19:42 GMT, Doug Simon wrote: >> Graal is adding enhanced logic to detect deoptimization cycles and needs to be able to query a method's decompilation counter (i.e. `MethodData::_compiler_counters._nof_decompiles`). >> This PR adds the `HotSpotProfilingInfo` interface so that such HotSpot-specific profiling info can be accessed. >> The change looks bigger in the GitHub review UI than it really is. I have simply renamed the pre-existing `HotSpotProfilingInfo` private class as `HotSpotProfilingInfoImpl` and repurposed the `HotSpotProfilingInfo` name for the *new* public interface. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25397#pullrequestreview-2864916858 From dnsimon at openjdk.org Fri May 23 16:33:04 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 23 May 2025 16:33:04 GMT Subject: RFR: 8357581: [JVMCI] Add HotSpotProfilingInfo [v2] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 06:19:42 GMT, Doug Simon wrote: >> Graal is adding enhanced logic to detect deoptimization cycles and needs to be able to query a method's decompilation counter (i.e. `MethodData::_compiler_counters._nof_decompiles`). >> This PR adds the `HotSpotProfilingInfo` interface so that such HotSpot-specific profiling info can be accessed. 
>> The change looks bigger in the GitHub review UI than it really is. I have simply renamed the pre-existing `HotSpotProfilingInfo` private class as `HotSpotProfilingInfoImpl` and repurposed the `HotSpotProfilingInfo` name for the *new* public interface. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25397#issuecomment-2905036890 From dnsimon at openjdk.org Fri May 23 16:33:05 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 23 May 2025 16:33:05 GMT Subject: Integrated: 8357581: [JVMCI] Add HotSpotProfilingInfo In-Reply-To: References: Message-ID: On Thu, 22 May 2025 17:12:34 GMT, Doug Simon wrote: > Graal is adding enhanced logic to detect deoptimization cycles and needs to be able to query a method's decompilation counter (i.e. `MethodData::_compiler_counters._nof_decompiles`). > This PR adds the `HotSpotProfilingInfo` interface so that such HotSpot-specific profiling info can be accessed. > The change looks bigger in the GitHub review UI than it really is. I have simply renamed the pre-existing `HotSpotProfilingInfo` private class as `HotSpotProfilingInfoImpl` and repurposed the `HotSpotProfilingInfo` name for the *new* public interface. This pull request has now been integrated. 
Changeset: 2b6b7661 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/2b6b7661b949971fe776714795d7dd46ed343cde Stats: 235 lines in 5 files changed: 17 ins; 194 del; 24 mod 8357581: [JVMCI] Add HotSpotProfilingInfo Reviewed-by: kvn, never ------------- PR: https://git.openjdk.org/jdk/pull/25397 From cslucas at openjdk.org Fri May 23 18:24:52 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 23 May 2025 18:24:52 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" In-Reply-To: References: Message-ID: On Tue, 20 May 2025 22:08:18 GMT, Cesar Soares Lucas wrote: > Please review this refactor to transform the reasons for making an nmethod not entrant from `const char*` into enum values. > > Tested on Linux x64 with JTREG tier1-3 in fastdebug and release mode. Thank you for the comments. I'll make the refactoring. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25338#issuecomment-2905409542 From duke at openjdk.org Fri May 23 20:54:43 2025 From: duke at openjdk.org (Zihao Lin) Date: Fri, 23 May 2025 20:54:43 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v7] In-Reply-To: References: Message-ID: > This patch removes the slice parameter from LoadNode::make. > > Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, so I'd appreciate any guidance. Thanks a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - Fix build - Fix test failed - 8344116: C2: remove slice parameter from LoadNode::make ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24258/files - new: https://git.openjdk.org/jdk/pull/24258/files/3efb1c17..ea83736e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=05-06 Stats: 393670 lines in 4531 files changed: 146248 ins; 225477 del; 21945 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From duke at openjdk.org Mon May 26 13:05:01 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 26 May 2025 13:05:01 GMT Subject: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v7] In-Reply-To: <0ZhaH_07oxLDZxz8wVEgbsbYWB50sjuLZxYwyM4ftno=.2adb899d-7768-481d-975b-8e0ee3e6f2c2@github.com> References: <0ZhaH_07oxLDZxz8wVEgbsbYWB50sjuLZxYwyM4ftno=.2adb899d-7768-481d-975b-8e0ee3e6f2c2@github.com> Message-ID: <4Uc0-fOqIFIS5GFYXPTC6xp0WtcKrj9XNn_OEkl1N_I=.0ad95f85-4674-4ca3-a602-a965b97b699c@github.com> On Tue, 20 May 2025 19:10:45 GMT, Sean Mullan wrote: > Please also write a release note as the performance improvement is significant. Thanks! Done. 
https://bugs.openjdk.org/browse/JDK-8357741 Release Note: ML-KEM Performance Improved ------------- PR Comment: https://git.openjdk.org/jdk/pull/24953#issuecomment-2909684771 From never at openjdk.org Tue May 27 17:23:52 2025 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 27 May 2025 17:23:52 GMT Subject: RFR: 8357424: [JVMCI] Avoid incrementing decompilation count for hosted compiled nmethod [v4] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 08:04:35 GMT, Yudi Zheng wrote: >> Hosted Truffle compilations are installed on the OptimizedCallTarget#profiledPERoot method. Any deoptimization contributes to its decompile count, which can easily exceed the PerMethodRecompilationCutoff threshold, permanently preventing highest tier compilation on this method. This PR exempts hosted compilations from this cutoff by ensuring their decompile count is not incremented for hosted compiled nmethods. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comments src/hotspot/share/code/nmethod.cpp line 2059: > 2057: > 2058: if (update_recompile_counts()) { > 2059: #if INCLUDE_JVMCI I think this logic should be in nmethod::inc_decompile_method itself. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25356#discussion_r2109762040 From never at openjdk.org Tue May 27 17:32:57 2025 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 27 May 2025 17:32:57 GMT Subject: RFR: 8357424: [JVMCI] Avoid incrementing decompilation count for hosted compiled nmethod [v4] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 08:04:35 GMT, Yudi Zheng wrote: >> Hosted Truffle compilations are installed on the OptimizedCallTarget#profiledPERoot method. Any deoptimization contributes to its decompile count, which can easily exceed the PerMethodRecompilationCutoff threshold, permanently preventing highest tier compilation on this method. 
This PR exempts hosted compilations from this cutoff by ensuring their decompile count is not incremented for hosted compiled nmethods. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comments I think there are two levels of counters that we might want to disable. We definitely want to stop deopts and recompilations from marking the method not compilable, which the current change does. Additionally, JVMCIRuntime::register_method will perform this logic if validate_compile_task_dependencies fails, and I don't think we want that. I think the new `!is_default` guard idiom should be in a helper like `nmethod::is_jvmci_hosted`. Do we use the hosted language elsewhere? The second level is to stop all counter updates in hosted compiles, for similar reasons. Those updates won't lead to disabling compilation, but they will quickly lead to saturation of all the counters, which is fairly pointless but probably benign. This would be done by setting `update_trap_state` to false for hosted nmethods. That also has the effect of keeping `inc_recompile_count` false. I think that's the right thing to do, but I'd want to test Truffle workloads with those changes first to make sure there isn't some subtle problem. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25356#issuecomment-2913383620 From sparasa at openjdk.org Tue May 27 19:59:54 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 27 May 2025 19:59:54 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v3] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: On Tue, 6 May 2025 21:45:34 GMT, Mohamed Issa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm.
A new set of micro-benchmarks is included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. >> >> For performance data collected with the new built-in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs. >>
>> | Input range(s) | Throughput with baseline (op/s) | Throughput with intrinsic (op/s) | Speedup |
>> | :-------------------------------------: | :----------------------------------: | :----------------------------------: | :---------: |
>> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x |
>> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x |
>> >> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes. > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Add new set of cbrt micro-benchmarks This PR looks good to me. I independently ran the correctness tests and performance benchmarks. Thanks, Vamsi ------------- Marked as reviewed by sparasa (Author).
PR Review: https://git.openjdk.org/jdk/pull/24470#pullrequestreview-2872425795 From sviswanathan at openjdk.org Tue May 27 23:43:53 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 27 May 2025 23:43:53 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v3] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: <62SQLH6KGe8_w0LxmPSPW7C34v9-KFYliTk0RzMgJTs=.80886efd-7f4f-408c-8bbd-07af12ab9701@github.com> On Tue, 6 May 2025 21:45:34 GMT, Mohamed Issa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> The command to run all range specific micro-benchmarks is posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> >> The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. >> >> For performance data collected with the new built in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs. 
>> >> | Input range(s) | Throughput with baseline (op/s) | Throughput with intrinsic (op/s) | Speedup | >> | :-------------------------------------: | :----------------------------------: | :----------------------------------: | :---------: | >> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x | >> >> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes. > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Add new set of cbrt micro-benchmarks src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1251: > 1249: void movapd(XMMRegister dst, Address src) { Assembler::movapd(dst, src); } > 1250: void movapd(XMMRegister dst, AddressLiteral src, Register rscratch = noreg); > 1251: You could write it as: using Assembler::movapd; void movapd(XMMRegister dst, AddressLiteral src, Register rscratch = noreg); src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1323: > 1321: void unpckhpd(XMMRegister dst, XMMRegister src) { Assembler::unpckhpd(dst, src); } > 1322: void unpcklpd(XMMRegister dst, XMMRegister src) { Assembler::unpcklpd(dst, src); } > 1323: Do we need these declarations here? src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 43: > 41: // > 42: // Special cases: > 43: // cbrt(NaN) = quiet NaN, and raise invalid exception No exception is raised so the comment needs to be corrected. src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 226: > 224: __ andl(rcx, 248); > 225: __ lea(r8, ExternalAddress(rcp_table)); > 226: __ movsd(xmm4, Address(r8, rcx, Address::times_1)); This address and other instructions using similar address could be written as Address(rcx, r8, Address::times_1). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2110406675 PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2110426188 PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2110536680 PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2110535561 From duke at openjdk.org Wed May 28 18:39:13 2025 From: duke at openjdk.org (Mohamed Issa) Date: Wed, 28 May 2025 18:39:13 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v4] In-Reply-To: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: <72GCipLKeCWCG-4jsG5XhZKkTdVsWafEq_wA0oD-0mk=.c70af814-54fe-43d2-b7c9-72b845eb99d5@github.com> > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. > > The command to run all range specific micro-benchmarks is posted below. > > `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` > > The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. > > For performance data collected with the new built in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs. 
> > | Input range(s) | Baseline throughput (ops/ms) | Intrinsic throughput (ops/ms) | Speedup | > | :-------------------------------------: | :-------------------------------: | :-------------------------------: | :---------: | > | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x | > | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x | > > Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes. Mohamed Issa has updated the pull request incrementally with four additional commits since the last revision: - Remove comment mentioning invalid exception when NaN input is provided - Use rcx as base and r8 as index for address calculations in certain cbrt stub generator instructions - Remove unnecessary unpckhpd and unpcklpd definitions in macro-assembler header file - Remove unnecessary movapd definitions in macro-assembler header file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24470/files - new: https://git.openjdk.org/jdk/pull/24470/files/57412f0d..ff4d4f22 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24470&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24470&range=02-03 Stats: 10 lines in 2 files changed: 0 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24470.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24470/head:pull/24470 PR: https://git.openjdk.org/jdk/pull/24470 From sviswanathan at openjdk.org Wed May 28 18:39:13 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 28 May 2025 18:39:13 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v4] In-Reply-To: <72GCipLKeCWCG-4jsG5XhZKkTdVsWafEq_wA0oD-0mk=.c70af814-54fe-43d2-b7c9-72b845eb99d5@github.com> References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> <72GCipLKeCWCG-4jsG5XhZKkTdVsWafEq_wA0oD-0mk=.c70af814-54fe-43d2-b7c9-72b845eb99d5@github.com> Message-ID: On Wed, 28 May 2025 18:36:38 GMT, Mohamed Issa 
wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> The command to run all range specific micro-benchmarks is posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> >> The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. >> >> For performance data collected with the new built in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs. >> >> | Input range(s) | Baseline throughput (ops/ms) | Intrinsic throughput (ops/ms) | Speedup | >> | :-------------------------------------: | :-------------------------------: | :-------------------------------: | :---------: | >> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x | >> >> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes. 
> > Mohamed Issa has updated the pull request incrementally with four additional commits since the last revision: > > - Remove comment mentioning invalid exception when NaN input is provided > - Use rcx as base and r8 as index for address calculations in certain cbrt stub generator instructions > - Remove unnecessary unpckhpd and unpcklpd definitions in macro-assembler header file > - Remove unnecessary movapd definitions in macro-assembler header file Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24470#pullrequestreview-2876071455 From duke at openjdk.org Wed May 28 18:39:14 2025 From: duke at openjdk.org (Mohamed Issa) Date: Wed, 28 May 2025 18:39:14 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v3] In-Reply-To: <62SQLH6KGe8_w0LxmPSPW7C34v9-KFYliTk0RzMgJTs=.80886efd-7f4f-408c-8bbd-07af12ab9701@github.com> References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> <62SQLH6KGe8_w0LxmPSPW7C34v9-KFYliTk0RzMgJTs=.80886efd-7f4f-408c-8bbd-07af12ab9701@github.com> Message-ID: <8ArA3awbbtTvNZfaKvAdlv2oMcLP0cASBHr-VRK_-dc=.0fe37d63-7eb0-49d7-abf1-812e7be8dde8@github.com> On Tue, 27 May 2025 22:30:17 GMT, Sandhya Viswanathan wrote: >> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: >> >> Add new set of cbrt micro-benchmarks > > src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1251: > >> 1249: void movapd(XMMRegister dst, Address src) { Assembler::movapd(dst, src); } >> 1250: void movapd(XMMRegister dst, AddressLiteral src, Register rscratch = noreg); >> 1251: > > You could write it as: > using Assembler::movapd; > void movapd(XMMRegister dst, AddressLiteral src, Register rscratch = noreg); Ok, this is updated now. 
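The effect of the suggested `using Assembler::movapd;` can be sketched in isolation. This is a minimal C++ sketch, not HotSpot code: the empty `XMMRegister`, `Address` and `AddressLiteral` structs and the string return values are stand-ins assumed purely for illustration. It shows that an overload declared in a derived class hides the base-class overloads of the same name unless a using-declaration re-exposes them, which is why the hand-written forwarding wrapper becomes unnecessary.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the HotSpot operand types; only their
// distinct identities matter for overload resolution here.
struct XMMRegister {};
struct Address {};
struct AddressLiteral {};

// Simplified stand-in for the low-level assembler.
struct Assembler {
  std::string movapd(XMMRegister, XMMRegister) { return "Assembler: reg, reg"; }
  std::string movapd(XMMRegister, Address)     { return "Assembler: reg, mem"; }
};

// Simplified stand-in for the macro assembler.
struct MacroAssembler : Assembler {
  // Without this using-declaration, the overload declared below would hide
  // both Assembler::movapd overloads, and each one would need a forwarding
  // wrapper of the form { Assembler::movapd(dst, src); }.
  using Assembler::movapd;

  std::string movapd(XMMRegister, AddressLiteral) {
    return "MacroAssembler: reg, literal";
  }
};
```

With the using-declaration in place, all three overloads resolve through a `MacroAssembler` object; removing it makes the two `Assembler` forms fail to compile when called that way.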
> src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1323: > >> 1321: void unpckhpd(XMMRegister dst, XMMRegister src) { Assembler::unpckhpd(dst, src); } >> 1322: void unpcklpd(XMMRegister dst, XMMRegister src) { Assembler::unpcklpd(dst, src); } >> 1323: > > Do we need these declarations here? No, they were superfluous as cbrt stub generator could already access them. I removed them now. > src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 43: > >> 41: // >> 42: // Special cases: >> 43: // cbrt(NaN) = quiet NaN, and raise invalid exception > > No exception is raised so the comment needs to be corrected. This is corrected now. > src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 226: > >> 224: __ andl(rcx, 248); >> 225: __ lea(r8, ExternalAddress(rcp_table)); >> 226: __ movsd(xmm4, Address(r8, rcx, Address::times_1)); > > This address and other instructions using similar address could be written as Address(rcx, r8, Address::times_1). Ok, I have changed the order, so rcx is viewed as base and r8 is viewed as index now. 
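Why the base and index registers can be swapped here can be shown with a little arithmetic. The following is a rough C++ sketch under stated assumptions: `effective_address` and the `Scale` enum are hypothetical helpers modelling a base + index * scale operand without displacement, and the register values used with them are made up. It only demonstrates that the two orderings address the same location when the scale is `times_1`; it does not model instruction encoding, and the thread does not state the reviewer's reason for preferring one ordering.

```cpp
#include <cassert>
#include <cstdint>

// Scale factors applied to the index register (hypothetical model of the
// times_1 .. times_8 choices).
enum class Scale : std::uint64_t { times_1 = 1, times_2 = 2, times_4 = 4, times_8 = 8 };

// Effective address of a base + index * scale memory operand, no displacement.
constexpr std::uint64_t effective_address(std::uint64_t base,
                                          std::uint64_t index,
                                          Scale scale) {
  return base + index * static_cast<std::uint64_t>(scale);
}
```

For example, with a table address in one register and a byte offset such as 248 in the other, `effective_address(table, offset, Scale::times_1)` equals `effective_address(offset, table, Scale::times_1)`. The same swap would not be valid for any larger scale.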
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2112516974 PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2112518997 PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2112520793 PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2112520486 From jbhateja at openjdk.org Thu May 29 08:38:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 29 May 2025 08:38:53 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v4] In-Reply-To: <72GCipLKeCWCG-4jsG5XhZKkTdVsWafEq_wA0oD-0mk=.c70af814-54fe-43d2-b7c9-72b845eb99d5@github.com> References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> <72GCipLKeCWCG-4jsG5XhZKkTdVsWafEq_wA0oD-0mk=.c70af814-54fe-43d2-b7c9-72b845eb99d5@github.com> Message-ID: On Wed, 28 May 2025 18:39:13 GMT, Mohamed Issa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> The command to run all range specific micro-benchmarks is posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> >> The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. >> >> For performance data collected with the new built in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs. 
>>
>> | Input range(s) | Baseline throughput (ops/ms) | Intrinsic throughput (ops/ms) | Speedup |
>> | :-------------------------------------: | :-------------------------------: | :-------------------------------: | :---------: |
>> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x |
>> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x |
>>
>> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes.
>
> Mohamed Issa has updated the pull request incrementally with four additional commits since the last revision:
>
> - Remove comment mentioning invalid exception when NaN input is provided
> - Use rcx as base and r8 as index for address calculations in certain cbrt stub generator instructions
> - Remove unnecessary unpckhpd and unpcklpd definitions in macro-assembler header file
> - Remove unnecessary movapd definitions in macro-assembler header file

Patch looks good to me, some comments included.

src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 185:

> 183:
> 184: #define __ _masm->
> 185:

The original Intel libm inline sequence uses hexadecimal constants; I would have preferred to use them as is, to maintain a 1:1 mapping between the instruction sequences.

test/micro/org/openjdk/bench/java/lang/CbrtPerf.java line 56:

> 54: public static class CbrtPerfRanges {
> 55: public static int cbrtInputCount = 2048;
> 56:

Please create a separate CbrtPerfSpecialValues for +/- 0.0, +/- Infinity and NaN values. I understand that handling special cases in the intrinsic may impact general-case performance, but it's OK to have at least a micro-benchmark for it.
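The special values requested for the extra micro-benchmark can be written down as a small checkable list. This is a minimal sketch in C++, assuming `std::cbrt` as a stand-in for `java.lang.Math.cbrt`; it is neither the JMH benchmark nor the intrinsic, and `cbrt_special_inputs` and `same_value` are hypothetical helper names. For each of these inputs the cube root is documented to be the input itself: a zero keeps its sign, an infinity keeps its sign, and NaN stays NaN.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// The special inputs named in the review: +/- 0.0, +/- Infinity, and NaN.
inline std::vector<double> cbrt_special_inputs() {
  const double inf = std::numeric_limits<double>::infinity();
  const double nan = std::numeric_limits<double>::quiet_NaN();
  return {0.0, -0.0, inf, -inf, nan};
}

// True when a and b are the same value, treating NaN as equal to NaN and
// distinguishing +0.0 from -0.0 (plain == does neither).
inline bool same_value(double a, double b) {
  if (std::isnan(a) || std::isnan(b)) {
    return std::isnan(a) && std::isnan(b);
  }
  return a == b && std::signbit(a) == std::signbit(b);
}
```

A correctness check over this list would compare `std::cbrt(x)` against `x` with `same_value`; a throughput benchmark would feed the same values through the function under test instead.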
test/micro/org/openjdk/bench/java/lang/CbrtPerf.java line 114:

> 112: public static final double constDouble512 = 512.0;
> 113:
> 114: @Benchmark

Baseline:

Benchmark                                     (cbrtRangeIndex)   Mode  Cnt        Score  Error  Units
CbrtPerf.CbrtPerfConstant.cbrtConstDouble0                 N/A  thrpt    2  2673018.356         ops/ms
CbrtPerf.CbrtPerfConstant.cbrtConstDouble1                 N/A  thrpt    2  2684233.593         ops/ms
CbrtPerf.CbrtPerfConstant.cbrtConstDouble27                N/A  thrpt    2  2684250.835         ops/ms
CbrtPerf.CbrtPerfConstant.cbrtConstDouble512               N/A  thrpt    2  2683616.321         ops/ms

With opt:

Benchmark                                     (cbrtRangeIndex)   Mode  Cnt        Score  Error  Units
CbrtPerf.CbrtPerfConstant.cbrtConstDouble0                 N/A  thrpt    2   284575.292         ops/ms
CbrtPerf.CbrtPerfConstant.cbrtConstDouble1                 N/A  thrpt    2   162876.035         ops/ms
CbrtPerf.CbrtPerfConstant.cbrtConstDouble27                N/A  thrpt    2   163227.835         ops/ms
CbrtPerf.CbrtPerfConstant.cbrtConstDouble512               N/A  thrpt    2   162998.844         ops/ms

There is an approximately 10x performance improvement from disabling the intrinsic for compile-time constant inputs. I have created a follow-up JBS issue to track it.
https://bugs.openjdk.org/browse/JDK-8358039 ------------- PR Review: https://git.openjdk.org/jdk/pull/24470#pullrequestreview-2877492755 PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2113462482 PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2113484695 PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2113472992 From jwaters at openjdk.org Thu May 29 09:03:55 2025 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 29 May 2025 09:03:55 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v4] In-Reply-To: <72GCipLKeCWCG-4jsG5XhZKkTdVsWafEq_wA0oD-0mk=.c70af814-54fe-43d2-b7c9-72b845eb99d5@github.com> References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> <72GCipLKeCWCG-4jsG5XhZKkTdVsWafEq_wA0oD-0mk=.c70af814-54fe-43d2-b7c9-72b845eb99d5@github.com> Message-ID: On Wed, 28 May 2025 18:39:13 GMT, Mohamed Issa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> The command to run all range specific micro-benchmarks is posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> >> The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. >> >> For performance data collected with the new built in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. 
Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs. >> >> | Input range(s) | Baseline throughput (ops/ms) | Intrinsic throughput (ops/ms) | Speedup | >> | :-------------------------------------: | :-------------------------------: | :-------------------------------: | :---------: | >> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x | >> >> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes. > > Mohamed Issa has updated the pull request incrementally with four additional commits since the last revision: > > - Remove comment mentioning invalid exception when NaN input is provided > - Use rcx as base and r8 as index for address calculations in certain cbrt stub generator instructions > - Remove unnecessary unpckhpd and unpcklpd definitions in macro-assembler header file > - Remove unnecessary movapd definitions in macro-assembler header file src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 49: > 47: /******************************************************************************/ > 48: > 49: ATTRIBUTE_ALIGNED(4) static const juint _SIG_MASK[] = ATTRIBUTE_ALIGNED expands to alignas, I suggest using that directly instead src/hotspot/cpu/x86/templateInterpreterGenerator_x86_64.cpp line 503: > 501: > 502: return entry_point; > 503: } Is the newline removal intentional? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2113530767 PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2113529587 From dnsimon at openjdk.org Thu May 29 13:04:40 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 29 May 2025 13:04:40 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror Message-ID: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. ------------- Commit messages: - remove phantom_ref arg from JVMCINMethodData::get_nmethod_mirror Changes: https://git.openjdk.org/jdk/pull/25488/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25488&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357619 Stats: 14 lines in 3 files changed: 1 ins; 5 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25488.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25488/head:pull/25488 PR: https://git.openjdk.org/jdk/pull/25488 From eosterlund at openjdk.org Thu May 29 13:04:41 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 29 May 2025 13:04:41 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror In-Reply-To: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> Message-ID: 
<5xZ0aIZT-xJM_h06TD061mZ_3T1qAPkd1F75vipRJ_w=.7d0e9d10-3a6c-4d1f-b457-aa0e1dd61560@github.com> On Wed, 28 May 2025 10:28:38 GMT, Doug Simon wrote: > The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. > This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2834: > 2832: // Only the mirror in the HotSpot heap is accessible > 2833: // through JVMCINMethodData > 2834: oop nmethod_mirror = data->get_nmethod_mirror(nm); Is the nmethod guaranteed to be on-stack here? If not it gotta be phantom. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25488#discussion_r2112390139 From dnsimon at openjdk.org Thu May 29 13:04:41 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 29 May 2025 13:04:41 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror In-Reply-To: <5xZ0aIZT-xJM_h06TD061mZ_3T1qAPkd1F75vipRJ_w=.7d0e9d10-3a6c-4d1f-b457-aa0e1dd61560@github.com> References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> <5xZ0aIZT-xJM_h06TD061mZ_3T1qAPkd1F75vipRJ_w=.7d0e9d10-3a6c-4d1f-b457-aa0e1dd61560@github.com> Message-ID: On Wed, 28 May 2025 17:15:48 GMT, Erik Österlund wrote: >> The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called.
>> This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. > > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2834: > >> 2832: // Only the mirror in the HotSpot heap is accessible >> 2833: // through JVMCINMethodData >> 2834: oop nmethod_mirror = data->get_nmethod_mirror(nm); > > Is the nmethod guaranteed to be on-stack here? If not it gotta be phantom. Is the use of `JVMCINMethodHandle` equivalent to `nm` being on-stack? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25488#discussion_r2112482026 From eosterlund at openjdk.org Thu May 29 13:04:41 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 29 May 2025 13:04:41 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror In-Reply-To: References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> <5xZ0aIZT-xJM_h06TD061mZ_3T1qAPkd1F75vipRJ_w=.7d0e9d10-3a6c-4d1f-b457-aa0e1dd61560@github.com> Message-ID: On Wed, 28 May 2025 18:10:39 GMT, Doug Simon wrote: >> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2834: >> >>> 2832: // Only the mirror in the HotSpot heap is accessible >>> 2833: // through JVMCINMethodData >>> 2834: oop nmethod_mirror = data->get_nmethod_mirror(nm); >> >> Is the nmethod guaranteed to be on-stack here? If not it gotta be phantom. > > Is the use of `JVMCINMethodHandle` equivalent to `nm` being on-stack? Yes. Great, so that should be fine then.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25488#discussion_r2112673894 From duke at openjdk.org Thu May 29 18:56:11 2025 From: duke at openjdk.org (Mohamed Issa) Date: Thu, 29 May 2025 18:56:11 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v5] In-Reply-To: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. > > The command to run all range specific micro-benchmarks is posted below. > > `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` > > The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. > > For performance data collected with the new built in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs. 
> > | Input range(s) | Baseline throughput (ops/ms) | Intrinsic throughput (ops/ms) | Speedup | > | :-------------------------------------: | :-------------------------------: | :-------------------------------: | :---------: | > | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x | > | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x | > > Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes. Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision: - Add newline back to templateInterpreterGenerator_x86_64.cpp source file - Add special case values to cbrt micro-benchmark set ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24470/files - new: https://git.openjdk.org/jdk/pull/24470/files/ff4d4f22..233e0188 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24470&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24470&range=03-04 Stats: 40 lines in 2 files changed: 39 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24470.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24470/head:pull/24470 PR: https://git.openjdk.org/jdk/pull/24470 From duke at openjdk.org Thu May 29 18:56:12 2025 From: duke at openjdk.org (Mohamed Issa) Date: Thu, 29 May 2025 18:56:12 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v4] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> <72GCipLKeCWCG-4jsG5XhZKkTdVsWafEq_wA0oD-0mk=.c70af814-54fe-43d2-b7c9-72b845eb99d5@github.com> Message-ID: On Thu, 29 May 2025 09:01:05 GMT, Julian Waters wrote: >> Mohamed Issa has updated the pull request incrementally with four additional commits since the last revision: >> >> - Remove comment mentioning invalid exception when NaN input is provided >> - Use rcx as base and r8 as index for address calculations in certain cbrt stub generator instructions >> - Remove 
unnecessary unpckhpd and unpcklpd definitions in macro-assembler header file >> - Remove unnecessary movapd definitions in macro-assembler header file > > src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 49: > >> 47: /******************************************************************************/ >> 48: >> 49: ATTRIBUTE_ALIGNED(4) static const juint _SIG_MASK[] = > > ATTRIBUTE_ALIGNED expands to alignas, I suggest using that directly instead The ATTRIBUTE_ALIGNED macro is used in other stub generator files. Should all of those be changed to alignas as well? Is the suggestion to change just for code readability? > src/hotspot/cpu/x86/templateInterpreterGenerator_x86_64.cpp line 503: > >> 501: >> 502: return entry_point; >> 503: } > > Is the newline removal intentional? It wasn't intentional. Thanks for spotting that. I added it back. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2114557661 PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2114553562 From duke at openjdk.org Thu May 29 18:56:12 2025 From: duke at openjdk.org (Mohamed Issa) Date: Thu, 29 May 2025 18:56:12 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v4] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> <72GCipLKeCWCG-4jsG5XhZKkTdVsWafEq_wA0oD-0mk=.c70af814-54fe-43d2-b7c9-72b845eb99d5@github.com> Message-ID: On Thu, 29 May 2025 08:21:29 GMT, Jatin Bhateja wrote: >> Mohamed Issa has updated the pull request incrementally with four additional commits since the last revision: >> >> - Remove comment mentioning invalid exception when NaN input is provided >> - Use rcx as base and r8 as index for address calculations in certain cbrt stub generator instructions >> - Remove unnecessary unpckhpd and unpcklpd definitions in macro-assembler header file >> - Remove unnecessary movapd definitions in macro-assembler header file > >
src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 185: > >> 183: >> 184: #define __ _masm-> >> 185: > > The original Intel libm inline sequence uses hexadecimal constants; I would have preferred to use them as is, to maintain a 1:1 mapping between the instruction sequences. The assembly reference code used for this implementation uses decimal constants. > test/micro/org/openjdk/bench/java/lang/CbrtPerf.java line 56: > >> 54: public static class CbrtPerfRanges { >> 55: public static int cbrtInputCount = 2048; >> 56: > > Please create a separate CbrtPerfSpecialValues for +/- 0.0, +/- Infinity and NaN values. > I understand that handling special cases in the intrinsic may impact general-case performance, but it's OK to have at least a micro-benchmark for it. Ok, I added this to the new set of micro-benchmarks. I kept them as variable values. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2114551009 PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2114552717 From duke at openjdk.org Thu May 29 22:33:52 2025 From: duke at openjdk.org (Mohamed Issa) Date: Thu, 29 May 2025 22:33:52 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v4] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> <72GCipLKeCWCG-4jsG5XhZKkTdVsWafEq_wA0oD-0mk=.c70af814-54fe-43d2-b7c9-72b845eb99d5@github.com> Message-ID: On Thu, 29 May 2025 08:36:31 GMT, Jatin Bhateja wrote: >> Mohamed Issa has updated the pull request incrementally with four additional commits since the last revision: >> >> - Remove comment mentioning invalid exception when NaN input is provided >> - Use rcx as base and r8 as index for address calculations in certain cbrt stub generator instructions >> - Remove unnecessary unpckhpd and unpcklpd definitions in macro-assembler header file >> - Remove unnecessary movapd definitions in macro-assembler header file > > Patch looks good to me, some comments included.
@jatin-bhateja Please let me know if there's anything else to address. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24470#issuecomment-2920726222 From jwaters at openjdk.org Fri May 30 05:40:54 2025 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 30 May 2025 05:40:54 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v4] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> <72GCipLKeCWCG-4jsG5XhZKkTdVsWafEq_wA0oD-0mk=.c70af814-54fe-43d2-b7c9-72b845eb99d5@github.com> Message-ID: On Thu, 29 May 2025 18:52:51 GMT, Mohamed Issa wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 49: >> >>> 47: /******************************************************************************/ >>> 48: >>> 49: ATTRIBUTE_ALIGNED(4) static const juint _SIG_MASK[] = >> >> ATTRIBUTE_ALIGNED expands to alignas, I suggest using that directly instead > > The ATTRIBUTE_ALIGNED macro is used in other stub generator files. Should all of those be changed to alignas as well? If so, would it best to make those changes in a separate PR? > > Also, is the suggestion to change just for code readability? There's no need to change the other files, that would be out of scope for this Pull Request. Yes, it's just a suggestion for readability, it can be ignored if you deem it as not necessary. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2115174843 From eosterlund at openjdk.org Fri May 30 07:58:51 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 30 May 2025 07:58:51 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror In-Reply-To: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> Message-ID: On Wed, 28 May 2025 10:28:38 GMT, Doug Simon wrote: > The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. > This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25488#pullrequestreview-2880492044 From duke at openjdk.org Fri May 30 15:06:04 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:04 GMT Subject: RFR: 8357987: [JVMCI] Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. Message-ID: Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. 
------------- Commit messages: - implement getAllMethods - address reviewer feedback - Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. Changes: https://git.openjdk.org/jdk/pull/25498/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25498&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357987 Stats: 107 lines in 11 files changed: 106 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25498.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25498/head:pull/25498 PR: https://git.openjdk.org/jdk/pull/25498 From dnsimon at openjdk.org Fri May 30 15:06:08 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 15:06:08 GMT Subject: RFR: 8357987: [JVMCI] Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. In-Reply-To: References: Message-ID: On Wed, 28 May 2025 15:55:39 GMT, Tom Shull wrote: > Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. > > To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. I also updated the title of https://bugs.openjdk.org/browse/JDK-8357987 to Not Be All Capitalized so you'll need to fix the title of this PR. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 580: > 578: C2V_END > 579: > 580: C2V_VMENTRY_0(jboolean, isOverpass,(JNIEnv* env, jobject, ARGUMENT_PAIR(method))) Delete this method - it's no longer used. 
src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 3315: > 3313: {CC "setNotInlinableOrCompilable", CC "(" HS_METHOD2 ")V", FN_PTR(setNotInlinableOrCompilable)}, > 3314: {CC "isCompilable", CC "(" HS_METHOD2 ")Z", FN_PTR(isCompilable)}, > 3315: {CC "isOverpass", CC "(" HS_METHOD2 ")Z", FN_PTR(isOverpass)}, delete src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerToVM.java line 179: > 177: private native boolean isCompilable(HotSpotResolvedJavaMethodImpl method, long methodPointer); > 178: > 179: /** Delete this method - it's no longer used. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerToVM.java line 1162: > 1160: > 1161: /** > 1162: * Gets the {@link ResolvedJavaMethod}s for all non-overpass instance methods of {@code klass}. all non-overpass and non-constructor src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerToVM.java line 1171: > 1169: > 1170: /** > 1171: * Gets the {@link ResolvedJavaMethod}s for all instance methods of {@code klass}. instance -> non-static Instance -> NonStatic src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java line 583: > 581: @Override > 582: public boolean isDeclared() { > 583: if (isConstructor() || isStatic()) { `isStatic()` -> `isClassInitializer()` src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java line 586: > 584: return false; > 585: } > 586: return !compilerToVM().isOverpass(this); I think you can do this with a direct flag check: boolean isOverpass = (getConstMethodFlags() & config().constMethodIsOverpass) != 0; return isOverpass; See #20256 as an example of the other changes needed for this. 
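The direct flag check suggested above can be illustrated outside of JVMCI. This is only a sketch under stated assumptions: `CONST_METHOD_IS_OVERPASS` is a hypothetical stand-in for `config().constMethodIsOverpass` (the real bit value is read from the VM configuration), and the `isDeclared` helper mirrors the shape of the quoted method, where the overpass result is negated before being returned.

```java
final class OverpassFlagSketch {
    // Hypothetical bit; in HotSpot the real mask comes from the VM config, not a constant.
    static final int CONST_METHOD_IS_OVERPASS = 1 << 3;

    // A method is an overpass when the corresponding bit is set in its ConstMethod flags.
    static boolean isOverpass(int constMethodFlags) {
        return (constMethodFlags & CONST_METHOD_IS_OVERPASS) != 0;
    }

    // Mirrors the quoted isDeclared(): constructors, class initializers and
    // overpass methods are not reported as declared.
    static boolean isDeclared(boolean isConstructor, boolean isClassInitializer, int constMethodFlags) {
        if (isConstructor || isClassInitializer) {
            return false;
        }
        return !isOverpass(constMethodFlags);
    }

    public static void main(String[] args) {
        System.out.println(isOverpass(CONST_METHOD_IS_OVERPASS)); // flag set
        System.out.println(isDeclared(false, false, 0));          // ordinary method
    }
}
```

Reading the flag on the Java side avoids a native transition per query, which is presumably why the dedicated `isOverpass` entry point could be deleted.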
src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ResolvedJavaMethod.java line 118: > 116: > 117: /** > 118: * Returns {@code true} if this method would be contained in the array returned by `would be` -> `is` src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ResolvedJavaType.java line 370: > 368: > 369: /** > 370: * Returns a list containing all the non-static methods present within this type. Point out that the returned list is unmodifiable (like the API for `Stream.toList()` does). test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaType.java line 1027: > 1025: ResolvedJavaType type = metaAccess.lookupJavaType(c); > 1026: Set allMethods = new HashSet<>(type.getAllMethods(true)); > 1027: boolean included = Arrays.stream(type.getDeclaredMethods()).allMatch(m -> allMethods.contains(m)); You can produce a more helpful error message by collecting the entries from getDeclaredMethods, getDeclaredConstructors and the class initialized that are *not* in `allMethods`. 
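A minimal sketch of the "collect what is missing" idea from the last comment, written against plain strings so that it runs without JVMCI. The helper name `missingFrom` and the sample method names are hypothetical; in the real test the groups would be the results of `getDeclaredMethods`, `getDeclaredConstructors` and the class initializer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MissingEntriesSketch {
    // Returns, in encounter order, every expected element that is absent from allMethods.
    @SafeVarargs
    static <T> List<T> missingFrom(Set<T> allMethods, List<T>... expectedGroups) {
        List<T> missing = new ArrayList<>();
        for (List<T> group : expectedGroups) {
            for (T method : group) {
                if (!allMethods.contains(method)) {
                    missing.add(method);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-ins for the contents of getAllMethods(true).
        Set<String> allMethods = new HashSet<>(Arrays.asList("foo()", "bar()", "<init>()"));
        List<String> missing = missingFrom(allMethods,
                List.of("foo()", "baz()"),   // declared methods
                List.of("<init>()"),         // declared constructors
                List.of("<clinit>()"));      // class initializer
        if (!missing.isEmpty()) {
            // A failure message that names the offending entries instead of a bare boolean.
            System.out.println("getAllMethods is missing: " + missing);
        }
    }
}
```

Compared with a single `allMatch`, the assertion failure then names the entries that were dropped, which is what makes the message more helpful.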
------------- PR Comment: https://git.openjdk.org/jdk/pull/25498#issuecomment-2921656256 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2113593898 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2113594155 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2113593301 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112455015 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112455704 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112434269 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112449433 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112420844 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112451810 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2115430479 From duke at openjdk.org Fri May 30 15:06:09 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:09 GMT Subject: RFR: 8357987: [JVMCI] Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. In-Reply-To: References: Message-ID: <621JpJVqtfhOtmuHd54KXE7kbOW_RzTQuudFesTADJ0=.d0985feb-fd1a-45eb-8246-261cb3127d2a@github.com> On Wed, 28 May 2025 17:54:27 GMT, Doug Simon wrote: >> Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. >> >> To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerToVM.java line 1171: > >> 1169: >> 1170: /** >> 1171: * Gets the {@link ResolvedJavaMethod}s for all instance methods of {@code klass}. 
> > instance -> non-static > Instance -> NonStatic I realized NonStatic is not accurate - we return everything except `<init>`s and `<clinit>` - so I switched to `NonInitializerMethods` everywhere. Does that seem fair? > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java line 586: > >> 584: return false; >> 585: } >> 586: return !compilerToVM().isOverpass(this); > > I think you can do this with a direct flag check: > > boolean isOverpass = (getConstMethodFlags() & config().constMethodIsOverpass) != 0; > return isOverpass; > > See #20256 as an example of the other changes needed for this. good call. changed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112886022 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112884946 From duke at openjdk.org Fri May 30 15:06:09 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:09 GMT Subject: RFR: 8357987: [JVMCI] Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. In-Reply-To: <621JpJVqtfhOtmuHd54KXE7kbOW_RzTQuudFesTADJ0=.d0985feb-fd1a-45eb-8246-261cb3127d2a@github.com> References: <621JpJVqtfhOtmuHd54KXE7kbOW_RzTQuudFesTADJ0=.d0985feb-fd1a-45eb-8246-261cb3127d2a@github.com> Message-ID: On Wed, 28 May 2025 22:46:46 GMT, Tom Shull wrote: >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerToVM.java line 1171: >> >>> 1169: >>> 1170: /** >>> 1171: * Gets the {@link ResolvedJavaMethod}s for all instance methods of {@code klass}. >> >> instance -> non-static >> Instance -> NonStatic > > I realized NonStatic is not accurate - we return everything except `<init>`s and `<clinit>` - so I switched to `NonInitializerMethods` everywhere. Does that seem fair? thinking about it more, it's probably better if we do no filtering and return all methods in `InstanceKlass->_methods`. How about something like `getAllMethods`: ``` /** * Returns a list containing all methods present within this type.
This list can include * methods implicitly created and used by the VM. * The returned List is unmodifiable; calls to any mutator method * will always cause {@code UnsupportedOperationException} to be thrown. * * @param forceLink if {@code true}, forces this type to be {@link #link linked} */ List getAllMethods(boolean forceLink); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2113338609 From dnsimon at openjdk.org Fri May 30 15:06:09 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 15:06:09 GMT Subject: RFR: 8357987: [JVMCI] Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. In-Reply-To: References: <621JpJVqtfhOtmuHd54KXE7kbOW_RzTQuudFesTADJ0=.d0985feb-fd1a-45eb-8246-261cb3127d2a@github.com> Message-ID: <5TwuZTOvXugCHTiNOQpfYWtfwgV9b0HTyzoPdRMSB3U=.8e616674-895a-4c16-9d4e-2655b7b410f7@github.com> On Thu, 29 May 2025 06:56:18 GMT, Tom Shull wrote: >> I realized NonStatic is not accurate - we return everything except `<init>`s and `<clinit>` - so I switched to `NonInitializerMethods` everywhere. Does that seem fair? > > thinking about it more, it's probably better if we do no filtering and return all methods in `InstanceKlass->_methods`. How about something like `getAllMethods`: > > ``` > /** > * Returns a list containing all methods present within this type. This list can include > * methods implicitly created and used by the VM. > * The returned List is unmodifiable; calls to any mutator method > * will always cause {@code UnsupportedOperationException} to be thrown. > * > * @param forceLink if {@code true}, forces this type to be {@link #link linked} > */ > List getAllMethods(boolean forceLink); Yes, that's a good idea - it's more future proof and lets the caller do the filtering.
`This list can include methods implicitly created and used by the VM that are not present in {@link #getDeclaredMethods}.` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2113592457 From duke at openjdk.org Fri May 30 15:06:29 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:29 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool Message-ID: This PR adds support for directly retrieving all invokedynamic BootstrapMethodInvocations from a ConstantPool. In addition, two methods are added to the BootstrapMethodInvocations: 1. `void resolveInvokeDynamic()` 2. `JavaConstant lookupInvokeDynamicAppendix()` The combination of these two features allows one to directly interact with all invokedynamic information of a given ConstantPool without having to iterate through all of the Classfile's methods to find all invokedynamic bytecodes ------------- Commit messages: - complete changes - commit review suggestion - commit review suggestion - change to allow both indys and condys to be looked up all at once - address reviewer feedback - style fixes and add testing to TestDynamicConstants. - Add support for retrieving all Indy BootstrapMethodInvocations from Constant Pool. 
Changes: https://git.openjdk.org/jdk/pull/25420/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25420&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357660 Stats: 142 lines in 5 files changed: 130 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/25420.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25420/head:pull/25420 PR: https://git.openjdk.org/jdk/pull/25420 From dnsimon at openjdk.org Fri May 30 15:06:30 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: References: Message-ID: On Fri, 23 May 2025 17:37:14 GMT, Tom Shull wrote: > This PR adds support for directly retrieving all invokedynamic BootstrapMethodInvocations from a ConstantPool. > > In addition, two methods are added to the BootstrapMethodInvocations: > 1. `void resolveInvokeDynamic()` > 2. `JavaConstant lookupInvokeDynamicAppendix()` > > The combination of these two features allows one to directly interact with all invokedynamic information of a given ConstantPool without having to iterate through all of the Classfile's methods to find all invokedynamic bytecodes Please add some tests for the new methods to `test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.hotspot.test/src/jdk/vm/ci/hotspot/test/TestDynamicConstant.java`. I also updated the title of https://bugs.openjdk.org/browse/JDK-8357660 to Not Be All Capitalized so you'll need to fix the title of this PR. Also, please update both titles and descriptions further to reflect the final changes (i.e. lookupBootstrapMethodInvocations instead of lookupIndyBootstrapMethodInvocations). 
src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerToVM.java line 476: > 474: > 475: /** > 476: * Returns the number of {@code ResolvedIndyEntry} present within this constant `{@code ResolvedIndyEntry}` -> `{@code ResolvedIndyEntry}s` src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 540: > 538: private final JavaConstant type; > 539: private final List staticArguments; > 540: private final int index; index -> cpiOrIndyIndex src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 651: > 649: return List.of(); > 650: } > 651: return IntStream.range(0, numIndys).mapToObj(i -> lookupBootstrapMethodInvocation(i, Bytecodes.INVOKEDYNAMIC)) Suggestion: return IntStream.range(0, numIndys) .mapToObj(i -> lookupBootstrapMethodInvocation(i, Bytecodes.INVOKEDYNAMIC)) .toList(); src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 654: > 652: .toList(); > 653: } else { > 654: return IntStream.range(1, length()).filter(cpi -> { Suggestion: return IntStream.range(1, length()) .filter(this::isDynamicEntry) .mapToObj(...); and: private boolean isDynamicEntry(int cpi) { JvmConstant tagAt = getTagAt(cpi); return tagAt != null && tagAt.name.equals("Dynamic"); } src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 657: > 655: } else { > 656: return IntStream.range(1, length()) > 657: .filter(this::isDynamicEntry) Looks like you forgot to add the definition of `isDynamicEntry` that I suggested: private boolean isDynamicEntry(int cpi) { JvmConstant tagAt = getTagAt(cpi); return tagAt != null && tagAt.name.equals("Dynamic"); } src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java line 198: > 196: * If this bootstrap method invocation is for a {@code > 197: * CONSTANTAdd_InvokeDynamic_info} pool entry, then this method ensures the > 198: * invoke dynamic is resolved. 
This can be used to compile time resolve the What exactly does resolving an invoke dynamic mean? Also I would leave out the sentence about "compile time" unless you clarify exactly what that means. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java line 233: > 231: > 232: /** > 233: * Returns the BootstrapMethodInvocation instances for all invokedynamic Point out that the returned list is unmodifiable (like the API for `Stream.toList()` does). src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java line 237: > 235: * is returned. > 236: */ > 237: BootstrapMethodInvocation[] lookupAllIndyBootstrapMethodInvocations(); Why not make this return all `BootstrapMethodInvocation`s? The caller can then filter out the indy ones with `isInvokeDynamic`. Also, please return a `List` instead of an array - we should never return arrays from JVMCI (see #23159 as an example of addressing existing API). Lastly, return `List.of()` instead of null. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25420#issuecomment-2906643446 PR Comment: https://git.openjdk.org/jdk/pull/25420#issuecomment-2921667337 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2107428322 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2115447272 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2114177826 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2114187417 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2114737379 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2107430633 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2112429562 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2107441215 From duke at openjdk.org Fri May 30 15:06:09 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:09 GMT Subject: RFR: 8357987: [JVMCI] Add Support for Retrieving All Non-Static Methods of a 
ResolvedJavaType. In-Reply-To: <5TwuZTOvXugCHTiNOQpfYWtfwgV9b0HTyzoPdRMSB3U=.8e616674-895a-4c16-9d4e-2655b7b410f7@github.com> References: <621JpJVqtfhOtmuHd54KXE7kbOW_RzTQuudFesTADJ0=.d0985feb-fd1a-45eb-8246-261cb3127d2a@github.com> <5TwuZTOvXugCHTiNOQpfYWtfwgV9b0HTyzoPdRMSB3U=.8e616674-895a-4c16-9d4e-2655b7b410f7@github.com> Message-ID: <3kzvHswjZ98huXibmqouApRGInSf3rwkIwQReBOCANc=.c51d8496-6032-4387-8b5b-fba8b1d7adf4@github.com> On Thu, 29 May 2025 09:40:37 GMT, Doug Simon wrote: >> thinking about it more, it's probably better if we do no filtering and return all methods in `InstanceKlass->_methods`. How about something like `getAllMethods`: >> >> ``` >> /** >> * Returns a list containing all methods present within this type. This list can include >> * methods implicitly created and used by the VM. >> * The returned List is unmodifiable; calls to any mutator method >> * will always cause {@code UnsupportedOperationException} to be thrown. >> * >> * @param forceLink if {@code true}, forces this type to be {@link #link linked} >> */ >> List getAllMethods(boolean forceLink); > > Yes, that's a good idea - it's more future proof and lets the caller do the filtering. > > `This list can include methods implicitly created and used by the VM that are not present in {@link #getDeclaredMethods}.` I changed it now to be `getAllMethods` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2114796740 From dnsimon at openjdk.org Fri May 30 15:06:09 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 15:06:09 GMT Subject: RFR: 8357987: [JVMCI] Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. In-Reply-To: References: Message-ID: On Wed, 28 May 2025 17:41:15 GMT, Doug Simon wrote: >> Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. 
However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. >> >> To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java line 583: > >> 581: @Override >> 582: public boolean isDeclared() { >> 583: if (isConstructor() || isStatic()) { > > `isStatic()` -> `isClassInitializer()` Looks like you did not yet make the `isClassInitializer()` fix. This also implies some missing test coverage in TestResolvedJavaType. Can you please address both these issues. > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ResolvedJavaMethod.java line 118: > >> 116: >> 117: /** >> 118: * Returns {@code true} if this method would be contained in the array returned by > > `would be` -> `is` not yet fixed (or pushed?) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2113583058 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2113587598 From duke at openjdk.org Fri May 30 15:06:30 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: References: Message-ID: On Sat, 24 May 2025 08:49:54 GMT, Doug Simon wrote: >> This PR adds support for directly retrieving all invokedynamic BootstrapMethodInvocations from a ConstantPool. >> >> In addition, two methods are added to the BootstrapMethodInvocations: >> 1. `void resolveInvokeDynamic()` >> 2. 
`JavaConstant lookupInvokeDynamicAppendix()` >> >> The combination of these two features allows one to directly interact with all invokedynamic information of a given ConstantPool without having to iterate through all of the Classfile's methods to find all invokedynamic bytecodes > > Please add some tests for the new methods to `test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.hotspot.test/src/jdk/vm/ci/hotspot/test/TestDynamicConstant.java`. @dougxc I integrated testing for the new methods into `TestDynamicConstant.java` now @dougxc I cleaned up the PR to now have the symmetric lookup option and updated the tests > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 657: > >> 655: } else { >> 656: return IntStream.range(1, length()) >> 657: .filter(this::isDynamicEntry) > > Looks like you forgot to add the definition of `isDynamicEntry` that I suggested: > > private boolean isDynamicEntry(int cpi) { > JvmConstant tagAt = getTagAt(cpi); > return tagAt != null && tagAt.name.equals("Dynamic"); > } Yes, I applied the suggested change via github, and am just validating it works now (which of course it doesn't). I'll fix it > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java line 198: > >> 196: * If this bootstrap method invocation is for a {@code >> 197: * CONSTANTAdd_InvokeDynamic_info} pool entry, then this method ensures the >> 198: * invoke dynamic is resolved. This can be used to compile time resolve the > > What exactly does resolving an invoke dynamic mean? > Also I would leave out the sentence about "compile time" unless you clarify exactly what that means. Would you want me to add a reference to https://docs.oracle.com/javase/specs/jvms/se24/html/jvms-5.html#jvms-5.4.3.6? I removed the compile time sentence; I had it to be consistent with `loadReferencedType` > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java line 237: > >> 235: * is returned. 
>> 236: */ >> 237: BootstrapMethodInvocation[] lookupAllIndyBootstrapMethodInvocations(); > > Why not make this return all `BootstrapMethodInvocation`s? The caller can then filter out the indy ones with `isInvokeDynamic`. Also, please return a `List` instead of an array - we should never return arrays from JVMCI (see #23159 as an example of addressing existing API). Lastly, return `List.of()` instead of null. Changed to return a list. > Why not make this return all BootstrapMethodInvocations 1. Within HotSpot it is very easy to pick off all indy BootstrapMethodInvocations via [the ConstantPoolCache](https://github.com/openjdk/jdk/blob/72a3022dc6a1521d8e3f08fe5d592f760fc462d2/src/hotspot/share/oops/cpCache.hpp#L74) 2. Each invokedynamic bytecode location has a unique BootstrapMethodInvocation instance, but they may share the same constant pool entry, so it's not trivial to find all BootstrapMethodInvocations. One would have to iterate both all method bytecodes and constant pool slots, and do some additional filtering. 
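The stream shape suggested in the review (filter constant pool indexes with an `isDynamicEntry` predicate, map each hit, and return an unmodifiable list rather than `null` or an array) can be sketched without JVMCI. Everything here is a stand-in: the `tags` array plays the role of `getTagAt(cpi)`, and the `"condy@"` strings stand in for `BootstrapMethodInvocation` objects.

```java
import java.util.List;
import java.util.stream.IntStream;

final class DynamicEntriesSketch {
    // Hypothetical tag table; slot 0 is unused, as in a class file's constant pool.
    private final String[] tags;

    DynamicEntriesSketch(String... tags) {
        this.tags = tags;
    }

    int length() {
        return tags.length;
    }

    // Same shape as the helper proposed in the review: null-safe tag comparison.
    private boolean isDynamicEntry(int cpi) {
        String tag = tags[cpi];
        return tag != null && tag.equals("Dynamic");
    }

    // Walks slots 1..length-1 and keeps only the Dynamic entries.
    // Stream.toList() yields an unmodifiable list, empty when nothing matches.
    List<String> lookupDynamicEntries() {
        return IntStream.range(1, length())
                .filter(this::isDynamicEntry)
                .mapToObj(cpi -> "condy@" + cpi)
                .toList();
    }

    public static void main(String[] args) {
        DynamicEntriesSketch pool = new DynamicEntriesSketch(null, "Utf8", "Dynamic", "Class", "Dynamic");
        System.out.println(pool.lookupDynamicEntries());
    }
}
```

Because an empty pool yields an empty list, callers can iterate the result unconditionally, which is the point of preferring `List.of()` over `null`.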
------------- PR Comment: https://git.openjdk.org/jdk/pull/25420#issuecomment-2909796813 PR Comment: https://git.openjdk.org/jdk/pull/25420#issuecomment-2918426821 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2114780251 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2109301347 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2109317539 From dnsimon at openjdk.org Fri May 30 15:06:30 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: References: Message-ID: <1AMsWwdYheV0CZ9z_VWbiEPphQwkJz-HO6h-wYNCAfw=.8259a98a-89aa-40e6-98da-81c43d2a45e0@github.com> On Tue, 27 May 2025 14:07:21 GMT, Tom Shull wrote: > Would you want me to add a reference The main point is that resolving can execute Java code (as far as I recall) so cannot be called from a CompileBroker thread as these threads must not call Java code. However, I see that this constraint is not currently documented so it ok to leave it out for now. >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java line 237: >> >>> 235: * is returned. >>> 236: */ >>> 237: BootstrapMethodInvocation[] lookupAllIndyBootstrapMethodInvocations(); >> >> Why not make this return all `BootstrapMethodInvocation`s? The caller can then filter out the indy ones with `isInvokeDynamic`. Also, please return a `List` instead of an array - we should never return arrays from JVMCI (see #23159 as an example of addressing existing API). Lastly, return `List.of()` instead of null. > > Changed to return a list. > >> Why not make this return all BootstrapMethodInvocations > 1. 
Within HotSpot it is very easy to pick off all indy BootstrapMethodInvocations via [the ConstantPoolCache](https://github.com/openjdk/jdk/blob/72a3022dc6a1521d8e3f08fe5d592f760fc462d2/src/hotspot/share/oops/cpCache.hpp#L74) > 2. Each invokedynamic bytecode location has a unique BootstrapMethodInvocation instance, but they may share the same constant pool entry, so it's not trivial to find all BootstrapMethodInvocations. One would have to iterate both all method bytecodes and constant pool slots, and do some additional filtering. How about `List lookupBootstrapMethodInvocations(boolean indy)`? That is, it either gets the indy *or* the condy BSM invocations. I can imagine SVM wanting the latter at some point right? BTW, I noticed that the javadoc for `ConstantPool.lookupBootstrapMethodInvocation` is somewhat incorrect. Please check and apply these corrections in this PR: diff --git a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java index 2273b256f03..3519af4bcbb 100644 --- a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java +++ b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java @@ -199,12 +199,12 @@ interface BootstrapMethodInvocation { * in the constant pool. * * @param index if {@code opcode} is -1, {@code index} is a constant pool index. Otherwise {@code opcode} - * must be {@code Bytecodes.INVOKEDYNAMIC}, and {@code index} must be the operand of that - * opcode in the bytecode stream (i.e., a {@code rawIndex}). - * @param opcode must be {@code Bytecodes.INVOKEDYNAMIC}, or -1 if + * must be {@code Bytecodes.INVOKEDYNAMIC} or {@code CONSTANT_Dynamic_info}, and {@code index} + * must be the operand of that opcode in the bytecode stream (i.e., a {@code rawIndex}). 
+ * @param opcode must be {@code Bytecodes.INVOKEDYNAMIC}, {@code CONSTANT_Dynamic_info}, or -1 if * {@code index} was not decoded from a bytecode stream * @return the bootstrap method invocation details or {@code null} if the entry specified by {@code index} - * is not a {@code CONSTANT_Dynamic_info} or @{code CONSTANT_InvokeDynamic_info} + * is not a {@code CONSTANT_Dynamic_info} or {@code CONSTANT_InvokeDynamic_info} * @jvms 4.7.23 The {@code BootstrapMethods} Attribute */ default BootstrapMethodInvocation lookupBootstrapMethodInvocation(int index, int opcode) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2109436288 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2109450651 From duke at openjdk.org Fri May 30 15:06:30 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: <1AMsWwdYheV0CZ9z_VWbiEPphQwkJz-HO6h-wYNCAfw=.8259a98a-89aa-40e6-98da-81c43d2a45e0@github.com> References: <1AMsWwdYheV0CZ9z_VWbiEPphQwkJz-HO6h-wYNCAfw=.8259a98a-89aa-40e6-98da-81c43d2a45e0@github.com> Message-ID: On Tue, 27 May 2025 15:03:02 GMT, Doug Simon wrote: >> Changed to return a list. >> >>> Why not make this return all BootstrapMethodInvocations >> 1. Within HotSpot it is very easy to pick off all indy BootstrapMethodInvocations via [the ConstantPoolCache](https://github.com/openjdk/jdk/blob/72a3022dc6a1521d8e3f08fe5d592f760fc462d2/src/hotspot/share/oops/cpCache.hpp#L74) >> 2. Each invokedynamic bytecode location has a unique BootstrapMethodInvocation instance, but they may share the same constant pool entry, so it's not trivial to find all BootstrapMethodInvocations. One would have to iterate both all method bytecodes and constant pool slots, and do some additional filtering. > > How about `List lookupBootstrapMethodInvocations(boolean indy)`? 
That is, it either gets the indy *or* the condy BSM invocations. I can imagine SVM wanting the latter at some point right? > > BTW, I noticed that the javadoc for `ConstantPool.lookupBootstrapMethodInvocation` is somewhat incorrect. Please check and apply these corrections in this PR: > > diff --git a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java > index 2273b256f03..3519af4bcbb 100644 > --- a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java > +++ b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java > @@ -199,12 +199,12 @@ interface BootstrapMethodInvocation { > * in the constant pool. > * > * @param index if {@code opcode} is -1, {@code index} is a constant pool index. Otherwise {@code opcode} > - * must be {@code Bytecodes.INVOKEDYNAMIC}, and {@code index} must be the operand of that > - * opcode in the bytecode stream (i.e., a {@code rawIndex}). > - * @param opcode must be {@code Bytecodes.INVOKEDYNAMIC}, or -1 if > + * must be {@code Bytecodes.INVOKEDYNAMIC} or {@code CONSTANT_Dynamic_info}, and {@code index} > + * must be the operand of that opcode in the bytecode stream (i.e., a {@code rawIndex}). 
> + * @param opcode must be {@code Bytecodes.INVOKEDYNAMIC}, {@code CONSTANT_Dynamic_info}, or -1 if > * {@code index} was not decoded from a bytecode stream > * @return the bootstrap method invocation details or {@code null} if the entry specified by {@code index} > - * is not a {@code CONSTANT_Dynamic_info} or @{code CONSTANT_InvokeDynamic_info} > + * is not a {@code CONSTANT_Dynamic_info} or {@code CONSTANT_InvokeDynamic_info} > * @jvms 4.7.23 The {@code BootstrapMethods} Attribute > */ > default BootstrapMethodInvocation lookupBootstrapMethodInvocation(int index, int opcode) { I prototyped the option `List lookupBootstrapMethodInvocations(boolean indy)` here: https://github.com/openjdk/jdk/compare/master...teshull:jdk:jvmci_bootstrap_alternative As part of this I also prototyped generic BSM resolution / lookup logic. From the SVM perspective, retrieving condys via this new support isn't a big win. It's easy enough already to walk the ConstantPool. However, for symmetry purposes, it is reasonable to have this method (along with the resolve / lookup). What's your preference: this new version or the original? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2110104069 From dnsimon at openjdk.org Fri May 30 15:06:30 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: References: <1AMsWwdYheV0CZ9z_VWbiEPphQwkJz-HO6h-wYNCAfw=.8259a98a-89aa-40e6-98da-81c43d2a45e0@github.com> Message-ID: <3Lyb5MHjplhxqRmlkR6y-GpgQWe90ij_jClRdipKMQE=.cf4fcf50-4b4f-4930-abdb-75f9d0be9942@github.com> On Tue, 27 May 2025 20:10:50 GMT, Tom Shull wrote: >> How about `List lookupBootstrapMethodInvocations(boolean indy)`? That is, it either gets the indy *or* the condy BSM invocations. I can imagine SVM wanting the latter at some point right?
>> >> BTW, I noticed that the javadoc for `ConstantPool.lookupBootstrapMethodInvocation` is somewhat incorrect. Please check and apply these corrections in this PR: >> >> diff --git a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java >> index 2273b256f03..3519af4bcbb 100644 >> --- a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java >> +++ b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java >> @@ -199,12 +199,12 @@ interface BootstrapMethodInvocation { >> * in the constant pool. >> * >> * @param index if {@code opcode} is -1, {@code index} is a constant pool index. Otherwise {@code opcode} >> - * must be {@code Bytecodes.INVOKEDYNAMIC}, and {@code index} must be the operand of that >> - * opcode in the bytecode stream (i.e., a {@code rawIndex}). >> - * @param opcode must be {@code Bytecodes.INVOKEDYNAMIC}, or -1 if >> + * must be {@code Bytecodes.INVOKEDYNAMIC} or {@code CONSTANT_Dynamic_info}, and {@code index} >> + * must be the operand of that opcode in the bytecode stream (i.e., a {@code rawIndex}). 
>> + * @param opcode must be {@code Bytecodes.INVOKEDYNAMIC}, {@code CONSTANT_Dynamic_info}, or -1 if >> * {@code index} was not decoded from a bytecode stream >> * @return the bootstrap method invocation details or {@code null} if the entry specified by {@code index} >> - * is not a {@code CONSTANT_Dynamic_info} or @{code CONSTANT_InvokeDynamic_info} >> + * is not a {@code CONSTANT_Dynamic_info} or {@code CONSTANT_InvokeDynamic_info} >> * @jvms 4.7.23 The {@code BootstrapMethods} Attribute >> */ >> default BootstrapMethodInvocation lookupBootstrapMethodInvocation(int index, int opcode) { > > I prototyped the option `List lookupBootstrapMethodInvocations(boolean indy)` here: https://github.com/openjdk/jdk/compare/master...teshull:jdk:jvmci_bootstrap_alternative > > As part of this I also prototyped generic BSM resolution / lookup logic > > From the SVM perspective, retrieving condys via this new support isn't a big win. It's easy enough already to walk the ConstantPool. However, for symmetry purposes, it is reasonable to have this method (along with the resolve / lookup). What's your preference: this new version or the original? I like the symmetry of the new version. Also, I think you can simplify things by replacing use of `flatMap` [here](https://github.com/openjdk/jdk/compare/master...teshull:jdk:jvmci_bootstrap_alternative#diff-b782878562668748c5c59acc2e937f7c24de4529b8a74bd3a4eae83fa0e07846R679) with `filter`. 
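
To illustrate the `flatMap` to `filter` suggestion, here is a minimal sketch. It does not reproduce the prototype's code: the `Entry` record and both method names are hypothetical, and it assumes the `flatMap` in question is only used to drop elements (each element maps to a stream of zero or one elements), which is the case where `filter` is a drop-in replacement.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapVsFilter {
    // Hypothetical stand-in for a constant pool entry; not a JVMCI type.
    public record Entry(int cpi, String tag) {}

    // flatMap used purely for selection: every entry becomes a stream
    // holding either itself or nothing.
    public static List<Entry> selectWithFlatMap(List<Entry> entries, Predicate<Entry> keep) {
        return entries.stream()
                .flatMap(e -> keep.test(e) ? Stream.of(e) : Stream.<Entry>empty())
                .collect(Collectors.toList());
    }

    // The same selection expressed directly with filter.
    public static List<Entry> selectWithFilter(List<Entry> entries, Predicate<Entry> keep) {
        return entries.stream()
                .filter(keep)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry(5, "InvokeDynamic"),
                new Entry(7, "Dynamic"),
                new Entry(9, "Utf8"));
        Predicate<Entry> isIndy = e -> e.tag().equals("InvokeDynamic");
        // Both variants select the same entries in the same order.
        System.out.println(selectWithFlatMap(entries, isIndy).equals(selectWithFilter(entries, isIndy)));
    }
}
```

If the real `flatMap` also transforms or expands elements, `filter` alone would not be equivalent and a `map` step would still be needed.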
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2111539245 From duke at openjdk.org Fri May 30 15:06:30 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: <3Lyb5MHjplhxqRmlkR6y-GpgQWe90ij_jClRdipKMQE=.cf4fcf50-4b4f-4930-abdb-75f9d0be9942@github.com> References: <1AMsWwdYheV0CZ9z_VWbiEPphQwkJz-HO6h-wYNCAfw=.8259a98a-89aa-40e6-98da-81c43d2a45e0@github.com> <3Lyb5MHjplhxqRmlkR6y-GpgQWe90ij_jClRdipKMQE=.cf4fcf50-4b4f-4930-abdb-75f9d0be9942@github.com> Message-ID: On Wed, 28 May 2025 10:45:07 GMT, Doug Simon wrote: >> I prototyped the option `List lookupBootstrapMethodInvocations(boolean indy)` here: https://github.com/openjdk/jdk/compare/master...teshull:jdk:jvmci_bootstrap_alternative >> >> As part of this I also prototyped generic BSM resolution / lookup logic >> >> From the SVM perspective, retrieving condys via this new support isn't a big win. It's easy enough already to walk the ConstantPool. However, for symmetry purposes, it is reasonable to have this method (along with the resolve / lookup). What's your preference: this new version or the original? > > I like the symmetry of the new version. Also, I think you can simplify things by replacing use of `flatMap` [here](https://github.com/openjdk/jdk/compare/master...teshull:jdk:jvmci_bootstrap_alternative#diff-b782878562668748c5c59acc2e937f7c24de4529b8a74bd3a4eae83fa0e07846R679) with `filter`. I updated the javadoc misplaced `@` in `{@code}`. 
However, the `opcode` doc changes look wrong to me; the opcode must be -1 or INVOKEDYNAMIC (https://github.com/openjdk/jdk/blob/04e0fe00abcf1d7919a50e0c9dd44ce2856984ea/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java#L592) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2113271157 From dnsimon at openjdk.org Fri May 30 15:06:30 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: References: <1AMsWwdYheV0CZ9z_VWbiEPphQwkJz-HO6h-wYNCAfw=.8259a98a-89aa-40e6-98da-81c43d2a45e0@github.com> <3Lyb5MHjplhxqRmlkR6y-GpgQWe90ij_jClRdipKMQE=.cf4fcf50-4b4f-4930-abdb-75f9d0be9942@github.com> Message-ID: On Thu, 29 May 2025 06:04:24 GMT, Tom Shull wrote: >> I like the symmetry of the new version. Also, I think you can simplify things by replacing use of `flatMap` [here](https://github.com/openjdk/jdk/compare/master...teshull:jdk:jvmci_bootstrap_alternative#diff-b782878562668748c5c59acc2e937f7c24de4529b8a74bd3a4eae83fa0e07846R679) with `filter`. > > I updated the javadoc misplaced `@` in `{@code}`. However, the `opcode` doc changes look wrong to me; the opcode must be -1 or INVOKEDYNAMIC (https://github.com/openjdk/jdk/blob/04e0fe00abcf1d7919a50e0c9dd44ce2856984ea/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java#L592) Yeah, looks like you're right. I was basing my assumption on `case "Dynamic"` in: @Override public BootstrapMethodInvocation lookupBootstrapMethodInvocation(int index, int opcode) { int cpi = opcode == -1 ? index : indyIndexConstantPoolIndex(index, opcode); final JvmConstant tag = getTagAt(cpi); switch (tag.name) { case "InvokeDynamic": case "Dynamic": I guess it's possible for an INVOKEDYNAMIC to resolve its cpi to a CONSTANT_Dynamic entry.
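
The javadoc quoted earlier in this thread says that when `opcode` is -1 the `index` is a plain constant pool index, which suggests one way the `"Dynamic"` case could be reached without involving `INVOKEDYNAMIC` at all. The toy model below sketches that reading. It is not `HotSpotConstantPool`: the class, the `Tag` enum, the two maps and the string results are all hypothetical, and the raw-index decoding is replaced by a simple lookup table.

```java
import java.util.Map;

public class BsmLookupSketch {
    // Opcode value of invokedynamic as defined by the JVM specification.
    public static final int INVOKEDYNAMIC = 0xBA;

    // Hypothetical, reduced set of constant pool tags.
    public enum Tag { INVOKE_DYNAMIC, DYNAMIC, UTF8 }

    private final Map<Integer, Tag> tagsByCpi;          // constant pool index -> tag
    private final Map<Integer, Integer> rawIndexToCpi;  // stand-in for decoding an indy operand

    public BsmLookupSketch(Map<Integer, Tag> tagsByCpi, Map<Integer, Integer> rawIndexToCpi) {
        this.tagsByCpi = tagsByCpi;
        this.rawIndexToCpi = rawIndexToCpi;
    }

    // Small fixed pool used for the demonstration below.
    public static BsmLookupSketch example() {
        return new BsmLookupSketch(
                Map.of(5, Tag.INVOKE_DYNAMIC, 7, Tag.DYNAMIC, 9, Tag.UTF8),
                Map.of(0, 5));
    }

    // Returns a description of the bootstrap entry, or null if the entry is
    // neither an InvokeDynamic nor a Dynamic constant.
    public String lookup(int index, int opcode) {
        if (opcode != -1 && opcode != INVOKEDYNAMIC) {
            throw new IllegalArgumentException("opcode must be -1 or INVOKEDYNAMIC");
        }
        int cpi;
        if (opcode == -1) {
            cpi = index; // index is already a constant pool index
        } else {
            Integer decoded = rawIndexToCpi.get(index);
            if (decoded == null) {
                throw new IllegalArgumentException("unknown raw index " + index);
            }
            cpi = decoded;
        }
        Tag tag = tagsByCpi.get(cpi);
        if (tag == null) {
            return null;
        }
        switch (tag) {
            case INVOKE_DYNAMIC: return "indy@" + cpi;
            case DYNAMIC:        return "condy@" + cpi;
            default:             return null;
        }
    }

    public static void main(String[] args) {
        BsmLookupSketch pool = example();
        System.out.println(pool.lookup(0, INVOKEDYNAMIC)); // raw index decoded to an InvokeDynamic entry
        System.out.println(pool.lookup(7, -1));            // direct index of a Dynamic entry
    }
}
```

In this model the `DYNAMIC` branch is only ever taken on the `opcode == -1` path, which would be consistent with keeping the `opcode` documentation restricted to -1 or `INVOKEDYNAMIC`.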
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2113973088 From duke at openjdk.org Fri May 30 15:06:30 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: References: <1AMsWwdYheV0CZ9z_VWbiEPphQwkJz-HO6h-wYNCAfw=.8259a98a-89aa-40e6-98da-81c43d2a45e0@github.com> <3Lyb5MHjplhxqRmlkR6y-GpgQWe90ij_jClRdipKMQE=.cf4fcf50-4b4f-4930-abdb-75f9d0be9942@github.com> Message-ID: On Thu, 29 May 2025 13:40:55 GMT, Doug Simon wrote: >> I updated the javadoc misplaced `@` in `{@code}`. However, the `opcode` doc changes look wrong to me; the opcode must be -1 or INVOKEDYNAMIC (https://github.com/openjdk/jdk/blob/04e0fe00abcf1d7919a50e0c9dd44ce2856984ea/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java#L592) > > yeah, looks like you're right. I was basing my assumption on `case "Dynamic"` in: > > @Override > public BootstrapMethodInvocation lookupBootstrapMethodInvocation(int index, int opcode) { > int cpi = opcode == -1 ? index : indyIndexConstantPoolIndex(index, opcode); > final JvmConstant tag = getTagAt(cpi); > switch (tag.name) { > case "InvokeDynamic": > case "Dynamic": > > I guess it's possible for an INVOKEDYNAMIC to resolve it's cpi to a CONSTANT_Dynamic entry. 
I think INVOKEDYNAMIC should always point to a CONSTANT_InvokeDynamic entry ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2114794800 From never at openjdk.org Fri May 30 16:07:52 2025 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 30 May 2025 16:07:52 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror In-Reply-To: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> Message-ID: On Wed, 28 May 2025 10:28:38 GMT, Doug Simon wrote: > The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. > This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. src/hotspot/share/jvmci/jvmciRuntime.cpp line 801: > 799: > 800: void JVMCINMethodData::invalidate_nmethod_mirror(nmethod* nm) { > 801: if (_nmethod_mirror_index == -1) { This part is actually wrong as that's the first part of `get_nmethod_mirror` and we must always check that `get_nmethod_mirror` doesn't return nullptr. I'd assumed that the mirror was always non-null if `_nmethod_mirror_index != -1` but that's not true. The slot is reserved for all non-default nmethods and must stay around so that `translate` can work. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25488#discussion_r2116193278 From rkennke at openjdk.org Fri May 30 16:13:25 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 30 May 2025 16:13:25 GMT Subject: RFR: 8358169: Shenandoah/JVMCI: Export GC state constants Message-ID: We need the GC state enum constants available in JVMCI. ------------- Commit messages: - 8358169: Shenandoah/JVMCI: Export GC state constants Changes: https://git.openjdk.org/jdk/pull/25552/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25552&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358169 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25552.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25552/head:pull/25552 PR: https://git.openjdk.org/jdk/pull/25552 From dnsimon at openjdk.org Fri May 30 16:39:51 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 16:39:51 GMT Subject: RFR: 8358169: Shenandoah/JVMCI: Export GC state constants In-Reply-To: References: Message-ID: On Fri, 30 May 2025 16:09:03 GMT, Roman Kennke wrote: > We need the GC state enum constants available in JVMCI. Looks good. ------------- Marked as reviewed by dnsimon (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25552#pullrequestreview-2881865876 From jbhateja at openjdk.org Fri May 30 17:22:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 30 May 2025 17:22:53 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v4] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> <72GCipLKeCWCG-4jsG5XhZKkTdVsWafEq_wA0oD-0mk=.c70af814-54fe-43d2-b7c9-72b845eb99d5@github.com> Message-ID: On Thu, 29 May 2025 18:49:28 GMT, Mohamed Issa wrote: >> test/micro/org/openjdk/bench/java/lang/CbrtPerf.java line 56: >> >>> 54: public static class CbrtPerfRanges { >>> 55: public static int cbrtInputCount = 2048; >>> 56: >> >> Please create separate CbrtPerfSpecialValues for +/- 0.0 and +/- Infinity and NaN values. >> I understand that handling special cases in intrinsic may impact general case performance but its ok to have atleast micro for it. > > Ok, I added this to the new set of micro-benchmarks. I kept them as variable values. 
With Intrinsic Disabled:-

| Benchmark | Mode | Cnt | Score | Error | Units |
| :--- | :---: | :---: | ---: | :---: | :---: |
| CbrtPerf.CbrtPerfSpecialValues.cbrtDouble0 | thrpt | 2 | 1343559.770 | | ops/ms |
| CbrtPerf.CbrtPerfSpecialValues.cbrtDoubleInf | thrpt | 2 | 881930.283 | | ops/ms |
| CbrtPerf.CbrtPerfSpecialValues.cbrtDoubleNaN | thrpt | 2 | 973307.409 | | ops/ms |
| CbrtPerf.CbrtPerfSpecialValues.cbrtDoubleNegative0 | thrpt | 2 | 1342454.046 | | ops/ms |
| CbrtPerf.CbrtPerfSpecialValues.cbrtDoubleNegativeInf | thrpt | 2 | 880169.071 | | ops/ms |

With Intrinsic Enabled:-

| Benchmark | Mode | Cnt | Score | Error | Units |
| :--- | :---: | :---: | ---: | :---: | :---: |
| CbrtPerf.CbrtPerfSpecialValues.cbrtDouble0 | thrpt | 2 | 293228.991 | | ops/ms |
| CbrtPerf.CbrtPerfSpecialValues.cbrtDoubleInf | thrpt | 2 | 329190.573 | | ops/ms |
| CbrtPerf.CbrtPerfSpecialValues.cbrtDoubleNaN | thrpt | 2 | 334625.414 | | ops/ms |
| CbrtPerf.CbrtPerfSpecialValues.cbrtDoubleNegative0 | thrpt | 2 | 270939.709 | | ops/ms |
| CbrtPerf.CbrtPerfSpecialValues.cbrtDoubleNegativeInf | thrpt | 2 | 328087.618 | | ops/ms |

As expected, optimized intrinsic penalizes special case performance to optimize generic case control paths. Have you tried adding these special checks and measuring the impact on performance? Alternatively, we can create a follow up JBS to address it later.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2116295936 From jbhateja at openjdk.org Fri May 30 17:58:55 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 30 May 2025 17:58:55 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v5] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: On Thu, 29 May 2025 18:56:11 GMT, Mohamed Issa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> The command to run all range specific micro-benchmarks is posted below.
>> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> >> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. >> >> For performance data collected with the new built in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs. >> >> | Input range(s) | Baseline throughput (ops/ms) | Intrinsic throughput (ops/ms) | Speedup | >> | :-------------------------------------: | :-------------------------------: | :-------------------------------: | :---------: | >> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x | >> >> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes. > > Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision: > > - Add newline back to templateInterpreterGenerator_x86_64.cpp source file > - Add special case values to cbrt micro-benchmark set LGTM, we have already created follow-up JBSs for known limitations. ------------- Marked as reviewed by jbhateja (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24470#pullrequestreview-2882036242 From duke at openjdk.org Fri May 30 18:43:53 2025 From: duke at openjdk.org (duke) Date: Fri, 30 May 2025 18:43:53 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v5] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: On Thu, 29 May 2025 18:56:11 GMT, Mohamed Issa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> The command to run all range specific micro-benchmarks is posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> >> The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. >> >> For performance data collected with the new built in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs. >> >> | Input range(s) | Baseline throughput (ops/ms) | Intrinsic throughput (ops/ms) | Speedup | >> | :-------------------------------------: | :-------------------------------: | :-------------------------------: | :---------: | >> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x | >> >> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes. 
> > Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision: > > - Add newline back to templateInterpreterGenerator_x86_64.cpp source file > - Add special case values to cbrt micro-benchmark set @missa-prime Your change (at version 233e0188c7637cdc08bb4bebd8cb4721ccc352d1) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24470#issuecomment-2923150761 From sviswanathan at openjdk.org Fri May 30 19:05:55 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 30 May 2025 19:05:55 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v5] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: On Thu, 29 May 2025 18:56:11 GMT, Mohamed Issa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> The command to run all range specific micro-benchmarks is posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> >> The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. >> >> For performance data collected with the new built in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs. 
>> >> | Input range(s) | Baseline throughput (ops/ms) | Intrinsic throughput (ops/ms) | Speedup | >> | :-------------------------------------: | :-------------------------------: | :-------------------------------: | :---------: | >> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x | >> >> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes. > > Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision: > > - Add newline back to templateInterpreterGenerator_x86_64.cpp source file > - Add special case values to cbrt micro-benchmark set src/hotspot/cpu/x86/assembler_x86.cpp line 2879: > 2877: emit_operand(dst, src, 0); > 2878: } > 2879: One more change is needed. We need to set address attributes here, as movapd has Address as one of the input: attributes.set_address_attributes(/* tuple_type */ EVEX_FVM, /* input_size_in_bits */ EVEX_NObit); This should be done before call to simd_prefix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2116482247 From duke at openjdk.org Fri May 30 19:34:16 2025 From: duke at openjdk.org (Mohamed Issa) Date: Fri, 30 May 2025 19:34:16 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v6] In-Reply-To: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. > > The command to run all range specific micro-benchmarks is posted below. 
>
> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"`
>
> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version.
>
> For performance data collected with the new built in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs.
>
> | Input range(s) | Baseline throughput (ops/ms) | Intrinsic throughput (ops/ms) | Speedup |
> | :-------------------------------------: | :-------------------------------: | :-------------------------------: | :---------: |
> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x |
> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x |
>
> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes.
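
The speedup and uplift figures in the description can be cross-checked from the two throughput columns. The sketch below assumes speedup means intrinsic throughput divided by baseline throughput, and uplift means the relative gain over the baseline in percent; the class and method names are illustrative only.

```java
public class CbrtSpeedup {
    // Ratio of intrinsic throughput to baseline throughput.
    public static double speedup(double baselineOpsPerMs, double intrinsicOpsPerMs) {
        return intrinsicOpsPerMs / baselineOpsPerMs;
    }

    // Relative gain over the baseline, in percent.
    public static double upliftPercent(double baselineOpsPerMs, double intrinsicOpsPerMs) {
        return (speedup(baselineOpsPerMs, intrinsicOpsPerMs) - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Throughput values taken from the table in the PR description.
        double[][] rows = { {6568, 17678}, {138932, 200897} };
        for (double[] row : rows) {
            System.out.printf("speedup %.2fx, uplift %.0f%%%n",
                    speedup(row[0], row[1]), upliftPercent(row[0], row[1]));
        }
    }
}
```

Under these definitions a 2.69x speedup corresponds to an uplift of about 169%, and 1.45x to about 45%, which matches the wording in the description.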
Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: Set address attributes in movapd assembly instruction function definition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24470/files - new: https://git.openjdk.org/jdk/pull/24470/files/233e0188..c931222c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24470&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24470&range=04-05 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24470.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24470/head:pull/24470 PR: https://git.openjdk.org/jdk/pull/24470 From duke at openjdk.org Fri May 30 19:34:16 2025 From: duke at openjdk.org (Mohamed Issa) Date: Fri, 30 May 2025 19:34:16 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v5] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: On Fri, 30 May 2025 19:03:00 GMT, Sandhya Viswanathan wrote: >> Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add newline back to templateInterpreterGenerator_x86_64.cpp source file >> - Add special case values to cbrt micro-benchmark set > > src/hotspot/cpu/x86/assembler_x86.cpp line 2879: > >> 2877: emit_operand(dst, src, 0); >> 2878: } >> 2879: > > One more change is needed. We need to set address attributes here, as movapd has Address as one of the input: > `attributes.set_address_attributes(/* tuple_type */ EVEX_FVM, /* input_size_in_bits */ EVEX_NObit);` > This should be done before call to simd_prefix. I added the change and re-ran the tests. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2116518295 From sviswanathan at openjdk.org Fri May 30 21:22:54 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 30 May 2025 21:22:54 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v6] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: On Fri, 30 May 2025 19:34:16 GMT, Mohamed Issa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> The command to run all range specific micro-benchmarks is posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> >> The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. >> >> For performance data collected with the new built in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs. 
>> >> | Input range(s) | Baseline throughput (ops/ms) | Intrinsic throughput (ops/ms) | Speedup | >> | :-------------------------------------: | :-------------------------------: | :-------------------------------: | :---------: | >> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x | >> >> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes. > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Set address attributes in movapd assembly instruction function definition Marked as reviewed by sviswanathan (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24470#pullrequestreview-2882583269 From duke at openjdk.org Fri May 30 21:27:59 2025 From: duke at openjdk.org (duke) Date: Fri, 30 May 2025 21:27:59 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v6] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: On Fri, 30 May 2025 19:34:16 GMT, Mohamed Issa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> The command to run all range specific micro-benchmarks is posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> >> The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. >> >> For performance data collected with the new built in range micro-benchmark, see the table below. 
Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs. >> >> | Input range(s) | Baseline throughput (ops/ms) | Intrinsic throughput (ops/ms) | Speedup | >> | :-------------------------------------: | :-------------------------------: | :-------------------------------: | :---------: | >> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x | >> >> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes. > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Set address attributes in movapd assembly instruction function definition @missa-prime Your change (at version c931222c7d40f296de14585d6c902552a1e66f5a) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24470#issuecomment-2923531486 From duke at openjdk.org Fri May 30 21:49:59 2025 From: duke at openjdk.org (Mohamed Issa) Date: Fri, 30 May 2025 21:49:59 GMT Subject: Integrated: 8353686: Optimize Math.cbrt for x86 64 bit platforms In-Reply-To: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: On Sun, 6 Apr 2025 03:48:22 GMT, Mohamed Issa wrote: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. > > The command to run all range specific micro-benchmarks is posted below. 
>
> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"`
>
> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version.
>
> For performance data collected with the new built-in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs.
>
> | Input range(s) | Baseline throughput (ops/ms) | Intrinsic throughput (ops/ms) | Speedup |
> | :-------------------------------------: | :-------------------------------: | :-------------------------------: | :---------: |
> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x |
> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x |
>
> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes.

This pull request has now been integrated.

Changeset: 0df8c968
Author: Mohamed Issa
Committer: Sandhya Viswanathan
URL: https://git.openjdk.org/jdk/commit/0df8c9684b8782ef830e2bd425217864c3f51784
Stats: 649 lines in 27 files changed: 637 ins; 1 del; 11 mod

8353686: Optimize Math.cbrt for x86 64 bit platforms

Reviewed-by: sviswanathan, sparasa, jbhateja

-------------

PR: https://git.openjdk.org/jdk/pull/24470

From epeter at openjdk.org Sat May 31 11:02:00 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Sat, 31 May 2025 11:02:00 GMT
Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v4]
In-Reply-To:
References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> <72GCipLKeCWCG-4jsG5XhZKkTdVsWafEq_wA0oD-0mk=.c70af814-54fe-43d2-b7c9-72b845eb99d5@github.com>
Message-ID:

On Thu, 29 May 2025 22:30:57 GMT, Mohamed Issa wrote:

>> Patch looks good to me, some comment included.
>
> @jatin-bhateja Please let me know if there's anything else to address.

@missa-prime The patch looks reasonable. It would have been nice if we (from Oracle) could have tested it before integration, especially this close to RDP1 for JDK25. Just for next time. If there are issues with it now, you risk that it gets backed out, and you have to redo it, and it does not make it into JDK25.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24470#issuecomment-2924944778
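
For readers checking the numbers in the 8353686 thread: the speedup column and the 169% / 45% uplift figures follow directly from the baseline and intrinsic throughputs in the PR description's table. The sketch below only redoes that arithmetic. The class name `CbrtSpeedup` and its helper methods are hypothetical names chosen for illustration; they are not part of the patch or of the `CbrtPerf` micro-benchmark.

```java
import java.util.Locale;

// Hypothetical helper, not part of the JDK patch: recomputes the speedup
// and uplift figures from the throughput table in the PR description.
class CbrtSpeedup {

    /** Ratio of intrinsic throughput to baseline throughput (both in ops/ms). */
    static double speedup(double baselineOpsPerMs, double intrinsicOpsPerMs) {
        return intrinsicOpsPerMs / baselineOpsPerMs;
    }

    /** Throughput uplift in percent, i.e. (speedup - 1) * 100. */
    static double upliftPercent(double baselineOpsPerMs, double intrinsicOpsPerMs) {
        return (speedup(baselineOpsPerMs, intrinsicOpsPerMs) - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Input ranges and throughputs copied from the table in the thread.
        String[] ranges = {
            "[-2^(-1022), 2^(-1022)]",
            "(-INF, -2^(-1022)], [2^(-1022), INF)"
        };
        double[][] throughput = {
            {6568, 17678},     // baseline, intrinsic
            {138932, 200897}
        };
        for (int i = 0; i < ranges.length; i++) {
            double base = throughput[i][0];
            double intr = throughput[i][1];
            System.out.printf(Locale.ROOT, "%s: %.2fx speedup, %.0f%% uplift%n",
                    ranges[i], speedup(base, intr), upliftPercent(base, intr));
        }
    }
}
```

Run with `java CbrtSpeedup.java`, this prints a 2.69x speedup (169% uplift) for the very small input range and a 1.45x speedup (45% uplift) for all other inputs, matching the figures quoted in the PR description.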