From kvn at openjdk.org Fri Aug 1 00:27:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 1 Aug 2025 00:27:54 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 16:20:43 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test. The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. I think it is safe to bail out of this transformation. > > Also, this only plugs this particular leak. I think we should really be disabling the abstract/interface encoding optimization until C2 no longer exposes itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. > > Additional testing: > - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Thank you for your thoughts. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26559#pullrequestreview-3077233886 From dlong at openjdk.org Fri Aug 1 00:49:22 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 1 Aug 2025 00:49:22 GMT Subject: RFR: 8278874: tighten VerifyStack constraints [v9] In-Reply-To: References: Message-ID: > The VerifyStack logic in Deoptimization::unpack_frames() attempts to check the expression stack size of the interpreter frame against what GenerateOopMap computes.
To do this, it needs to know if the state at the current bci represents the "before" state, meaning the bytecode will be reexecuted, or the "after" state, meaning we will advance to the next bytecode. The old code didn't know how to determine exactly what state we were in, so it checked both. This PR cleans that up, so we only have to compute the oopmap once. It also removes old SPARC support. Dean Long has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'openjdk:master' into 8278874-verifystack - more cleanup - simplify is_top_frame - readability suggestion - reviewer suggestions - Update src/hotspot/share/runtime/vframeArray.cpp Co-authored-by: Manuel Hässig - Update src/hotspot/share/runtime/vframeArray.cpp Co-authored-by: Manuel Hässig - better name for frame index - Update src/hotspot/share/runtime/deoptimization.cpp Co-authored-by: Manuel Hässig - fix optimized build - ... 
and 2 more: https://git.openjdk.org/jdk/compare/91c07ccd...6bfda158 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26121/files - new: https://git.openjdk.org/jdk/pull/26121/files/6257de6c..6bfda158 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26121&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26121&range=07-08 Stats: 59108 lines in 1555 files changed: 33964 ins; 16451 del; 8693 mod Patch: https://git.openjdk.org/jdk/pull/26121.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26121/head:pull/26121 PR: https://git.openjdk.org/jdk/pull/26121 From xgong at openjdk.org Fri Aug 1 01:48:04 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 1 Aug 2025 01:48:04 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v8] In-Reply-To: <-otlKVhe_xfmpET_cwn5CdvzDduOfFApGSH5VoZSwuk=.7eb8a0e3-4ad6-4ffb-97fd-11a2120a3eaf@github.com> References: <-otlKVhe_xfmpET_cwn5CdvzDduOfFApGSH5VoZSwuk=.7eb8a0e3-4ad6-4ffb-97fd-11a2120a3eaf@github.com> Message-ID: On Wed, 30 Jul 2025 06:14:40 GMT, erifan wrote: >> If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of the `maskAll` is >> relatively smaller than that of `fromLong`. So this patch does the conversion for these cases. >> >> The conversion is done in C2's IGVN phase. And on platforms (like Arm NEON) that don't support `VectorLongToMask`, the conversion is done during the intrinsification process if `MaskAll` or `Replicate` is supported. >> >> Since this optimization requires the input long value of `VectorMask.fromLong` to be a specific compile-time constant, and such expressions are usually hoisted out of the loop, we can't see a noticeable performance change. >> >> This conversion also enables further optimizations that recognize maskAll patterns, see [1]. 
And we can observe a performance improvement of about 7% on both aarch64 and x64. >> >> As `VectorLongToMask` is converted to `MaskAll` or `Replicate`, some existing optimizations recognizing the `VectorLongToMask` will be affected, like >> >> VectorMaskToLong (VectorLongToMask x) => x >> >> >> Hence, this patch also added the following optimizations: >> >> VectorMaskToLong (MaskAll x) => (x & (-1ULL >> (64 - vlen))) // x is -1 or 0 >> VectorMaskToLong (VectorStoreMask (Replicate x)) => (x & (-1ULL >> (64 - vlen))) // x is -1 or 0 >> >> VectorMaskCast (VectorMaskCast x) => x >> >> And we can see noticeable performance improvement with the above optimizations for floating-point types. >> >> Benchmarks on Nvidia Grace machine with option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Error After Error Uplift >> microMaskFromLongToLong_Double128 ops/s 1522384.986 1324881.46 2835774480 403575069.7 1862.71 >> microMaskFromLongToLong_Double256 ops/s 4275.415598 28.560622 4285.587451 27.633101 1 >> microMaskFromLongToLong_Double512 ops/s 3702.171936 9.528497 3692.747579 18.47744 0.99 >> microMaskFromLongToLong_Double64 ops/s 4624.452243 37.388427 4616.320519 23.455954 0.99 >> microMaskFromLongToLong_Float128 ops/s 1239661.887 1286803.852 2842927993 360468218.3 2293.3 >> microMaskFromLongToLong_Float256 ops/s 3681.64954 15.153633 3685.411771 21.737124 1 >> microMaskFromLongToLong_Float512 ops/s 3007.563025 10.189944 3022.002986 14.137287 1 >> microMaskFromLongToLong_Float64 ops/s 1646664.258 1375451.279 2948453900 397472562.4 1790.56 >> >> >> Benchmarks on AMD EPYC 9124 16-Core Processor with option `-XX:UseAVX=3`: >> >> Benchm... > > erifan has updated the pull request incrementally with one additional commit since the last revision: > > Set default warm up to 10000 for JTReg tests Still LGTM? Marked as reviewed by xgong (Committer). ------------- Marked as reviewed by xgong (Committer). 
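The all-true/all-false test behind the `fromLong` => `maskAll` rewrite, and the `VectorMaskToLong (MaskAll x)` identity above, come down to plain bit arithmetic on the low `vlen` bits of `l`. A minimal Java sketch (class and method names are hypothetical, not the Vector API or C2 implementation):

```java
// Bit-level view of the fromLong -> maskAll rewrite conditions.
// Hypothetical helper class; only the low vlen bits of l participate.
public class MaskFromLongSketch {
    // Bits covering the vlen lanes of a mask (1 <= vlen <= 64).
    static long laneBits(int vlen) {
        return -1L >>> (64 - vlen);
    }

    // fromLong(SPECIES, l) is equivalent to maskAll(true) iff every lane bit is set.
    static boolean isAllTrue(long l, int vlen) {
        long lanes = laneBits(vlen);
        return (l & lanes) == lanes;
    }

    // ... and to maskAll(false) iff no lane bit is set.
    static boolean isAllFalse(long l, int vlen) {
        return (l & laneBits(vlen)) == 0;
    }

    // The identity VectorMaskToLong (MaskAll x) => (x & (-1 >>> (64 - vlen))),
    // where x is -1 (all lanes set) or 0 (no lanes set).
    static long maskAllToLong(long x, int vlen) {
        return x & laneBits(vlen);
    }
}
```

For an 8-lane species, for instance, `l = 0xFF` is the all-true case while `l = 0x100` is all-false, since bits above the lane count do not participate.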
PR Review: https://git.openjdk.org/jdk/pull/25793#pullrequestreview-3077357464 PR Review: https://git.openjdk.org/jdk/pull/25793#pullrequestreview-3077361182 From duke at openjdk.org Fri Aug 1 01:48:05 2025 From: duke at openjdk.org (erifan) Date: Fri, 1 Aug 2025 01:48:05 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v8] In-Reply-To: References: <-otlKVhe_xfmpET_cwn5CdvzDduOfFApGSH5VoZSwuk=.7eb8a0e3-4ad6-4ffb-97fd-11a2120a3eaf@github.com> Message-ID: On Thu, 31 Jul 2025 15:29:11 GMT, Christian Hagedorn wrote: > Testing looked good! Thanks~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/25793#issuecomment-3141879113 From xgong at openjdk.org Fri Aug 1 01:51:55 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 1 Aug 2025 01:51:55 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v3] In-Reply-To: <1XFXtkTlDshGtoxEdLVg0f2J2rtn4wz7CdUB9pb9N2g=.25e7e0b5-8468-4d91-adb9-c459bda40933@github.com> References: <1XFXtkTlDshGtoxEdLVg0f2J2rtn4wz7CdUB9pb9N2g=.25e7e0b5-8468-4d91-adb9-c459bda40933@github.com> Message-ID: On Thu, 31 Jul 2025 13:55:12 GMT, Fei Gao wrote: > > I've submitted a test on a 256-bit sve machine. I'll get back to you once it?s finished. > > The new commit passed tier1 - tier3 on 256-bit `sve` machine without new failures. Thanks! Thanks so much for your test! > src/hotspot/cpu/arm/matcher_arm.hpp line 160: > >> 158: static const bool supports_encode_ascii_array = false; >> 159: >> 160: // Return true if vector gather-load/scatter-store needs vector index as input. > > If the function returns `false`, does it indicate one of the following cases? > - Vector gather-load or scatter-store does not accept a vector index for the current use case on this platform. > - The current platform does not support vector gather-load or scatter-store at all. Yes, I think so. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3141889356 PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2246712905 From dlong at openjdk.org Fri Aug 1 02:38:59 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 1 Aug 2025 02:38:59 GMT Subject: RFR: 8278874: tighten VerifyStack constraints [v9] In-Reply-To: References: Message-ID: On Fri, 1 Aug 2025 00:49:22 GMT, Dean Long wrote: >> The VerifyStack logic in Deoptimization::unpack_frames() attempts to check the expression stack size of the interpreter frame against what GenerateOopMap computes. To do this, it needs to know if the state at the current bci represents the "before" state, meaning the bytecode will be reexecuted, or the "after" state, meaning we will advance to the next bytecode. The old code didn't know how to determine exactly what state we were in, so it checked both. This PR cleans that up, so we only have to compute the oopmap once. It also removes old SPARC support. > > Dean Long has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge branch 'openjdk:master' into 8278874-verifystack > - more cleanup > - simplify is_top_frame > - readability suggestion > - reviewer suggestions > - Update src/hotspot/share/runtime/vframeArray.cpp > > Co-authored-by: Manuel Hässig > - Update src/hotspot/share/runtime/vframeArray.cpp > > Co-authored-by: Manuel Hässig > - better name for frame index > - Update src/hotspot/share/runtime/deoptimization.cpp > > Co-authored-by: Manuel Hässig > - fix optimized build > - ... and 2 more: https://git.openjdk.org/jdk/compare/ad91eaa9...6bfda158 I'm running Graal testing now and I've hit at least one of my new asserts so far. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26121#issuecomment-3141966534 From aboldtch at openjdk.org Fri Aug 1 06:03:56 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 1 Aug 2025 06:03:56 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 13:17:37 GMT, Fredrik Bredberg wrote: > Since the integration of [JDK-8359437](https://bugs.openjdk.org/browse/JDK-8359437) the `LockingMode` flag can no longer be set by the user, instead it's declared as `const int LockingMode = LM_LIGHTWEIGHT;`. This means that we can now safely remove all `LockingMode` related code from all platforms. > > This PR removes `LockingMode` related code from the **x86** platform. > > When all the `LockingMode` code has been removed from all platforms, we can go on and remove it from shared (non-platform specific) files as well. And finally remove the `LockingMode` variable itself. > > Passes tier1-tier5 with no added problems. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 227: > 225: void C2_MacroAssembler::fast_lock_lightweight(Register obj, Register box, Register rax_reg, > 226: Register t, Register thread) { > 227: assert(box == rbx, "Used for displaced header location"); Where does this RBX requirement come from? Do not recall it being a thing for the lightweight implementation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2247011594 From aboldtch at openjdk.org Fri Aug 1 06:14:57 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 1 Aug 2025 06:14:57 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 13:17:37 GMT, Fredrik Bredberg wrote: > Since the integration of [JDK-8359437](https://bugs.openjdk.org/browse/JDK-8359437) the `LockingMode` flag can no longer be set by the user, instead it's declared as `const int LockingMode = LM_LIGHTWEIGHT;`. 
This means that we can now safely remove all `LockingMode` related code from all platforms. > > This PR removes `LockingMode` related code from the **x86** platform. > > When all the `LockingMode` code has been removed from all platforms, we can go on and remove it from shared (non-platform specific) files as well. And finally remove the `LockingMode` variable itself. > > Passes tier1-tier5 with no added problems. Nice cleanup! Some small initial comments. All the "displaced header" comments look out of place. Displacing the header word on the stack (in the box) was purely a LM_LEGACY thing. Now we only displace it in the ObjectMonitor, which is only handled (inflation / deflation) in the C++ runtime. There are some more `BasicLock::displaced_header_offset_in_bytes()` asserts inside the x86 code, in callers of these methods; they could be removed now or when the `BasicLock` is cleaned up. There are some unused variables because of "displaced header" code that is kept. `fast_lock_lightweight` and `fast_unlock_lightweight` should probably be renamed `fast_lock` and `fast_unlock` to be in sync with all the comments. (Or all the comments should be updated) (Same with C2 AD instruction) src/hotspot/cpu/x86/interp_masm_x86.cpp line 1032: > 1030: const Register tmp_reg = rbx; > 1031: const Register obj_reg = c_rarg3; // Will contain the oop > 1032: const Register rklass_decode_tmp = rscratch1; Unused variable. src/hotspot/cpu/x86/interp_masm_x86.cpp line 1037: > 1035: const int lock_offset = in_bytes(BasicObjectLock::lock_offset()); > 1036: const int mark_offset = lock_offset + > 1037: BasicLock::displaced_header_offset_in_bytes(); Unused variable. src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 2194: > 2192: > 2193: // Load the oop from the handle > 2194: __ movptr(obj_reg, Address(oop_handle_reg, 0)); `mark_word_offset` and `count_mon` are unused variables above. ------------- Changes requested by aboldtch (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26552#pullrequestreview-3077803768 PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2247015924 PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2247016992 PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2247033438 From xgong at openjdk.org Fri Aug 1 06:36:12 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 1 Aug 2025 06:36:12 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v4] In-Reply-To: References: Message-ID: <19LUInq9pUl59aETNo6Yln_Y0hLDV5L3q7X-YWHwt8Q=.c1780f71-85cc-4a9e-8ff5-6211216692d6@github.com> > This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for the AArch64 SVE platform. > > ### Background > Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register. > > ### Implementation > > #### Challenges > Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints. > > For a 512-bit SVE machine, loading a `byte` vector with different vector species requires different approaches: > - SPECIES_64: Single operation with mask (8 elements, 256-bit) > - SPECIES_128: Single operation, full register (16 elements, 512-bit) > - SPECIES_256: Two operations + merge (32 elements, 1024-bit) > - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit) > > Use `ByteVector.SPECIES_512` as an example: > - It contains 64 elements. 
So the index vector size should be `64 * 32` bits, which is 4 times the SVE vector register size. > - It requires 4 vector gather-loads to finish the whole operation. > > > byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...] > int[] idx = [0, 1, 2, 3, ..., 63, ...] > > 4 gather-load: > idx_v1 = [15 14 13 ... 1 0] gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa] > idx_v2 = [31 30 29 ... 17 16] gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb] > idx_v3 = [47 46 45 ... 33 32] gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc] > idx_v4 = [63 62 61 ... 49 48] gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd] > merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa] > > > #### Solution > The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end. > > Here are the main changes: > - Enhanced IR generation with architecture-specific patterns based on the `gather_scatter_needs_vector_index()` matcher. > - Added `VectorSliceNode` for result merging. > - Added `VectorMaskWidenNode` for mask splitting and type conversion fo... 
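The split-and-merge scheme in the example above can be mimicked with plain arrays: one sub-gather per chunk of 16 `int` indices, with the partial results concatenated in order. A sketch under those assumptions (names are hypothetical; the real implementation emits one gather-load IR per chunk and merges the results with `VectorSliceNode`):

```java
// Scalar model of a 64-element byte gather done as four 16-lane sub-gathers,
// mirroring the idx_v1..idx_v4 / gather_v1..gather_v4 diagram above.
public class SubwordGatherSketch {
    static final int LANES_PER_GATHER = 16; // int index lanes per gather here

    static byte[] gather64(byte[] arr, int[] idx) {
        byte[] out = new byte[64];
        for (int part = 0; part < 4; part++) {            // 4 gather-loads
            int base = part * LANES_PER_GATHER;           // idx_v1..idx_v4
            for (int lane = 0; lane < LANES_PER_GATHER; lane++) {
                out[base + lane] = arr[idx[base + lane]]; // one sub-gather
            }
        }
        return out;                                       // merged result
    }
}
```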
Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26236/files - new: https://git.openjdk.org/jdk/pull/26236/files/be63ade6..71f13003 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26236&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26236&range=02-03 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/26236.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26236/head:pull/26236 PR: https://git.openjdk.org/jdk/pull/26236 From shade at openjdk.org Fri Aug 1 07:05:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 1 Aug 2025 07:05:53 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 16:20:43 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test. The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. I think it is safe to bail out of this transformation. > > Also, this only plugs this particular leak. I think we should really be disabling the abstract/interface encoding optimization until C2 no longer exposes itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. > > Additional testing: > - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Thanks! 
@TobiHartmann, are you good with this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26559#issuecomment-3143108182 From bkilambi at openjdk.org Fri Aug 1 08:03:02 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 1 Aug 2025 08:03:02 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v18] In-Reply-To: <7tKfqCZHB1fAcrN7hU2mVZBrAfE2XkMUa5M-fG2dERc=.9a40f17a-3860-4c7b-bc22-73480865276f@github.com> References: <7tKfqCZHB1fAcrN7hU2mVZBrAfE2XkMUa5M-fG2dERc=.9a40f17a-3860-4c7b-bc22-73480865276f@github.com> Message-ID: <_p9WvCigbbbeeByjoEATEF0LZ5NiAmH7piPoNX5qwG8=.4eeea974-7cf7-4676-a9dc-3b3c70c008d6@github.com> On Fri, 25 Jul 2025 09:25:58 GMT, Andrew Haley wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Refine comments in the ad file > > OK, that looks like a good job. You'll need another reviewer. Hi @theRealAph Would it be ok for me to integrate this patch now? Two reviewers have approved, however if you feel there needs to be another aarch64 specific reviewer please let me know. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23570#issuecomment-3143633109 From bkilambi at openjdk.org Fri Aug 1 09:41:48 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 1 Aug 2025 09:41:48 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE Message-ID: After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - public void vectorAddConstInputFloat16() { for (int i = 0; i < LEN; ++i) { output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); } } The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. 
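The immediate ranges described above have the usual SVE `dup` shape: a signed 8-bit value, optionally shifted left by 8 (i.e. a multiple of 256). A rough Java model of the predicate (illustrative only; the exact corner values are decided by `Assembler::operand_valid_for_sve_dup_immediate` in HotSpot):

```java
// Approximate encodability check for an sve_dup immediate: the value must
// fit in a signed byte either directly or after dropping an 8-bit shift.
// This is a hypothetical model, not the HotSpot code.
public class SveDupImmSketch {
    static boolean fitsSigned8(long v) {
        return v >= -128 && v <= 127;
    }

    static boolean validForSveDup(long imm) {
        // imm8, or imm8 << 8 (low byte zero, high part fits a signed byte)
        return fitsSigned8(imm) || ((imm & 0xFF) == 0 && fitsSigned8(imm >> 8));
    }
}
```

Under this model, half-float constants whose raw bits have a zero low byte, such as `0x3C00` (1.0), can still be broadcast directly, while ones like `0x2E66` cannot and would take the constant-pool-load-plus-replicate path added here.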
------------- Commit messages: - 8361582: AArch64: Some ConH values cannot be replicated with SVE Changes: https://git.openjdk.org/jdk/pull/26589/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26589&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361582 Stats: 194 lines in 7 files changed: 170 ins; 4 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/26589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26589/head:pull/26589 PR: https://git.openjdk.org/jdk/pull/26589 From aph at openjdk.org Fri Aug 1 09:53:54 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 1 Aug 2025 09:53:54 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE In-Reply-To: References: Message-ID: On Fri, 1 Aug 2025 09:31:40 GMT, Bhavana Kilambi wrote: > After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - > `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - > > > public void vectorAddConstInputFloat16() { > for (int i = 0; i < LEN; ++i) { > output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); > } > } > > > > > > The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. > > This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). 
> > Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. src/hotspot/cpu/aarch64/aarch64_vector.ad line 4903: > 4901: > 4902: // Replicate a 16-bit half precision float which is within the limits > 4903: // as specified for the operand - immH8_shift8 Suggestion: // for the operand - immH8_shift8 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2247507578 From bkilambi at openjdk.org Fri Aug 1 10:41:54 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 1 Aug 2025 10:41:54 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F In-Reply-To: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: On Thu, 24 Jul 2025 10:29:15 GMT, Galder Zamarreño wrote: > I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. > > Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. 
On darwin/aarch64 it shows: > > > Benchmark (seed) (size) Mode Cnt Base Patch Units Diff > VectorBitConversion.doubleToLongBits 0 2048 thrpt 8 1168.782 1157.717 ops/ms -1% > VectorBitConversion.doubleToRawLongBits 0 2048 thrpt 8 3999.387 7353.936 ops/ms +83% > VectorBitConversion.floatToIntBits 0 2048 thrpt 8 1200.338 1188.206 ops/ms -1% > VectorBitConversion.floatToRawIntBits 0 2048 thrpt 8 4058.248 14792.474 ops/ms +264% > VectorBitConversion.intBitsToFloat 0 2048 thrpt 8 3050.313 14984.246 ops/ms +391% > VectorBitConversion.longBitsToDouble 0 2048 thrpt 8 3022.691 7379.360 ops/ms +144% > > > The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control. > > I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions. test/micro/org/openjdk/bench/java/lang/VectorBitConversion.java line 3: > 1: package org.openjdk.bench.java.lang; > 2: > 3: import org.openjdk.jmh.annotations.Benchmark; This file might also need a Copyright? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2247638120 From snatarajan at openjdk.org Fri Aug 1 11:30:13 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Fri, 1 Aug 2025 11:30:13 GMT Subject: RFR: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags [v2] In-Reply-To: References: Message-ID: > The existing test (`compiler/debug/TestStress.java`) verifies that compiler stress options produce consistent traces when using the same seed. However, there is currently no test to ensure that different seeds result in different traces. > > ### Solution > Added a test case to assess the distinctness of traces generated from different seeds. 
This fix addresses the fragility concern highlighted in [JDK-8325482](https://bugs.openjdk.org/browse/JDK-8325482) by verifying that traces produced using N (in this case 10) distinct seeds are not all identical. > > ### Changes to `compiler/debug/TestStress.java` > While investigating this issue, I observed that in `compiler/debug/TestStress.java`, the stress options for macro expansion and macro elimination were not being triggered because there were fewer than 2 macro nodes. Note that the `shuffle_macro_nodes()` in `compile.cpp` is only meaningful when there are more than two macro nodes. The generated traces for macro expansion and macro elimination in `TestStress.java` were empty. I have proposed changes to address this problem. Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: - changing N to 5 - Adding test for same seed --> same result for N = 10 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26554/files - new: https://git.openjdk.org/jdk/pull/26554/files/513ab6d3..14617e01 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26554&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26554&range=00-01 Stats: 29 lines in 1 file changed: 16 ins; 1 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/26554.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26554/head:pull/26554 PR: https://git.openjdk.org/jdk/pull/26554 From snatarajan at openjdk.org Fri Aug 1 11:36:58 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Fri, 1 Aug 2025 11:36:58 GMT Subject: RFR: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags [v2] In-Reply-To: References: Message-ID: On Thu, 31 Jul 2025 07:32:28 GMT, Christian Hagedorn wrote: >> Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: >> >> - changing N to 5 >> - Adding test for same seed --> same result for N = 10 > 
test/hotspot/jtreg/compiler/debug/TestStressDistinctSeed.java line 102: > >> 100: ccpTraceSet.add(ccpTrace(s)); >> 101: macroExpansionTraceSet.add(macroExpansionTrace(s)); >> 102: macroEliminationTraceSet.add(macroEliminationTrace(s)); > A suggestion, do you also want to check here that two runs with the same seed produce the same result to show that different seeds really produce different results due to the seed and not just some indeterminism with the test itself? How long does your test take now and afterwards with a fastdebug build? Maybe we can also lower the number of seeds if it takes too long or only do the equality-test for a single seed. Thank you for the review. This is a very good point. I agree with you regarding checking that the same seed produces the same traces. I implemented and tested what you suggested. Below are some numbers that I obtained from running the test with `jtreg -vt ` commit #26554 513ab6d322540aaaf5a167cebb30b87736f7cd91 [with no check for same seed -> same trace ] slowdebug build: 7.205 seconds driver: 32.111 seconds fastdebug build: 0.002 seconds driver: 9.102 seconds commit #26554 7eff4d55024db36b811e4304cf706354e25c8200 [with check for same seed -> same trace and N = 10 ] slowdebug 
float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); > } > } > > > > > > The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. > > This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). > > Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. Thank you for taking care of this. I am still a bit confused about what matches `Replicate` with `immH` that does *not* fit `immH8_shift8` when `Matcher::vector_length_in_bytes(n) > 16`? src/hotspot/cpu/aarch64/aarch64.ad line 4377: > 4375: operand immI8_shift8() > 4376: %{ > 4377: predicate(Assembler::operand_valid_for_sve_dup_immediate((int64_t)n->get_int())); `Assembler::operand_valid_for_sve_dup_immediate` sounds odd as the predicate for a generically sounding `immI8_shift8`. These operands are only used in `replicate` rules, though. So we might be following the precedent of the `immIAddSubV` rule: // 32 bit integer valid for vector add sub immediate operand immIAddSubV() %{ predicate(Assembler::operand_valid_for_sve_add_sub_immediate((int64_t)n->get_int())); match(ConI); op_cost(0); format %{ %} interface(CONST_INTER); %} I.e. rename these operands to `immIDupV`, `immLDupV`, `immHDupV` and adjust the comments to match? 
------------- PR Review: https://git.openjdk.org/jdk/pull/26589#pullrequestreview-3078925906 PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2247783609 From qamai at openjdk.org Fri Aug 1 12:00:53 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 1 Aug 2025 12:00:53 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F In-Reply-To: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: <7UqSdBPWH0SbdkhAUvF_qM10rK0oFsJXhUKWA3VlL14=.0c35e297-7276-468b-98c6-046e84897625@github.com> On Thu, 24 Jul 2025 10:29:15 GMT, Galder Zamarre?o wrote: > I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. > > Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows: > > > Benchmark (seed) (size) Mode Cnt Base Patch Units Diff > VectorBitConversion.doubleToLongBits 0 2048 thrpt 8 1168.782 1157.717 ops/ms -1% > VectorBitConversion.doubleToRawLongBits 0 2048 thrpt 8 3999.387 7353.936 ops/ms +83% > VectorBitConversion.floatToIntBits 0 2048 thrpt 8 1200.338 1188.206 ops/ms -1% > VectorBitConversion.floatToRawIntBits 0 2048 thrpt 8 4058.248 14792.474 ops/ms +264% > VectorBitConversion.intBitsToFloat 0 2048 thrpt 8 3050.313 14984.246 ops/ms +391% > VectorBitConversion.longBitsToDouble 0 2048 thrpt 8 3022.691 7379.360 ops/ms +144% > > > The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. 
These methods do not vectorize because of flow control. > > I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions. `VectorNode::is_reinterpret_opcode` returns `true` for `Op_ReinterpretHF2S` and `Op_ReinterpretS2HF`, which are very similar to the nodes in this PR, can you add these nodes to that method instead? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3144328104 From bkilambi at openjdk.org Fri Aug 1 12:07:54 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 1 Aug 2025 12:07:54 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE In-Reply-To: References: Message-ID: <0jcw428unzAfdGcqci79xBRxjw3yHN_MxYc7OOuHDz8=.31bd3357-49ff-442f-8d06-58447df49de7@github.com> On Fri, 1 Aug 2025 09:31:40 GMT, Bhavana Kilambi wrote: > After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - > `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - > > > public void vectorAddConstInputFloat16() { > for (int i = 0; i < LEN; ++i) { > output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); > } > } > > > > > > The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. > > This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). 
> > Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. > I am still a bit confused what matches `Replicate` with `immH` that does _not_ fit `immH8_shift8` when `Matcher::vector_length_in_bytes(n) > 16`? Hi, thanks for your review. If the immediate value does not fit `immH8_shift8` for `Matcher::vector_length_in_bytes(n) > 16` , the compiler would generate `loadConH` [1] -> `replicateHF` [2] backend nodes instead. The constant would be loaded from the constant pool instead and then broadcasted/replicated to every lane of an SVE register. [1] https://github.com/openjdk/jdk/blob/8ac4a88f3c5ad57824dd192cb3f0af5e71cbceeb/src/hotspot/cpu/aarch64/aarch64.ad#L6963 [2] https://github.com/openjdk/jdk/blob/8ac4a88f3c5ad57824dd192cb3f0af5e71cbceeb/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4806 ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3144344435 From bkilambi at openjdk.org Fri Aug 1 12:19:54 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 1 Aug 2025 12:19:54 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F In-Reply-To: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: On Thu, 24 Jul 2025 10:29:15 GMT, Galder Zamarre?o wrote: > I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. 
> > Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows: > > > Benchmark (seed) (size) Mode Cnt Base Patch Units Diff > VectorBitConversion.doubleToLongBits 0 2048 thrpt 8 1168.782 1157.717 ops/ms -1% > VectorBitConversion.doubleToRawLongBits 0 2048 thrpt 8 3999.387 7353.936 ops/ms +83% > VectorBitConversion.floatToIntBits 0 2048 thrpt 8 1200.338 1188.206 ops/ms -1% > VectorBitConversion.floatToRawIntBits 0 2048 thrpt 8 4058.248 14792.474 ops/ms +264% > VectorBitConversion.intBitsToFloat 0 2048 thrpt 8 3050.313 14984.246 ops/ms +391% > VectorBitConversion.longBitsToDouble 0 2048 thrpt 8 3022.691 7379.360 ops/ms +144% > > > The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control. > > I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions. src/hotspot/share/opto/vectornode.cpp line 1830: > 1828: } > 1829: > 1830: bool VectorReinterpretNode::implemented(int opc, uint vlen, BasicType src_type, BasicType dst_type) { `opc` is not used in this method. Do we need this parameter here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2247834249 From fbredberg at openjdk.org Fri Aug 1 12:28:56 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 1 Aug 2025 12:28:56 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 In-Reply-To: References: Message-ID: On Fri, 1 Aug 2025 05:59:59 GMT, Axel Boldt-Christmas wrote: >> Since the integration of [JDK-8359437](https://bugs.openjdk.org/browse/JDK-8359437) the `LockingMode` flag can no longer be set by the user, instead it's declared as `const int LockingMode = LM_LIGHTWEIGHT;`. 
This means that we can now safely remove all `LockingMode` related code from all platforms. >> >> This PR removes `LockingMode` related code from the **x86** platform. >> >> When all the `LockingMode` code has been removed from all platforms, we can go on and remove it from shared (non-platform specific) files as well. And finally remove the `LockingMode` variable itself. >> >> Passes tier1-tier5 with no added problems. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 227: > >> 225: void C2_MacroAssembler::fast_lock_lightweight(Register obj, Register box, Register rax_reg, >> 226: Register t, Register thread) { >> 227: assert(box == rbx, "Used for displaced header location"); > > Where does this RBX requirement come from? Do not recall it being a thing for the lightweight implementation. Good question! Long story short: `C2_MacroAssembler::fast_unlock` used to have this comment: `// box: box address (displaced header location), killed. Must be EAX. ` Looking at `cmpFastLock` in the `x86_64.ad` file I saw that the box was indeed hardwired to rax (e.g. `rax_RegP box`). Since there was no special comment above `fast_lock_lightweight` I thought that I should reuse the one from the deleted `fast_lock`. Then when looking in `cmpFastLockLightweight` I saw that box was hardwired to rbx. And ta-da, there you have the reason. But now since you asked about it, I understand that the reason it said "Must be EAX" in `fast_lock` was because it was used in a `cmpxchgptr`. But this was only used in the legacy locking mode, which is now deleted. So I will delete the comment and the assert().
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2247851859 From fbredberg at openjdk.org Fri Aug 1 12:32:57 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 1 Aug 2025 12:32:57 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 16:22:14 GMT, Coleen Phillimore wrote: >> Since the integration of [JDK-8359437](https://bugs.openjdk.org/browse/JDK-8359437) the `LockingMode` flag can no longer be set by the user, instead it's declared as `const int LockingMode = LM_LIGHTWEIGHT;`. This means that we can now safely remove all `LockingMode` related code from all platforms. >> >> This PR removes `LockingMode` related code from the **x86** platform. >> >> When all the `LockingMode` code has been removed from all platforms, we can go on and remove it from shared (non-platform specific) files as well. And finally remove the `LockingMode` variable itself. >> >> Passes tier1-tier5 with no added problems. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 389: > >> 387: // obj: object to lock >> 388: // rax: tmp -- KILLED >> 389: // t : tmp - cannot be obj nor rax -- KILLED > > This same comment is repeated just above so you probably don't need it here. Since it's more than 150 lines above, I'd rather keep this "copy" here.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2247860770 From bkilambi at openjdk.org Fri Aug 1 12:44:56 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 1 Aug 2025 12:44:56 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F In-Reply-To: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: On Thu, 24 Jul 2025 10:29:15 GMT, Galder Zamarre?o wrote: > I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. > > Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows: > > > Benchmark (seed) (size) Mode Cnt Base Patch Units Diff > VectorBitConversion.doubleToLongBits 0 2048 thrpt 8 1168.782 1157.717 ops/ms -1% > VectorBitConversion.doubleToRawLongBits 0 2048 thrpt 8 3999.387 7353.936 ops/ms +83% > VectorBitConversion.floatToIntBits 0 2048 thrpt 8 1200.338 1188.206 ops/ms -1% > VectorBitConversion.floatToRawIntBits 0 2048 thrpt 8 4058.248 14792.474 ops/ms +264% > VectorBitConversion.intBitsToFloat 0 2048 thrpt 8 3050.313 14984.246 ops/ms +391% > VectorBitConversion.longBitsToDouble 0 2048 thrpt 8 3022.691 7379.360 ops/ms +144% > > > The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control. > > I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions. 
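To illustrate why the quoted description says `floatToIntBits` does not vectorize while `floatToRawIntBits` does: the non-raw variant collapses every NaN to a canonical bit pattern, which requires a per-element NaN check (control flow), whereas the raw variant is a pure bit move (the `MoveF2I` node). A small standalone demonstration:

```java
public class BitMoves {
    public static void main(String[] args) {
        int nanBits = 0x7fc00001;                 // a quiet NaN with an extra payload bit
        float f = Float.intBitsToFloat(nanBits);
        // Raw conversion is a straight bit copy and preserves the payload:
        if (Float.floatToRawIntBits(f) != 0x7fc00001) throw new AssertionError();
        // Non-raw conversion canonicalizes all NaNs to 0x7fc00000 -- the hidden
        // NaN branch is the "flow control" that blocks SuperWord here:
        if (Float.floatToIntBits(f) != 0x7fc00000) throw new AssertionError();
    }
}
```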
test/micro/org/openjdk/bench/java/lang/VectorBitConversion.java line 67: > 65: > 66: @Benchmark > 67: public long[] doubleToLongBits() { Would something like this be more concise (and maybe more readable as well) - @Benchmark public long[] doubleToLongBits() { for (int i = 0; i < doubles.length; i++) { resultLongs[i] = Double.doubleToLongBits(doubles[i]); } return resultLongs; } The loop should still get vectorized (if vectorizable). Same for other benchmarks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2247880010 From aph at openjdk.org Fri Aug 1 12:47:00 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 1 Aug 2025 12:47:00 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE In-Reply-To: References: Message-ID: On Fri, 1 Aug 2025 09:31:40 GMT, Bhavana Kilambi wrote: > After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - > `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - > > > public void vectorAddConstInputFloat16() { > for (int i = 0; i < LEN; ++i) { > output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); > } > } > > > > > > The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. > > This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). 
> > Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. src/hotspot/cpu/aarch64/assembler_aarch64.cpp line 439: > 437: bool Assembler::operand_valid_for_sve_dup_immediate(int64_t imm) { > 438: return ((imm <= 127 && imm >= -128) || > 439: (imm <= 32767 && imm >= -32768 && (imm & 0xff) == 0)); Suggestion: return ((imm >= -128 && imm <= 127) || ((imm & 0xff) == 0 && imm >= -32768 && imm <= 32767)); Reason: it's more conventional, and closer to the mathematical _l ≤ x ≤ h_. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2247885761 From bkilambi at openjdk.org Fri Aug 1 12:48:55 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 1 Aug 2025 12:48:55 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F In-Reply-To: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: On Thu, 24 Jul 2025 10:29:15 GMT, Galder Zamarreño wrote: > I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. > > Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes.
On darwin/aarch64 it shows: > > > Benchmark (seed) (size) Mode Cnt Base Patch Units Diff > VectorBitConversion.doubleToLongBits 0 2048 thrpt 8 1168.782 1157.717 ops/ms -1% > VectorBitConversion.doubleToRawLongBits 0 2048 thrpt 8 3999.387 7353.936 ops/ms +83% > VectorBitConversion.floatToIntBits 0 2048 thrpt 8 1200.338 1188.206 ops/ms -1% > VectorBitConversion.floatToRawIntBits 0 2048 thrpt 8 4058.248 14792.474 ops/ms +264% > VectorBitConversion.intBitsToFloat 0 2048 thrpt 8 3050.313 14984.246 ops/ms +391% > VectorBitConversion.longBitsToDouble 0 2048 thrpt 8 3022.691 7379.360 ops/ms +144% > > > The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control. > > I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions. Although this is not in the scope of this patch, but I wonder if we could rename `ReinterpretS2HF` and `ReinterpretHF2S` to `MoveHF2S` and `MoveS2HF` to keep naming consistent with other types? WDYT @jatin-bhateja ------------- PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3144464995 From aph at openjdk.org Fri Aug 1 12:53:02 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 1 Aug 2025 12:53:02 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v18] In-Reply-To: References: Message-ID: On Fri, 25 Jul 2025 09:17:19 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. >> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. 
>> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. >> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. >> >> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - >> >> >> Benchmark (size) Mode Cnt Gain >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 >> SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 >> SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 >> SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 >> SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 >> SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 >> SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 >> SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 >> SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 >> >> >> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Refine comments in the ad file Marked as reviewed by aph (Reviewer). 
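For readers unfamiliar with the operation under review, a scalar model of the semantics the `tbl`-based lowering has to match (assuming the usual two-vector select definition, where an index in [0, VLEN) picks from the first source and [VLEN, 2*VLEN) from the second):

```java
public class SelectFromTwoScalar {
    // Scalar model of SelectFromTwoVector: idx[i] in [0, 2*len) selects
    // from a (first half of the index space) or from b (second half).
    static int[] selectFromTwo(int[] idx, int[] a, int[] b) {
        int[] r = new int[idx.length];
        for (int i = 0; i < idx.length; i++) {
            r[i] = idx[i] < a.length ? a[idx[i]] : b[idx[i] - a.length];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] r = selectFromTwo(new int[]{0, 3, 1, 2},
                                new int[]{10, 11}, new int[]{20, 21});
        // idx 0 -> a[0]=10, 3 -> b[1]=21, 1 -> a[1]=11, 2 -> b[0]=20
        if (r[0] != 10 || r[1] != 21 || r[2] != 11 || r[3] != 20)
            throw new AssertionError(java.util.Arrays.toString(r));
    }
}
```

This is what makes a two-table lookup instruction such a natural fit: `tbl` indexes a concatenated pair of source registers in exactly this way.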
------------- PR Review: https://git.openjdk.org/jdk/pull/23570#pullrequestreview-3079106641 From bkilambi at openjdk.org Fri Aug 1 12:54:54 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 1 Aug 2025 12:54:54 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F In-Reply-To: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: On Thu, 24 Jul 2025 10:29:15 GMT, Galder Zamarre?o wrote: > I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. > > Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows: > > > Benchmark (seed) (size) Mode Cnt Base Patch Units Diff > VectorBitConversion.doubleToLongBits 0 2048 thrpt 8 1168.782 1157.717 ops/ms -1% > VectorBitConversion.doubleToRawLongBits 0 2048 thrpt 8 3999.387 7353.936 ops/ms +83% > VectorBitConversion.floatToIntBits 0 2048 thrpt 8 1200.338 1188.206 ops/ms -1% > VectorBitConversion.floatToRawIntBits 0 2048 thrpt 8 4058.248 14792.474 ops/ms +264% > VectorBitConversion.intBitsToFloat 0 2048 thrpt 8 3050.313 14984.246 ops/ms +391% > VectorBitConversion.longBitsToDouble 0 2048 thrpt 8 3022.691 7379.360 ops/ms +144% > > > The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control. > > I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions. 
src/hotspot/share/opto/vectornode.cpp line 1831: > 1829: > 1830: bool VectorReinterpretNode::implemented(int opc, uint vlen, BasicType src_type, BasicType dst_type) { > 1831: if ((src_type == T_FLOAT && dst_type == T_INT) || Just a suggestion, do you feel a `switch-case` could be more readable/clear in this case? Something like this - bool VectorReinterpretNode::implemented(uint vlen, BasicType src_type, BasicType dst_type) { switch (src_type) { case T_FLOAT: if (dst_type != T_INT) return false; break; case T_INT: if (dst_type != T_FLOAT) return false; break; case T_DOUBLE: if (dst_type != T_LONG) return false; break; case T_LONG: if (dst_type != T_DOUBLE) return false; break; default: return false; } return Matcher::match_rule_supported_auto_vectorization(Op_VectorReinterpret, vlen, dst_type); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2247906630 From duke at openjdk.org Fri Aug 1 13:09:02 2025 From: duke at openjdk.org (duke) Date: Fri, 1 Aug 2025 13:09:02 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v18] In-Reply-To: References: Message-ID: On Fri, 25 Jul 2025 09:17:19 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. >> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. >> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. 
>> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. >> >> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - >> >> >> Benchmark (size) Mode Cnt Gain >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 >> SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 >> SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 >> SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 >> SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 >> SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 >> SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 >> SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 >> SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 >> >> >> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Refine comments in the ad file @Bhavana-Kilambi Your change (at version 3675bf34b29121b5265bf53f2257738cb4ee591e) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23570#issuecomment-3144526561 From bkilambi at openjdk.org Fri Aug 1 13:14:11 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 1 Aug 2025 13:14:11 GMT Subject: Integrated: 8348868: AArch64: Add backend support for SelectFromTwoVector In-Reply-To: References: Message-ID: <1wnNPeEQWjd1Jn3ngWqIojD_w-AcvnQb4kOz3WzGesk=.f780e901-083f-40af-b20f-5801b6b3eb3a@github.com> On Tue, 11 Feb 2025 20:20:54 GMT, Bhavana Kilambi wrote: > This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. > > It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. > > For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. > > For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. > > This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
> > Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - > > > Benchmark (size) Mode Cnt Gain > SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 > SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 > SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 > SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 > SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 > SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 > SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 > SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 > SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 > SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 > SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 > SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 > > > Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. This pull request has now been integrated. 
Changeset: 2ba8a06f Author: Bhavana Kilambi Committer: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/2ba8a06f0c0a598a6ca7f74e75bab4208e6fa689 Stats: 973 lines in 13 files changed: 947 ins; 0 del; 26 mod 8348868: AArch64: Add backend support for SelectFromTwoVector Co-authored-by: Jatin Bhateja Reviewed-by: haosun, aph, sviswanathan, xgong ------------- PR: https://git.openjdk.org/jdk/pull/23570 From shade at openjdk.org Fri Aug 1 14:31:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 1 Aug 2025 14:31:56 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE In-Reply-To: <0jcw428unzAfdGcqci79xBRxjw3yHN_MxYc7OOuHDz8=.31bd3357-49ff-442f-8d06-58447df49de7@github.com> References: <0jcw428unzAfdGcqci79xBRxjw3yHN_MxYc7OOuHDz8=.31bd3357-49ff-442f-8d06-58447df49de7@github.com> Message-ID: On Fri, 1 Aug 2025 12:04:35 GMT, Bhavana Kilambi wrote: > If the immediate value does not fit `immH8_shift8` for `Matcher::vector_length_in_bytes(n) > 16` , the compiler would generate `loadConH` [1] -> `replicateHF` [2] backend nodes instead. Ah OK, just checking. I ran this patch on the machine where I have originally found the issue, and it seems to work. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3144786986 From mdoerr at openjdk.org Fri Aug 1 18:36:56 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 1 Aug 2025 18:36:56 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v2] In-Reply-To: References: Message-ID: On Thu, 24 Jul 2025 18:51:22 GMT, Dean Long wrote: >> This PR removes the recently added lock around set_guard_value, using instead Atomic::cmpxchg to atomically update bit-fields of the guard value. Further, it takes a fast-path that uses the previous direct store when at a safepoint. Combined, these changes should get us back to almost where we were before in terms of overhead. 
If necessary, we could go even further and allow make_not_entrant() to perform a direct byte store, leaving 24 bits for the guard value. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > remove NMethodEntryBarrier_lock Thanks for implementing it for PPC64! The instruction sequence needs to be modified (see below). I'd like to have https://github.com/TheRealMDoerr/jdk/commit/522b1ef2e75509d91ac18a1acd27275fc0305e8e, too. Should I file a separate RFE for that? src/hotspot/cpu/ppc/gc/shared/barrierSetAssembler_ppc.cpp line 195: > 193: // This is a compound instruction. Patching support is provided by NativeMovRegMem. > 194: // Actual patching is done in (platform-specific part of) BarrierSetNMethod. > 195: __ align(8); // align for atomic update We can't do this within this fixed size instruction sequence. But, it can be fixed like this: https://github.com/TheRealMDoerr/jdk/commit/a06b34468f5cb063892f92b66d058a8f444f05a1 ------------- Changes requested by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26399#pullrequestreview-3080187466 PR Review Comment: https://git.openjdk.org/jdk/pull/26399#discussion_r2248603769 From mdoerr at openjdk.org Fri Aug 1 18:46:55 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 1 Aug 2025 18:46:55 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v2] In-Reply-To: References: Message-ID: On Thu, 24 Jul 2025 18:51:22 GMT, Dean Long wrote: >> This PR removes the recently added lock around set_guard_value, using instead Atomic::cmpxchg to atomically update bit-fields of the guard value. Further, it takes a fast-path that uses the previous direct store when at a safepoint. Combined, these changes should get us back to almost where we were before in terms of overhead. If necessary, we could go even further and allow make_not_entrant() to perform a direct byte store, leaving 24 bits for the guard value. 
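The lock-free bit-field update that replaces the removed lock can be modeled as a classic CAS loop over the containing word. A hypothetical Java sketch (the real code uses `Atomic::cmpxchg` on the nmethod guard; the 8-bit field layout here is invented purely for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class GuardWord {
    // Hypothetical layout: low 8 bits = guard value, upper bits = other state.
    // Retry until no concurrent writer has changed the word between the read
    // and the compare-and-set -- this is what makes the lock unnecessary.
    static void setGuardByte(AtomicInteger guard, int value) {
        int oldWord, newWord;
        do {
            oldWord = guard.get();
            newWord = (oldWord & ~0xff) | (value & 0xff);
        } while (!guard.compareAndSet(oldWord, newWord));
    }

    public static void main(String[] args) {
        AtomicInteger g = new AtomicInteger(0x1234ff00);
        setGuardByte(g, 0x2a);
        if (g.get() != 0x1234ff2a)
            throw new AssertionError(Integer.toHexString(g.get()));
    }
}
```

The safepoint fast path mentioned in the PR skips this loop entirely: with all mutators stopped, a plain store to the word cannot race.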
> > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > remove NMethodEntryBarrier_lock src/hotspot/share/gc/shared/barrierSetNMethod.cpp line 113: > 111: } > 112: > 113: MACOS_AARCH64_ONLY(ThreadWXEnable wx(WXWrite, Thread::current())); This looks also ok as alternative to my proposal. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26399#discussion_r2248631958 From dlong at openjdk.org Fri Aug 1 20:09:11 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 1 Aug 2025 20:09:11 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v3] In-Reply-To: References: Message-ID: <_8Z-PXaFqGay1pHqcJmeWXrOFv4QQVqnJG2RuZ7rzTk=.34cc6ecb-e189-461c-971b-f59f899372f5@github.com> > This PR removes the recently added lock around set_guard_value, using instead Atomic::cmpxchg to atomically update bit-fields of the guard value. Further, it takes a fast-path that uses the previous direct store when at a safepoint. Combined, these changes should get us back to almost where we were before in terms of overhead. If necessary, we could go even further and allow make_not_entrant() to perform a direct byte store, leaving 24 bits for the guard value. 
Dean Long has updated the pull request incrementally with one additional commit since the last revision: Fix PPC64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26399/files - new: https://git.openjdk.org/jdk/pull/26399/files/e05605eb..a06b3446 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26399&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26399&range=01-02 Stats: 24 lines in 2 files changed: 11 ins; 10 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26399.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26399/head:pull/26399 PR: https://git.openjdk.org/jdk/pull/26399 From mdoerr at openjdk.org Fri Aug 1 20:09:11 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 1 Aug 2025 20:09:11 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v3] In-Reply-To: <_8Z-PXaFqGay1pHqcJmeWXrOFv4QQVqnJG2RuZ7rzTk=.34cc6ecb-e189-461c-971b-f59f899372f5@github.com> References: <_8Z-PXaFqGay1pHqcJmeWXrOFv4QQVqnJG2RuZ7rzTk=.34cc6ecb-e189-461c-971b-f59f899372f5@github.com> Message-ID: On Fri, 1 Aug 2025 20:05:56 GMT, Dean Long wrote: >> This PR removes the recently added lock around set_guard_value, using instead Atomic::cmpxchg to atomically update bit-fields of the guard value. Further, it takes a fast-path that uses the previous direct store when at a safepoint. Combined, these changes should get us back to almost where we were before in terms of overhead. If necessary, we could go even further and allow make_not_entrant() to perform a direct byte store, leaving 24 bits for the guard value. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > Fix PPC64 Thanks for integrating my fix! 
------------- PR Review: https://git.openjdk.org/jdk/pull/26399#pullrequestreview-3080433000 From mdoerr at openjdk.org Fri Aug 1 21:09:53 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 1 Aug 2025 21:09:53 GMT Subject: RFR: 8361536: [s390x] Saving return_pc at wrong offset In-Reply-To: References: Message-ID: On Wed, 9 Jul 2025 05:24:38 GMT, Amit Kumar wrote: > Fixes the bug where the return pc was stored at a wrong offset, which causes issues with the Java ABI. > > Issue appeared in #26004, see the comment: https://github.com/openjdk/jdk/pull/26004#issuecomment-3017928879. I did not request reverting it. I only corrected the wrong description. You can use your favorite offsets :-) ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26209#pullrequestreview-3080567934 From mdoerr at openjdk.org Fri Aug 1 21:58:56 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 1 Aug 2025 21:58:56 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v3] In-Reply-To: <_8Z-PXaFqGay1pHqcJmeWXrOFv4QQVqnJG2RuZ7rzTk=.34cc6ecb-e189-461c-971b-f59f899372f5@github.com> References: <_8Z-PXaFqGay1pHqcJmeWXrOFv4QQVqnJG2RuZ7rzTk=.34cc6ecb-e189-461c-971b-f59f899372f5@github.com> Message-ID: On Fri, 1 Aug 2025 20:09:11 GMT, Dean Long wrote: >> This PR removes the recently added lock around set_guard_value, using instead Atomic::cmpxchg to atomically update bit-fields of the guard value. Further, it takes a fast-path that uses the previous direct store when at a safepoint. Combined, these changes should get us back to almost where we were before in terms of overhead. If necessary, we could go even further and allow make_not_entrant() to perform a direct byte store, leaving 24 bits for the guard value. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > Fix PPC64 PPC64 code looks correct now, but I have minor proposals.
src/hotspot/cpu/ppc/gc/shared/barrierSetNMethod_ppc.cpp line 84: > 82: nativeMovRegMem_at(new_mov_instr.buf)->set_offset(new_value, false /* no icache flush */); > 83: // Swap in the new value > 84: uint64_t v = Atomic::cmpxchg(instr, old_mov_instr.u64, new_mov_instr.u64, memory_order_release); We have `OrderAccess::release()` above, so `memory_order_release` looks redundant. Shouldn't we use `memory_order_relaxed` here? src/hotspot/cpu/ppc/gc/shared/barrierSetNMethod_ppc.cpp line 88: > 86: old_mov_instr.u64 = v; > 87: } > 88: ICache::ppc64_flush_icache_bytes(addr_at(0), NativeMovRegMem::instruction_size); Maybe only use flushing if `cmpxchg` succeeded? Otherwise, we didn't modify the code. ------------- PR Review: https://git.openjdk.org/jdk/pull/26399#pullrequestreview-3080627303 PR Review Comment: https://git.openjdk.org/jdk/pull/26399#discussion_r2248909656 PR Review Comment: https://git.openjdk.org/jdk/pull/26399#discussion_r2248925985 From jbhateja at openjdk.org Sat Aug 2 01:31:04 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 2 Aug 2025 01:31:04 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v8] In-Reply-To: <-otlKVhe_xfmpET_cwn5CdvzDduOfFApGSH5VoZSwuk=.7eb8a0e3-4ad6-4ffb-97fd-11a2120a3eaf@github.com> References: <-otlKVhe_xfmpET_cwn5CdvzDduOfFApGSH5VoZSwuk=.7eb8a0e3-4ad6-4ffb-97fd-11a2120a3eaf@github.com> Message-ID: On Wed, 30 Jul 2025 06:14:40 GMT, erifan wrote: >> If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of the `maskAll` is >> relatively smaller than that of `fromLong`. So this patch does the conversion for these cases. >> >> The conversion is done in C2's IGVN phase. And on platforms (like Arm NEON) that don't support `VectorLongToMask`, the conversion is done during the intrinsification process if `MaskAll` or `Replicate` is supported.
>> >> Since this optimization requires the input long value of `VectorMask.fromLong` to be specific compile-time constants, and such expressions are usually hoisted out of the loop. So we can't see noticeable performance change. >> >> This conversion also enables further optimizations that recognize maskAll patterns, see [1]. And we can observe a performance improvement of about 7% on both aarch64 and x64. >> >> As `VectorLongToMask` is converted to `MaskAll` or `Replicate`, some existing optimizations recognizing the `VectorLongToMask` will be affected, like >> >> VectorMaskToLong (VectorLongToMask x) => x >> >> >> Hence, this patch also added the following optimizations: >> >> VectorMaskToLong (MaskAll x) => (x & (-1ULL >> (64 - vlen))) // x is -1 or 0 >> VectorMaskToLong (VectorStoreMask (Replicate x)) => (x & (-1ULL >> (64 - vlen))) // x is -1 or 0 >> >> VectorMaskCast (VectorMaskCast x) => x >> >> And we can see noticeable performance improvement with the above optimizations for floating-point types. >> >> Benchmarks on Nvidia Grace machine with option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Error After Error Uplift >> microMaskFromLongToLong_Double128 ops/s 1522384.986 1324881.46 2835774480 403575069.7 1862.71 >> microMaskFromLongToLong_Double256 ops/s 4275.415598 28.560622 4285.587451 27.633101 1 >> microMaskFromLongToLong_Double512 ops/s 3702.171936 9.528497 3692.747579 18.47744 0.99 >> microMaskFromLongToLong_Double64 ops/s 4624.452243 37.388427 4616.320519 23.455954 0.99 >> microMaskFromLongToLong_Float128 ops/s 1239661.887 1286803.852 2842927993 360468218.3 2293.3 >> microMaskFromLongToLong_Float256 ops/s 3681.64954 15.153633 3685.411771 21.737124 1 >> microMaskFromLongToLong_Float512 ops/s 3007.563025 10.189944 3022.002986 14.137287 1 >> microMaskFromLongToLong_Float64 ops/s 1646664.258 1375451.279 2948453900 397472562.4 1790.56 >> >> >> Benchmarks on AMD EPYC 9124 16-Core Processor with option `-XX:UseAVX=3`: >> >> Benchm... 
> > erifan has updated the pull request incrementally with one additional commit since the last revision: > > Set default warm up to 10000 for JTReg tests LGTM Best Regards ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25793#pullrequestreview-3080833264 From duke at openjdk.org Sat Aug 2 01:39:04 2025 From: duke at openjdk.org (duke) Date: Sat, 2 Aug 2025 01:39:04 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v8] In-Reply-To: <-otlKVhe_xfmpET_cwn5CdvzDduOfFApGSH5VoZSwuk=.7eb8a0e3-4ad6-4ffb-97fd-11a2120a3eaf@github.com> References: <-otlKVhe_xfmpET_cwn5CdvzDduOfFApGSH5VoZSwuk=.7eb8a0e3-4ad6-4ffb-97fd-11a2120a3eaf@github.com> Message-ID: On Wed, 30 Jul 2025 06:14:40 GMT, erifan wrote: >> If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of the `maskAll` is >> relatively smaller than that of `fromLong`. So this patch does the conversion for these cases. >> >> The conversion is done in C2's IGVN phase. And on platforms (like Arm NEON) that don't support `VectorLongToMask`, the conversion is done during the intrinsification process if `MaskAll` or `Replicate` is supported. >> >> Since this optimization requires the input long value of `VectorMask.fromLong` to be specific compile-time constants, and such expressions are usually hoisted out of the loop. So we can't see noticeable performance change. >> >> This conversion also enables further optimizations that recognize maskAll patterns, see [1]. And we can observe a performance improvement of about 7% on both aarch64 and x64.
>> >> As `VectorLongToMask` is converted to `MaskAll` or `Replicate`, some existing optimizations recognizing the `VectorLongToMask` will be affected, like >> >> VectorMaskToLong (VectorLongToMask x) => x >> >> >> Hence, this patch also added the following optimizations: >> >> VectorMaskToLong (MaskAll x) => (x & (-1ULL >> (64 - vlen))) // x is -1 or 0 >> VectorMaskToLong (VectorStoreMask (Replicate x)) => (x & (-1ULL >> (64 - vlen))) // x is -1 or 0 >> >> VectorMaskCast (VectorMaskCast x) => x >> >> And we can see noticeable performance improvement with the above optimizations for floating-point types. >> >> Benchmarks on Nvidia Grace machine with option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Error After Error Uplift >> microMaskFromLongToLong_Double128 ops/s 1522384.986 1324881.46 2835774480 403575069.7 1862.71 >> microMaskFromLongToLong_Double256 ops/s 4275.415598 28.560622 4285.587451 27.633101 1 >> microMaskFromLongToLong_Double512 ops/s 3702.171936 9.528497 3692.747579 18.47744 0.99 >> microMaskFromLongToLong_Double64 ops/s 4624.452243 37.388427 4616.320519 23.455954 0.99 >> microMaskFromLongToLong_Float128 ops/s 1239661.887 1286803.852 2842927993 360468218.3 2293.3 >> microMaskFromLongToLong_Float256 ops/s 3681.64954 15.153633 3685.411771 21.737124 1 >> microMaskFromLongToLong_Float512 ops/s 3007.563025 10.189944 3022.002986 14.137287 1 >> microMaskFromLongToLong_Float64 ops/s 1646664.258 1375451.279 2948453900 397472562.4 1790.56 >> >> >> Benchmarks on AMD EPYC 9124 16-Core Processor with option `-XX:UseAVX=3`: >> >> Benchm... > > erifan has updated the pull request incrementally with one additional commit since the last revision: > > Set default warm up to 10000 for JTReg tests @erifan Your change (at version b1a768ebc3c28002a0daa4dd7bfb9573c958a9f0) is now ready to be sponsored by a Committer. 
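The all-true/all-false test that drives the `fromLong` -> `maskAll` conversion discussed in this thread comes down to masking the input against the lane mask `-1ULL >> (64 - vlen)`. A standalone sketch of that test (my simplification, not the actual C2 IGVN code):

```cpp
#include <cassert>
#include <cstdint>

// Sketch (not the actual C2 code) of the check described in this thread:
// fromLong(SPECIES, l) sets lane i iff bit i of l is set, so only the low
// vlen bits of l matter. If they are all ones the mask is maskAll(true);
// if they are all zeros it is maskAll(false). Valid for vlen in [1, 64].
bool from_long_is_all_true(uint64_t l, int vlen) {
  uint64_t lane_mask = ~0ULL >> (64 - vlen);  // same as -1ULL >> (64 - vlen)
  return (l & lane_mask) == lane_mask;
}

bool from_long_is_all_false(uint64_t l, int vlen) {
  uint64_t lane_mask = ~0ULL >> (64 - vlen);
  return (l & lane_mask) == 0;
}
```

Note how bits at positions >= vlen are ignored, which is why the conversion only needs the low lane bits of a compile-time constant input.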
------------- PR Comment: https://git.openjdk.org/jdk/pull/25793#issuecomment-3146101282 From duke at openjdk.org Sat Aug 2 07:58:07 2025 From: duke at openjdk.org (erifan) Date: Sat, 2 Aug 2025 07:58:07 GMT Subject: Integrated: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 08:33:09 GMT, erifan wrote: > If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of the `maskAll` is > relatively smaller than that of `fromLong`. So this patch does the conversion for these cases. > > The conversion is done in C2's IGVN phase. And on platforms (like Arm NEON) that don't support `VectorLongToMask`, the conversion is done during the intrinsification process if `MaskAll` or `Replicate` is supported. > > Since this optimization requires the input long value of `VectorMask.fromLong` to be specific compile-time constants, and such expressions are usually hoisted out of the loop. So we can't see noticeable performance change. > > This conversion also enables further optimizations that recognize maskAll patterns, see [1]. And we can observe a performance improvement of about 7% on both aarch64 and x64. > > As `VectorLongToMask` is converted to `MaskAll` or `Replicate`, some existing optimizations recognizing the `VectorLongToMask` will be affected, like > > VectorMaskToLong (VectorLongToMask x) => x > > > Hence, this patch also added the following optimizations: > > VectorMaskToLong (MaskAll x) => (x & (-1ULL >> (64 - vlen))) // x is -1 or 0 > VectorMaskToLong (VectorStoreMask (Replicate x)) => (x & (-1ULL >> (64 - vlen))) // x is -1 or 0 > > VectorMaskCast (VectorMaskCast x) => x > > And we can see noticeable performance improvement with the above optimizations for floating-point types.
> > Benchmarks on Nvidia Grace machine with option `-XX:UseSVE=2`: > > Benchmark Unit Before Error After Error Uplift > microMaskFromLongToLong_Double128 ops/s 1522384.986 1324881.46 2835774480 403575069.7 1862.71 > microMaskFromLongToLong_Double256 ops/s 4275.415598 28.560622 4285.587451 27.633101 1 > microMaskFromLongToLong_Double512 ops/s 3702.171936 9.528497 3692.747579 18.47744 0.99 > microMaskFromLongToLong_Double64 ops/s 4624.452243 37.388427 4616.320519 23.455954 0.99 > microMaskFromLongToLong_Float128 ops/s 1239661.887 1286803.852 2842927993 360468218.3 2293.3 > microMaskFromLongToLong_Float256 ops/s 3681.64954 15.153633 3685.411771 21.737124 1 > microMaskFromLongToLong_Float512 ops/s 3007.563025 10.189944 3022.002986 14.137287 1 > microMaskFromLongToLong_Float64 ops/s 1646664.258 1375451.279 2948453900 397472562.4 1790.56 > > > Benchmarks on AMD EPYC 9124 16-Core Processor with option `-XX:UseAVX=3`: > > Benchmark Unit Before Error After Error Uplift > microMaskFromLongToLong_Double... This pull request has now been integrated. Changeset: f40381e4 Author: erfang Committer: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/f40381e41d1356f92546a21c0d24060f8606b9b3 Stats: 1093 lines in 9 files changed: 1080 ins; 0 del; 13 mod 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases Reviewed-by: xgong, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/25793 From duke at openjdk.org Sat Aug 2 15:50:49 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Sat, 2 Aug 2025 15:50:49 GMT Subject: RFR: 8360304: Redundant condition in LibraryCallKit::inline_vector_nary_operation Message-ID: The check for `sopc != 0` is not needed after JDK-8353786, the function would exit at L374 otherwise. Passes tier1. 
------------- Commit messages: - remove redundant check Changes: https://git.openjdk.org/jdk/pull/26606/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26606&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360304 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26606.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26606/head:pull/26606 PR: https://git.openjdk.org/jdk/pull/26606 From epeter at openjdk.org Sun Aug 3 06:52:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 3 Aug 2025 06:52:49 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v2] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: Message-ID: > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works.
> > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. > - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. > - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` > - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)` > - For aliasing analysis (adjacency and overlap), the "regu... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 190 commits: - manual merge with master - fix include order - manual merge with master - rm multiversioning testing - more comments cleanu - comment cleanup - more descriptions / proof - improve comments - fix test and code - small comment addition - ...
and 180 more: https://git.openjdk.org/jdk/compare/f40381e4...d7e856d8 ------------- Changes: https://git.openjdk.org/jdk/pull/24278/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=01 Stats: 5305 lines in 24 files changed: 5055 ins; 20 del; 230 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Sun Aug 3 06:52:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 3 Aug 2025 06:52:51 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v2] In-Reply-To: <1jRz0k69pSoITg9V5DiMv7pYixyilnf68vOkwEm-34w=.b982d419-795e-445f-92f7-a3abfc76fa37@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <1jRz0k69pSoITg9V5DiMv7pYixyilnf68vOkwEm-34w=.b982d419-795e-445f-92f7-a3abfc76fa37@github.com> Message-ID: On Mon, 28 Jul 2025 09:24:34 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 190 commits: >> >> - manual merge with master >> - fix include order >> - manual merge with master >> - rm multiversioning testing >> - more comments cleanu >> - comment cleanup >> - more descriptions / proof >> - improve comments >> - fix test and code >> - small comment addition >> - ... and 180 more: https://git.openjdk.org/jdk/compare/f40381e4...d7e856d8 > > src/hotspot/share/opto/mempointer.hpp line 411: > >> 409: // Both p and mp have a linear form for v in r: >> 410: // p(v) = p(lo) - lo * scale_v + iv * scale_v (Corrolary P) >> 411: // mp(v) = mp(lo) - lo * scale_v + iv * scale_v (Corrolary MP) > > Where does `iv` come from? Is `v==iv`? Nice catch!
> src/hotspot/share/opto/mempointer.hpp line 444: > >> 442: // = summand_rest + scale_v * (v0 + stride_v) + con >> 443: // = summand_rest + scale_v * v0 + scale_v * stride_v * con >> 444: // = summand_rest + scale_v * v0 + scale_v * stride_v * con > > Suggestion: > > // = summand_rest + scale_v * v0 + scale_v * stride_v + con > // = summand_rest + scale_v * v0 + scale_v * stride_v + con > > These ought to be plusses. Oh dear, yes! > src/hotspot/share/opto/mempointer.hpp line 663: > >> 661: }; >> 662: >> 663: // The MemPointerSummand is designed to allow the simplification of > > Shouldn't this be `MemPointerRawSummand`? No. I'm explaining the `MemPointerRawSummand` further below. This section should explain the difference between `MemPointerRawSummand` and `MemPointerSummand`. Maybe I'll try to make it more explicit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2249621210 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2249622675 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2249623707 From epeter at openjdk.org Sun Aug 3 06:57:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 3 Aug 2025 06:57:59 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v2] In-Reply-To: <1jRz0k69pSoITg9V5DiMv7pYixyilnf68vOkwEm-34w=.b982d419-795e-445f-92f7-a3abfc76fa37@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <1jRz0k69pSoITg9V5DiMv7pYixyilnf68vOkwEm-34w=.b982d419-795e-445f-92f7-a3abfc76fa37@github.com> Message-ID: <2gZSWEfqUYXYnr_zvnEE3k3N1uh-x5QG3SyW-vFvDok=.72029197-b859-4655-ab90-f639561fdc9b@github.com> On Mon, 28 Jul 2025 11:00:52 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 190 commits: >> >> - manual merge with master >> - fix include order >> - manual merge with master >> - rm multiversioning testing >> - more comments cleanu >> - comment cleanup >> - more descriptions / proof >> - improve comments >> - fix test and code >> - small comment addition >> - ... and 180 more: https://git.openjdk.org/jdk/compare/f40381e4...d7e856d8 > > src/hotspot/share/opto/mempointer.hpp line 706: > >> 704: // Note: we also need to track constants as separate raw summands. For >> 705: // this, we say that a raw summand tracks a constant iff _variable == null, >> 706: // and we store the constant value in _scaleI. > > This contradicts the `con2` example above. True, I further specified things here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2249626715 From epeter at openjdk.org Sun Aug 3 07:02:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 3 Aug 2025 07:02:01 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v2] In-Reply-To: References: Message-ID: On Mon, 28 Jul 2025 11:02:20 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 190 commits: >> >> - manual merge with master >> - fix include order >> - manual merge with master >> - rm multiversioning testing >> - more comments cleanu >> - comment cleanup >> - more descriptions / proof >> - improve comments >> - fix test and code >> - small comment addition >> - ...
and 180 more: https://git.openjdk.org/jdk/compare/f40381e4...d7e856d8 > > src/hotspot/share/opto/mempointer.hpp line 731: > >> 729: } >> 730: >> 731: bool is_valid() const { return _int_group >= 0; } > > Why is _int_group not a `uint` if it is always positive or 0? I don't think it matters too much here. But I do use this as the `is_valid` flag, and I do create invalid summands with the default constructor like this: `MemPointerRawSummand(nullptr, NoOverflowInt::make_NaN(), NoOverflowInt::make_NaN(), -1) {}` Do you see an issue with this, or a significant inefficiency? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2249628473 From epeter at openjdk.org Sun Aug 3 07:09:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 3 Aug 2025 07:09:45 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v3] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <3s88aDoOJL1_dGZVfI9hGFPdYFMFCx8pCmrvvdd5-G8=.dc977145-3b65-4406-a575-6522d8f9edf3@github.com> > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. 
> - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. > - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. > - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` > - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)` > - For aliasing analysis (adjacency and overlap), the "regu...
Emanuel Peter has updated the pull request incrementally with four additional commits since the last revision: - Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java Co-authored-by: Manuel Hässig - Update src/hotspot/share/opto/vectorization.hpp Co-authored-by: Manuel Hässig - Update src/hotspot/share/opto/vtransform.hpp Co-authored-by: Manuel Hässig - some suggestions by Manuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/d7e856d8..6bd997ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=01-02 Stats: 22 lines in 4 files changed: 8 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Sun Aug 3 07:09:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 3 Aug 2025 07:09:45 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v3] In-Reply-To: <1jRz0k69pSoITg9V5DiMv7pYixyilnf68vOkwEm-34w=.b982d419-795e-445f-92f7-a3abfc76fa37@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <1jRz0k69pSoITg9V5DiMv7pYixyilnf68vOkwEm-34w=.b982d419-795e-445f-92f7-a3abfc76fa37@github.com> Message-ID: On Mon, 28 Jul 2025 11:27:05 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with four additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java >> >> Co-authored-by: Manuel Hässig >> - Update src/hotspot/share/opto/vectorization.hpp >> >> Co-authored-by: Manuel Hässig >> - Update src/hotspot/share/opto/vtransform.hpp >> >> Co-authored-by: Manuel Hässig >> - some suggestions by Manuel > src/hotspot/share/opto/mempointer.cpp line 732: >
>> 730: // -> Unknown if overlap at runtime -> return false >> 731: bool MemPointer::always_overlaps_with(const MemPointer& other) const { >> 732: const MemPointerAliasing aliasing = get_aliasing_with(other NOT_PRODUCT( COMMA _trace )); > > Suggestion: > > const MemPointerAliasing aliasing = get_aliasing_with(other NOT_PRODUCT(COMMA _trace)); > > Nit: You used this without spaces already above. To be honest: I've used `NOT_PRODUCT` quite inconsistently here. The most closely matching is the use in `MemPointer::never_overlaps_with`, where I am using spaces. So I'll leave it analogous to that ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2249629811 From epeter at openjdk.org Sun Aug 3 07:09:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 3 Aug 2025 07:09:45 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v3] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <1jRz0k69pSoITg9V5DiMv7pYixyilnf68vOkwEm-34w=.b982d419-795e-445f-92f7-a3abfc76fa37@github.com> Message-ID: On Sun, 3 Aug 2025 06:49:33 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/mempointer.hpp line 663: >> >>> 661: }; >>> 662: >>> 663: // The MemPointerSummand is designed to allow the simplification of >> >> Shouldn't this be `MemPointerRawSummand`? > > No. I'm explaining the `MemPointerRawSummand` further below. This section should explain the difference between `MemPointerRawSummand` and `MemPointerSummand`. Maybe I'll try to make it more explicit. Is it now better? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2249630613 From epeter at openjdk.org Sun Aug 3 07:17:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 3 Aug 2025 07:17:03 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v3] In-Reply-To: <1jRz0k69pSoITg9V5DiMv7pYixyilnf68vOkwEm-34w=.b982d419-795e-445f-92f7-a3abfc76fa37@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <1jRz0k69pSoITg9V5DiMv7pYixyilnf68vOkwEm-34w=.b982d419-795e-445f-92f7-a3abfc76fa37@github.com> Message-ID: On Mon, 28 Jul 2025 12:17:54 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with four additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java >> >> Co-authored-by: Manuel Hässig >> - Update src/hotspot/share/opto/vectorization.hpp >> >> Co-authored-by: Manuel Hässig >> - Update src/hotspot/share/opto/vtransform.hpp >> >> Co-authored-by: Manuel Hässig >> - some suggestions by Manuel > src/hotspot/share/opto/superword.cpp line 836: > >> 834: >> 835: // If we cannot speculate (aliasing analysis runtime checks), we need to respect all edges. >> 836: bool with_weak_memory_edges = !_vloop.use_speculative_aliasing_checks(); > > Edges that always have to be respected are strong edges. So, if we cannot speculate, we only have strong edges. With this comment and understanding, I would write the expression as > > bool with_weak_memory_edges = _vloop.use_speculative_aliasing_checks(); > > or > > bool with_strong_memory_edges = !_vloop.use_speculative_aliasing_checks(); Changed it, thanks! > src/hotspot/share/opto/superword.cpp line 878: > >> 876: >> 877: // If we cannot speculate (aliasing analysis runtime checks), we need to respect all edges. >> 878: bool with_weak_memory_edges = !_vloop.use_speculative_aliasing_checks(); > > Same as above. done!
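As background for the weak-edge discussion above: a speculative aliasing check of the kind this PR emits ultimately reduces, at runtime, to an interval-disjointness test over the byte ranges the two pointers touch in the loop. A generic sketch of that test (my simplification; not the actual `VPointer::can_make_speculative_aliasing_check_with` code):

```cpp
#include <cassert>
#include <cstdint>

// Textbook shape of a no-overlap runtime check: two half-open byte ranges
// [lo, hi) never overlap iff one ends before the other begins. If this
// holds, the memory edges between the two accesses can be treated as weak
// and vectorization is safe; otherwise we trap (predicate) or fall back to
// the slow loop (multiversioning). Illustrative only.
bool never_overlap(uintptr_t a_lo, uintptr_t a_hi,
                   uintptr_t b_lo, uintptr_t b_hi) {
  return a_hi <= b_lo || b_hi <= a_lo;
}
```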
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2249633620 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2249633644 From epeter at openjdk.org Sun Aug 3 08:13:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 3 Aug 2025 08:13:10 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v4] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <_KNeamSo7fmt9-AZqEQ3LxxU4vZGKqOTjxEkPY9606g=.c4521d17-dca6-45bb-9a6f-ccc54ee75353@github.com> > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. 
> > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases; MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. > - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. > - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` > - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)` > - For aliasing analysis (adjacency and overlap), the "regu...
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8324751-Aliasing-Analysis-RTC' of https://github.com/eme64/jdk into JDK-8324751-Aliasing-Analysis-RTC - more for Manuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/6bd997ec..2e353a51 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=02-03 Stats: 11 lines in 2 files changed: 4 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Sun Aug 3 08:13:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 3 Aug 2025 08:13:11 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v4] In-Reply-To: <1jRz0k69pSoITg9V5DiMv7pYixyilnf68vOkwEm-34w=.b982d419-795e-445f-92f7-a3abfc76fa37@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <1jRz0k69pSoITg9V5DiMv7pYixyilnf68vOkwEm-34w=.b982d419-795e-445f-92f7-a3abfc76fa37@github.com> Message-ID: On Mon, 28 Jul 2025 13:37:22 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8324751-Aliasing-Analysis-RTC' of https://github.com/eme64/jdk into JDK-8324751-Aliasing-Analysis-RTC >> - more for Manuel > > Thank you, @eme64, for this good work! I left some comments below. @mhaessig Thanks for reviewing! I fixed the merge conflict, and addressed all your comments :) > test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java line 176: >> 174: long t0 = System.nanoTime(); >> 175: // Add a java source file.
>> 176: comp.addJavaSourceCode("p.xyz.InnerTest", generate(comp)); > > Nit: perhaps a package related to the test might be nicer in the logs. Like `compiler.loopopts.superword.templated.AliasingFuzzer` Done! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3148187804 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2249725040 From epeter at openjdk.org Sun Aug 3 09:16:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 3 Aug 2025 09:16:11 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v37] In-Reply-To: References: Message-ID: On Wed, 16 Jul 2025 09:38:18 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. 
I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depends on old predicates >> added before the transformation. To solve this circular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > test failures test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 835: > 833: IRNode.ADD_VI, "> 0", > 834: IRNode.STORE_VECTOR, "> 0"}, > 835: applyIfAnd = { "ShortRunningLongLoop", "true", "AlignVector", "false" }, These changes weren't perfect, now we are not covering all cases with the IR rules... I'll see if I can fix that with the patch for the aliasing runtime check.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2249846347 From epeter at openjdk.org Sun Aug 3 09:27:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 3 Aug 2025 09:27:10 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v5] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning.
> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases; MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. > - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. > - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` > - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)` > - For aliasing analysis (adjacency and overlap), the "regu...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix test after merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/2e353a51..8f1f9329 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=03-04 Stats: 42 lines in 1 file changed: 28 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From jkarthikeyan at openjdk.org Mon Aug 4 02:27:14 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 4 Aug 2025 02:27:14 GMT Subject: RFR: 8350468: x86: Improve implementation of vectorized numberOfLeadingZeros for int and long Message-ID: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com> Hi all, This is a patch that optimizes the x86 backend implementation of `CountLeadingZerosV` for int and long. In the review of [JDK-8349637](https://bugs.openjdk.org/browse/JDK-8349637) an [optimized algorithm](https://github.com/openjdk/jdk/pull/23579#issuecomment-2661332497) was proposed by @rgiulietti, which this PR implements. For integer operands, the optimized algorithm reduces the number of vector instructions from 19 to 13. The same algorithm does not work for long operands, however, since avx2 lacks a vectorized long->double conversion instruction. Instead, I found an optimized algorithm to reuse the code for int and compute the leading zeros for long with only 4 additional instructions. I added a benchmark and on my Zen 3 machine I get these results: Baseline Patch Benchmark Mode Cnt Score Error Units Score Error Units Improvement LeadingZeros.testInt avgt 15 91.097 ± 3.276 ns/op 68.665 ± 1.740 ns/op (+ 28.1%) LeadingZeros.testLong avgt 15 342.545 ± 4.470 ns/op 228.668 ±
5.994 ns/op (+ 39.9%) I've updated the unit tests to more thoroughly test longs and they pass on my machine. Thoughts and reviews would be appreciated! ------------- Commit messages: - Optimize numberOfLeadingZeros Changes: https://git.openjdk.org/jdk/pull/26610/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26610&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350468 Stats: 225 lines in 3 files changed: 160 ins; 17 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/26610.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26610/head:pull/26610 PR: https://git.openjdk.org/jdk/pull/26610 From xgong at openjdk.org Mon Aug 4 02:31:08 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 4 Aug 2025 02:31:08 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v5] In-Reply-To: References: Message-ID: > This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform. > > ### Background > Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register. > > ### Implementation > > #### Challenges > Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints. 
> For a 512-bit SVE machine, loading a `byte` vector with different vector species requires different approaches: > - SPECIES_64: Single operation with mask (8 elements, 256-bit) > - SPECIES_128: Single operation, full register (16 elements, 512-bit) > - SPECIES_256: Two operations + merge (32 elements, 1024-bit) > - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit) > > Use `ByteVector.SPECIES_512` as an example: > - It contains 64 elements. So the index vector size should be `64 * 32` bits, which is 4 times the SVE vector register size. > - It requires 4 vector gather-loads to finish the whole operation. > > > byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...] > int[] idx = [0, 1, 2, 3, ..., 63, ...] > > 4 gather-load: > idx_v1 = [15 14 13 ... 1 0] gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa] > idx_v2 = [31 30 29 ... 17 16] gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb] > idx_v3 = [47 46 45 ... 33 32] gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc] > idx_v4 = [63 62 61 ... 49 48] gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd] > merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa] > > > #### Solution > The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end. > > Here are the main changes: > - Enhanced IR generation with architecture-specific patterns based on the `gather_scatter_needs_vector_index()` matcher. > - Added `VectorSliceNode` for result merging. > - Added `VectorMaskWidenNode` for mask splitting and type conversion fo... Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains six commits: - Merge 'jdk:master' into JDK-8351623-sve - Address review comments - Refine IR pattern and clean backend rules - Fix indentation issue and move the helper matcher method to header files - Merge branch jdk:master into JDK-8351623-sve - 8351623: VectorAPI: Add SVE implementation of subword gather load operation ------------- Changes: https://git.openjdk.org/jdk/pull/26236/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26236&range=04 Stats: 1035 lines in 20 files changed: 875 ins; 24 del; 136 mod Patch: https://git.openjdk.org/jdk/pull/26236.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26236/head:pull/26236 PR: https://git.openjdk.org/jdk/pull/26236 From jkarthikeyan at openjdk.org Mon Aug 4 03:01:58 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 4 Aug 2025 03:01:58 GMT Subject: RFR: 8364580: Test compiler/vectorization/TestSubwordTruncation.java fails on platforms without RoundF/RoundD Message-ID: <7xOctkQXgAftXEeP6TwNSfcc76_oLPnMMv0lS8nCwr8=.7a05fd35-b041-409e-bfb8-9798aa94acce@github.com> Hi all, This is a quick patch to fix the test bug where TestSubwordTruncation fails on platforms that don't implement RoundF and RoundD. Thanks! 
------------- Commit messages: - Add platform restriction for round IR checks Changes: https://git.openjdk.org/jdk/pull/26611/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26611&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8364580 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26611.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26611/head:pull/26611 PR: https://git.openjdk.org/jdk/pull/26611 From ghan at openjdk.org Mon Aug 4 04:05:57 2025 From: ghan at openjdk.org (Guanqiang Han) Date: Mon, 4 Aug 2025 04:05:57 GMT Subject: RFR: 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" [v4] In-Reply-To: References: Message-ID: <_eL21WP3BmImP6FjjgrUHsFHbZBv9FxplRshOvBaCnQ=.cea29869-9e6b-46fa-9e55-17f29a54f3b0@github.com> On Thu, 31 Jul 2025 22:17:33 GMT, Dean Long wrote: >> Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - change T_LONG to T_ADDRESS in some intrinsic functions >> - Merge remote-tracking branch 'upstream/master' into 8359235 >> - Increase sleep time to ensure the method gets compiled >> - add regression test >> - Merge remote-tracking branch 'upstream/master' into 8359235 >> - 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" > > Testing results are good. You need one more review. Hi @dean-long, appreciate the review.
I'll make sure to get one more as you requested ------------- PR Comment: https://git.openjdk.org/jdk/pull/26462#issuecomment-3149056464 From chagedorn at openjdk.org Mon Aug 4 06:20:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 4 Aug 2025 06:20:54 GMT Subject: RFR: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags [v2] In-Reply-To: References: Message-ID: On Fri, 1 Aug 2025 11:30:13 GMT, Saranya Natarajan wrote: >> The existing test (`compiler/debug/TestStress.java`) verifies that compiler stress options produce consistent traces when using the same seed. However, there is currently no test to ensure that different seeds result in different traces. >> >> ### Solution >> Added a test case to assess the distinctness of traces generated from different seeds. This fix addresses the fragility concern highlighted in [JDK-8325482](https://bugs.openjdk.org/browse/JDK-8325482) by verifying that traces produced using N (in this case 10) distinct seeds are all not identical. >> >> ### Changes to `compiler/debug/TestStress.java` >> While investigating this issue, I observed that in `compiler/debug/TestStress.java`, the stress options for macro expansion and macro elimination were not being triggered because there were fewer than 2 macro nodes. Note that the `shuffle_macro_nodes()` in `compile.cpp` is only meaningful when there are more than two macro nodes. The generated traces for macro expansion and macro elimination in `TestStress.java` were empty. I have proposed changes to address this problem. > > Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: > > - changing N to 5 > - Adding test for same seed --> same result for N = 10 Marked as reviewed by chagedorn (Reviewer).
test/hotspot/jtreg/compiler/debug/TestStressDistinctSeed.java line 102: > 100: ccptrace = ccpTrace(s); > 101: macroexpansiontrace = macroExpansionTrace(s); > 102: macroeliminationtrace = macroEliminationTrace(s); Nit: You should probably use camelCase for readability of the variables. ------------- PR Review: https://git.openjdk.org/jdk/pull/26554#pullrequestreview-3082835852 PR Review Comment: https://git.openjdk.org/jdk/pull/26554#discussion_r2250507203 From chagedorn at openjdk.org Mon Aug 4 06:20:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 4 Aug 2025 06:20:56 GMT Subject: RFR: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags [v2] In-Reply-To: References: Message-ID: On Fri, 1 Aug 2025 11:33:52 GMT, Saranya Natarajan wrote: >> test/hotspot/jtreg/compiler/debug/TestStressDistinctSeed.java line 102: >> >>> 100: ccpTraceSet.add(ccpTrace(s)); >>> 101: macroExpansionTraceSet.add(macroExpansionTrace(s)); >>> 102: macroEliminationTraceSet.add(macroEliminationTrace(s)); >> >> A suggestion, do you also want to check here that two runs with the same seed produce the same result to show that different seeds really produce different results due to the seed and not just some indeterminism with the test itself? How long does your test need now and afterwards with a fastdebug build? Maybe we can also lower the number of seeds if it takes too long or only do the equality-test for a single seed. > > Thank you for the review. > > This is a very good point. I implemented and tested what you suggested. 
Below are some numbers that I obtained from running the test `compiler/debug/TestStressDistinctSeed.java` with `jtreg -vt ` > > - **commit 513ab6d322540aaaf5a167cebb30b87736f7cd91 [with no check for same seed -> same trace ]** > **slowdebug** > build: 7.205 seconds > driver: 32.111 seconds > **fastdebug** > build: 0.002 seconds > driver: 9.102 seconds > > - **commit 7eff4d55024db36b811e4304cf706354e25c8200 [with check for same seed -> same trace and N = 10 ]** > **slowdebug** > build: 7.55 seconds > driver: 63.108 seconds > **fastdebug** > build: 0.0 seconds > driver: 16.259 seconds > > - **commit 14617e01a032fe05775eda36f4f3172137ccd2e8 [with check for same seed -> same trace and N = 5 ]** > **slowdebug** > build: 0.001 seconds > driver: 31.946 seconds > **fastdebug** > build: 0.0 seconds > driver: 8.596 seconds > > I think N=5 for the updated test looks reasonable. Do you think this is okay? Thanks for the update and the numbers! I agree that `N=5` seems reasonable. Looks good! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26554#discussion_r2250506280 From chagedorn at openjdk.org Mon Aug 4 06:21:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 4 Aug 2025 06:21:59 GMT Subject: RFR: 8364580: Test compiler/vectorization/TestSubwordTruncation.java fails on platforms without RoundF/RoundD In-Reply-To: <7xOctkQXgAftXEeP6TwNSfcc76_oLPnMMv0lS8nCwr8=.7a05fd35-b041-409e-bfb8-9798aa94acce@github.com> References: <7xOctkQXgAftXEeP6TwNSfcc76_oLPnMMv0lS8nCwr8=.7a05fd35-b041-409e-bfb8-9798aa94acce@github.com> Message-ID: On Mon, 4 Aug 2025 02:54:53 GMT, Jasmine Karthikeyan wrote: > Hi all, > This is a quick patch to fix the test bug where TestSubwordTruncation fails on platforms that don't implement RoundF and RoundD. Thanks! Looks good! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26611#pullrequestreview-3082840039 From mhaessig at openjdk.org Mon Aug 4 06:57:37 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 4 Aug 2025 06:57:37 GMT Subject: [jdk25] RFR: 8364409: [BACKOUT] Consolidate Identity of self-inverse operations Message-ID: Hi all, This pull request contains a backport of commit [ddb64836](https://github.com/openjdk/jdk/commit/ddb64836e5bafededb705329137e353f8c74dd5d) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Manuel Hässig on 31 Jul 2025 and was reviewed by Tobias Hartmann, Benoît Maillard and Hannes Greule. Thanks! ------------- Commit messages: - Backport ddb64836e5bafededb705329137e353f8c74dd5d Changes: https://git.openjdk.org/jdk/pull/26613/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26613&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8364409 Stats: 245 lines in 4 files changed: 8 ins; 225 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/26613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26613/head:pull/26613 PR: https://git.openjdk.org/jdk/pull/26613 From bmaillard at openjdk.org Mon Aug 4 07:06:55 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 4 Aug 2025 07:06:55 GMT Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph [v2] In-Reply-To: References: <1A8oR7hEgev2U_ys1H_AVJS5kjw6LWoPgrVPhJXSFqI=.34cbd04b-bf88-441f-9c3d-97f9aee7f3c3@github.com> <7KtjKex3ik1mzDGUP0J7vI0bdzWz-OaqprBbXlbhbE0=.7299f51f-4316-4371-bca2-846ae5bc6671@github.com> Message-ID: On Thu, 31 Jul 2025 13:28:14 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/graphInvariants.cpp line 197: >> >>> 195: }; >>> 196: >>> 197: struct HasType : Pattern { >> >> Could we make it slightly more general and accept any predicate on the type?
From a previous PR that I worked on I remember that for example for `ModINode` if it has no control input then its divisor input should never be `0`. Maybe this is the kind of properties we could check in the future. This is just a random idea, feel free to ignore. > I think there is a misunderstanding here. I'm talking about node type, as in which C++ class is it, not type as abstract values for nodes. I could rename this struct then. Maybe HasNodeType? Or maybe `NodeClass`: one could see `NodeClass(&Node::is_Region)` that reads almost as "node class is Region". Open to ideas... > > Also, in theory, it accepts any method of `Node` of type `bool()`. This could be used for something else. The idea was to make it easy to say "I want a Node of type `IfNode` here". It's not that great to do with Opcode because of derived classes. I also considered something that would take any `Node -> bool` function, but that made the simple case harder. Instead of `HasType(&Node::is_If)`, I would have had to write something like `HasType([](const Node& n) { return n.is_If(); })`. Functional programming is possible in C++, but not quite syntactically elegant, and I think readability here is important. If such a need arises, I suggest adding a `UnaryPredicate` (or `NodePredicate` etc.) to do that. If the predicates are complicated enough, the bit of symbols needed for making a lambda doesn't matter so much. > > As for your case, yes, we can add that in the future. It could be done with the UnaryPredicate I describe above, or with a more specific pattern that would work on types, and take a method `bool (Type::*)()` or a function `bool(const Type&)` and the pattern would take care of finding the type and submitting it to the predicate. Not that it's a lot of work, but it allows us to communicate the intention more clearly, in my opinion. Yes of course, I got terribly confused, sorry for that. I agree that readability is important, and I would also keep it the way it is now.
`HasNodeType` sounds pretty good in my opinion. Thanks for clarifying! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2250594687 From chagedorn at openjdk.org Mon Aug 4 07:10:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 4 Aug 2025 07:10:55 GMT Subject: [jdk25] RFR: 8364409: [BACKOUT] Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 06:51:36 GMT, Manuel Hässig wrote: > Hi all, > > This pull request contains a backport of commit [ddb64836](https://github.com/openjdk/jdk/commit/ddb64836e5bafededb705329137e353f8c74dd5d) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Manuel Hässig on 31 Jul 2025 and was reviewed by Tobias Hartmann, Benoît Maillard and Hannes Greule. > > Thanks! Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26613#pullrequestreview-3082964085 From bmaillard at openjdk.org Mon Aug 4 07:16:56 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 4 Aug 2025 07:16:56 GMT Subject: [jdk25] RFR: 8364409: [BACKOUT] Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 06:51:36 GMT, Manuel Hässig wrote: > Hi all, > > This pull request contains a backport of commit [ddb64836](https://github.com/openjdk/jdk/commit/ddb64836e5bafededb705329137e353f8c74dd5d) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Manuel Hässig on 31 Jul 2025 and was reviewed by Tobias Hartmann, Benoît Maillard and Hannes Greule. > > Thanks! All good for me! ------------- Marked as reviewed by bmaillard (Author).
PR Review: https://git.openjdk.org/jdk/pull/26613#pullrequestreview-3082978478 From dfenacci at openjdk.org Mon Aug 4 08:32:55 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 4 Aug 2025 08:32:55 GMT Subject: RFR: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags [v2] In-Reply-To: References: Message-ID: On Fri, 1 Aug 2025 11:30:13 GMT, Saranya Natarajan wrote: >> The existing test (`compiler/debug/TestStress.java`) verifies that compiler stress options produce consistent traces when using the same seed. However, there is currently no test to ensure that different seeds result in different traces. >> >> ### Solution >> Added a test case to assess the distinctness of traces generated from different seeds. This fix addresses the fragility concern highlighted in [JDK-8325482](https://bugs.openjdk.org/browse/JDK-8325482) by verifying that traces produced using N (in this case 10) distinct seeds are all not identical. >> >> ### Changes to `compiler/debug/TestStress.java` >> While investigating this issue, I observed that in `compiler/debug/TestStress.java`, the stress options for macro expansion and macro elimination were not being triggered because there were fewer than 2 macro nodes. Note that the `shuffle_macro_nodes()` in` compile.cpp` is only meaningful when there are more than two macro nodes. The generated traces for macro expansion and macro elimination in `TestStress.java` were empty. I have proposed changes to address this problem. > > Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: > > - changing N to 5 > - Adding test for same seed --> same result for N = 10 Thanks for looking into this @sarannat! I just left a couple of inline comments. test/hotspot/jtreg/compiler/debug/TestStressDistinctSeed.java line 34: > 32: /* > 33: * @test > 34: * @key stress randomness Is the test actually "randomised"? 
test/hotspot/jtreg/compiler/debug/TestStressDistinctSeed.java line 99: > 97: if (args.length == 0) { > 98: for (int s = 0; s < 5; s++) { > 99: igvntrace = igvnTrace(s); Did you choose the 0-4 seeds to be sure that there are at least a couple of different traces? I guess it wouldn't be so easy to exclude that with random values, right? ------------- PR Review: https://git.openjdk.org/jdk/pull/26554#pullrequestreview-3083165080 PR Review Comment: https://git.openjdk.org/jdk/pull/26554#discussion_r2250785876 PR Review Comment: https://git.openjdk.org/jdk/pull/26554#discussion_r2250752204 From mhaessig at openjdk.org Mon Aug 4 08:42:02 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 4 Aug 2025 08:42:02 GMT Subject: [jdk25] RFR: 8364409: [BACKOUT] Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 07:08:45 GMT, Christian Hagedorn wrote: >> Hi all, >> >> This pull request contains a backport of commit [ddb64836](https://github.com/openjdk/jdk/commit/ddb64836e5bafededb705329137e353f8c74dd5d) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Manuel Hässig on 31 Jul 2025 and was reviewed by Tobias Hartmann, Benoît Maillard and Hannes Greule. >> >> Thanks! > > Looks good! Thank you for your reviews @chhagedorn and @benoitmaillard!
------------- PR Comment: https://git.openjdk.org/jdk/pull/26613#issuecomment-3149655294 From mhaessig at openjdk.org Mon Aug 4 08:42:03 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 4 Aug 2025 08:42:03 GMT Subject: [jdk25] Integrated: 8364409: [BACKOUT] Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 06:51:36 GMT, Manuel Hässig wrote: > Hi all, > > This pull request contains a backport of commit [ddb64836](https://github.com/openjdk/jdk/commit/ddb64836e5bafededb705329137e353f8c74dd5d) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Manuel Hässig on 31 Jul 2025 and was reviewed by Tobias Hartmann, Benoît Maillard and Hannes Greule. > > Thanks! This pull request has now been integrated. Changeset: 24936b92 Author: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/24936b9295e2f0127ee7c683d5fdafc183168a7c Stats: 245 lines in 4 files changed: 8 ins; 225 del; 12 mod 8364409: [BACKOUT] Consolidate Identity of self-inverse operations Reviewed-by: chagedorn, bmaillard Backport-of: ddb64836e5bafededb705329137e353f8c74dd5d ------------- PR: https://git.openjdk.org/jdk/pull/26613 From duke at openjdk.org Mon Aug 4 08:46:54 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 4 Aug 2025 08:46:54 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v11] In-Reply-To: References: Message-ID: > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: move vredsum_vs out of VEC_LOOP to improve performance ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/0c2fbee9..c558db0b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=09-10 Stats: 8 lines in 1 file changed: 3 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From duke at openjdk.org Mon Aug 4 08:46:56 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 4 Aug 2025 08:46:56 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v10] In-Reply-To: References: Message-ID: On Thu, 31 Jul 2025 01:56:31 GMT, Fei Yang wrote: > What's the performance look like with a smaller `lmul` (m1 or m2)? I am asking this because there are hardwares there (like SG2044) with a VLEN of 128 instead of 256 like on K1. Sure, I'll do it, thanks for the suggestion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2250819708 From duke at openjdk.org Mon Aug 4 08:50:06 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 4 Aug 2025 08:50:06 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v11] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 08:46:54 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. 
> > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > move vredsum_vs out of VEC_LOOP to improve performance The c558db0 shows better numbers due to movement of reduction-sum instruction out of vectorized loop:

bpif3-16g% ( for i in "-XX:DisableIntrinsic=_vectorizedHashCode" "-XX:-UseRVV" "-XX:+UseRVV" ; \
  do ( echo "--- ${i} ---" && ${JAVA_HOME}/bin/java -jar benchmarks.jar \
  --jvmArgs="-XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions ${i}" \
  org.openjdk.bench.java.lang.ArraysHashCode.ints \
  -p size=1,5,10,20,30,40,50,60,70,80,90,100,200,300 \
  -f 1 -r 1 -w 1 -wi 10 -i 10 2>&1 | tail -15 ) done )

--- -XX:DisableIntrinsic=_vectorizedHashCode ---
Benchmark            (size)  Mode  Cnt    Score   Error  Units
ArraysHashCode.ints       1  avgt   10   11.275 ± 0.003  ns/op
ArraysHashCode.ints       5  avgt   10   28.820 ± 0.020  ns/op
ArraysHashCode.ints      10  avgt   10   41.107 ± 0.413  ns/op
ArraysHashCode.ints      20  avgt   10   67.941 ± 0.267  ns/op
ArraysHashCode.ints      30  avgt   10   88.906 ± 0.352  ns/op
ArraysHashCode.ints      40  avgt   10  114.968 ± 0.301  ns/op
ArraysHashCode.ints      50  avgt   10  135.744 ± 0.575  ns/op
ArraysHashCode.ints      60  avgt   10  162.996 ± 0.219  ns/op
ArraysHashCode.ints      70  avgt   10  170.975 ± 0.368  ns/op
ArraysHashCode.ints      80  avgt   10  192.728 ± 0.236  ns/op
ArraysHashCode.ints      90  avgt   10  207.485 ± 0.205  ns/op
ArraysHashCode.ints     100  avgt   10  232.791 ± 0.177  ns/op
ArraysHashCode.ints     200  avgt   10  446.733 ± 0.396  ns/op
ArraysHashCode.ints     300  avgt   10  653.086 ± 0.389  ns/op

--- -XX:-UseRVV ---
Benchmark            (size)  Mode  Cnt    Score   Error  Units
ArraysHashCode.ints       1  avgt   10   11.281 ± 0.003  ns/op
ArraysHashCode.ints       5  avgt   10   24.469 ± 0.008  ns/op
ArraysHashCode.ints      10  avgt   10   35.697 ± 0.014  ns/op
ArraysHashCode.ints      20  avgt   10   58.906 ± 0.060  ns/op
ArraysHashCode.ints      30  avgt   10   82.734 ± 0.023  ns/op
ArraysHashCode.ints      40  avgt   10  105.856 ± 0.017  ns/op
ArraysHashCode.ints      50  avgt   10  129.656 ± 0.033  ns/op
ArraysHashCode.ints      60  avgt   10  152.825 ± 0.057  ns/op
ArraysHashCode.ints      70  avgt   10  176.630 ± 0.077  ns/op
ArraysHashCode.ints      80  avgt   10  199.810 ± 0.118  ns/op
ArraysHashCode.ints      90  avgt   10  223.571 ± 0.026  ns/op
ArraysHashCode.ints     100  avgt   10  247.887 ± 0.387  ns/op
ArraysHashCode.ints     200  avgt   10  481.636 ± 0.163  ns/op
ArraysHashCode.ints     300  avgt   10  716.446 ± 0.402  ns/op

--- -XX:+UseRVV ---
Benchmark            (size)  Mode  Cnt    Score   Error  Units
ArraysHashCode.ints       1  avgt   10   11.331 ± 0.020  ns/op
ArraysHashCode.ints       5  avgt   10   21.309 ± 0.004  ns/op
ArraysHashCode.ints      10  avgt   10   33.858 ± 0.017  ns/op
ArraysHashCode.ints      20  avgt   10   58.878 ± 0.025  ns/op
ArraysHashCode.ints      30  avgt   10   83.918 ± 0.016  ns/op
ArraysHashCode.ints      40  avgt   10  110.763 ± 0.184  ns/op
ArraysHashCode.ints      50  avgt   10  135.274 ± 0.027  ns/op
ArraysHashCode.ints      60  avgt   10  157.186 ± 0.034  ns/op
ArraysHashCode.ints      70  avgt   10  121.519 ± 0.073  ns/op
ArraysHashCode.ints      80  avgt   10  142.846 ± 0.114  ns/op
ArraysHashCode.ints      90  avgt   10  167.906 ± 0.173  ns/op
ArraysHashCode.ints     100  avgt   10  130.926 ± 0.204  ns/op
ArraysHashCode.ints     200  avgt   10  189.173 ± 0.100  ns/op
ArraysHashCode.ints     300  avgt   10  237.364 ± 0.107  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3149681739 From duke at openjdk.org Mon Aug 4 09:24:33 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Mon, 4 Aug 2025 09:24:33 GMT Subject: RFR: 8364618: Sort share/code includes Message-ID: This PR sorts the includes in `hotspot/share/code` using `SortIncludes.java`. I'm also adding the directory to `TestIncludesAreSorted`. Passes tier1.
------------- Commit messages: - sort Changes: https://git.openjdk.org/jdk/pull/26616/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26616&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8364618 Stats: 24 lines in 7 files changed: 12 ins; 12 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26616.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26616/head:pull/26616 PR: https://git.openjdk.org/jdk/pull/26616 From duke at openjdk.org Mon Aug 4 09:54:29 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Mon, 4 Aug 2025 09:54:29 GMT Subject: RFR: 8358598: PhaseIterGVN::PhaseIterGVN(PhaseGVN* gvn) doesn't use its parameter Message-ID: <2S2UiCOxUCiSAlQrrVCaL4S6MYlqdRcabqniskhg6XI=.c4ec617e-da35-48df-911c-9c0b4dca0126@github.com> As noted in the ticket, I propose a small cleanup of `PhaseIterGVN` since one of the constructors does not use its parameter. ------------- Commit messages: - nn - cleanup Changes: https://git.openjdk.org/jdk/pull/26617/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26617&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358598 Stats: 17 lines in 4 files changed: 0 ins; 5 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/26617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26617/head:pull/26617 PR: https://git.openjdk.org/jdk/pull/26617 From shade at openjdk.org Mon Aug 4 10:06:59 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 4 Aug 2025 10:06:59 GMT Subject: RFR: 8364618: Sort share/code includes In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 09:15:54 GMT, Francesco Andreuzzi wrote: > This PR sorts the includes in `hotspot/share/code` using `SortIncludes.java`. I'm also adding the directory to `TestIncludesAreSorted`. > > Passes tier1. src/hotspot/share/code/aotCodeCache.cpp line 60: > 58: #include "gc/z/zBarrierSetRuntime.hpp" > 59: #endif > 60: #ifdef COMPILER2 This one looks weird. This splits `#ifdef COMPILER1` and `#ifdef COMPILER2` blocks. 
Was that the automatic move, or have you moved it yourself? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26616#discussion_r2251012833 From duke at openjdk.org Mon Aug 4 10:09:56 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Mon, 4 Aug 2025 10:09:56 GMT Subject: RFR: 8364618: Sort share/code includes In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 10:04:13 GMT, Aleksey Shipilev wrote: >> This PR sorts the includes in `hotspot/share/code` using `SortIncludes.java`. I'm also adding the directory to `TestIncludesAreSorted`. >> >> Passes tier1. > > src/hotspot/share/code/aotCodeCache.cpp line 60: > >> 58: #include "gc/z/zBarrierSetRuntime.hpp" >> 59: #endif >> 60: #ifdef COMPILER2 > > This one looks weird. This splits `#ifdef COMPILER1` and `#ifdef COMPILER2` blocks. > Was that the automatic move, or have you moved it yourself? I did it myself, so the conditionally included modules are sorted alphabetically. Should I revert it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26616#discussion_r2251020668 From shade at openjdk.org Mon Aug 4 10:19:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 4 Aug 2025 10:19:54 GMT Subject: RFR: 8364618: Sort share/code includes In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 10:07:28 GMT, Francesco Andreuzzi wrote: >> src/hotspot/share/code/aotCodeCache.cpp line 60: >> >>> 58: #include "gc/z/zBarrierSetRuntime.hpp" >>> 59: #endif >>> 60: #ifdef COMPILER2 >> >> This one looks weird. This splits `#ifdef COMPILER1` and `#ifdef COMPILER2` blocks. >> Was that the automatic move, or have you moved it yourself? > > I did it myself, so the conditionally included modules are sorted alphabetically. Should I revert it? Yes, revert this hunk. The GC blocks are somewhat odd (notice the difference between `#if` and `#ifdef`), and I think the common style is to include them last. Not that it is codified anywhere, AFAICS.
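For illustration, the layout described above — unconditional includes sorted alphabetically first, the `#ifdef COMPILERn` blocks kept in one piece, and the GC blocks (note `#if` with a feature macro, not `#ifdef`) kept last — might look like this; the header names here are only illustrative, not taken from the PR:

    // Unconditional includes first, sorted alphabetically ...
    #include "code/aotCodeCache.hpp"
    #include "code/codeCache.hpp"

    // ... then each compiler block kept whole, not interleaved ...
    #ifdef COMPILER1
    #include "c1/c1_Runtime1.hpp"
    #endif
    #ifdef COMPILER2
    #include "opto/runtime.hpp"
    #endif

    // ... and the GC feature blocks last.
    #if INCLUDE_G1GC
    #include "gc/g1/g1BarrierSetRuntime.hpp"
    #endif
    #if INCLUDE_ZGC
    #include "gc/z/zBarrierSetRuntime.hpp"
    #endif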
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26616#discussion_r2251044260 From mhaessig at openjdk.org Mon Aug 4 10:27:00 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 4 Aug 2025 10:27:00 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v5] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <1jRz0k69pSoITg9V5DiMv7pYixyilnf68vOkwEm-34w=.b982d419-795e-445f-92f7-a3abfc76fa37@github.com> Message-ID: On Sun, 3 Aug 2025 06:59:07 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/mempointer.hpp line 731: >> >>> 729: } >>> 730: >>> 731: bool is_valid() const { return _int_group >= 0; } >> >> Why is _int_group not a `uint` if it is always positive or 0? > > I don't think it matters too much here. But I do use this as the `is_valid` flag, and I do create invalid summands with the default constructor like this: > `MemPointerRawSummand(nullptr, NoOverflowInt::make_NaN(), NoOverflowInt::make_NaN(), -1) {}` > > Do you see an issue with this, or a significant inefficiency? No, I just did not catch the -1 in the default constructor and was wondering if `is_valid()` could be dropped by just using an unsigned type. All good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2251061866 From duke at openjdk.org Mon Aug 4 10:35:25 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Mon, 4 Aug 2025 10:35:25 GMT Subject: RFR: 8364618: Sort share/code includes [v2] In-Reply-To: References: Message-ID: > This PR sorts the includes in `hotspot/share/code` using `SortIncludes.java`. I'm also adding the directory to `TestIncludesAreSorted`. > > Passes tier1. 
Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: revert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26616/files - new: https://git.openjdk.org/jdk/pull/26616/files/e5431e02..c4417159 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26616&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26616&range=00-01 Stats: 6 lines in 1 file changed: 3 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26616.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26616/head:pull/26616 PR: https://git.openjdk.org/jdk/pull/26616 From duke at openjdk.org Mon Aug 4 10:35:25 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Mon, 4 Aug 2025 10:35:25 GMT Subject: RFR: 8364618: Sort share/code includes [v2] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 10:17:40 GMT, Aleksey Shipilev wrote: >> I did it myself, so the conditionally included modules are sorted alphabetically. Should I revert it? > > Yes, revert this hunk. The GC blocks are somewhat odd (notice the difference between `#if` and `#ifdef`), and I think the common style is to include them the last. Not that it is codified anywhere, AFAICS. c441715908e0c3f84016faa232ad8afbab89a972 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26616#discussion_r2251072742 From shade at openjdk.org Mon Aug 4 10:35:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 4 Aug 2025 10:35:25 GMT Subject: RFR: 8364618: Sort share/code includes [v2] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 10:31:54 GMT, Francesco Andreuzzi wrote: >> This PR sorts the includes in `hotspot/share/code` using `SortIncludes.java`. I'm also adding the directory to `TestIncludesAreSorted`. >> >> Passes tier1. > > Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: > > revert Looks good. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26616#pullrequestreview-3083595388 From mhaessig at openjdk.org Mon Aug 4 10:37:54 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 4 Aug 2025 10:37:54 GMT Subject: RFR: 8364618: Sort share/code includes [v2] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 10:35:25 GMT, Francesco Andreuzzi wrote: >> This PR sorts the includes in `hotspot/share/code` using `SortIncludes.java`. I'm also adding the directory to `TestIncludesAreSorted`. >> >> Passes tier1. > > Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: > > revert Thank you for working on this! The changes look good to me. Let's wait for Github Actions to pass and the 24h before integrating. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/26616#pullrequestreview-3083610327 From mhaessig at openjdk.org Mon Aug 4 10:58:06 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 4 Aug 2025 10:58:06 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v5] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <_jNlOz7RH0YN28AR-LhEwqnaPa_Vy-nUd3B_bMTYum8=.9307cd79-0f69-440d-bf0f-3a0fc54a8335@github.com> On Sun, 3 Aug 2025 09:27:10 GMT, Emanuel Peter wrote: >> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. >> >> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >> - If the predicate is not available, we use `multiversioning`, i.e. 
we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. >> >> -------------------------- >> >> **Where to start reviewing** >> >> - `src/hotspot/share/opto/mempointer.hpp`: >> - Read the class comment for `MemPointerRawSummand`. >> - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. >> >> - `src/hotspot/share/opto/vectorization.cpp`: >> - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. >> >> - `src/hotspot/share/opto/vtransform.hpp`: >> - Understand the difference between weak and strong edges. >> >> If you need to see some examples, then look at the tests: >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases whether we used multiversioning. >> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases; MemorySegment cases have some issues (see comments). >> -------------------------- >> >> **Details** >> >> Most fundamentally: >> - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. >> - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`.
>> - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` >> - With "regular" summands, this gets simplified to `p = base + 4L +ConvI2L(x) + Conv... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix test after merge Thank you for addressing my comments. I only have a few follow-ups. test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java line 246: > 244: return TestFrameworkClass.render( > 245: // package and class name. > 246: "p.xyz", "InnerTest", Suggestion: "compiler.loopopts.superword.templated", "AliasingFuzzer", This went missing test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 823: > 821: applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true", "rvv", "true"}) > 822: // FAILS: invariants are sorted differently, because of differently inserted Cast. > 823: // See: JDK-8331659 With the integration of #26429, this should pass. test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 866: > 864: applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true", "rvv", "true"}) > 865: // FAILS: invariants are sorted differently, because of differently inserted Cast. > 866: // See: JDK-8331659 With the integration of #26429, this should pass. ------------- Changes requested by mhaessig (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/24278#pullrequestreview-3083634456 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2251128584 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2251112018 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2251112943 From mhaessig at openjdk.org Mon Aug 4 10:58:07 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 4 Aug 2025 10:58:07 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v5] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <1jRz0k69pSoITg9V5DiMv7pYixyilnf68vOkwEm-34w=.b982d419-795e-445f-92f7-a3abfc76fa37@github.com> Message-ID: On Sun, 3 Aug 2025 07:04:37 GMT, Emanuel Peter wrote: >> No. I'm explaining the `MemPointerRawSummand` further below. This section should explain the difference between `MemPointerRawSummand` and `MemPointerSummand`. Maybe I'll try to make it more explicit. > > Is it now better? Yes, perfect. Thank you. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2251106255 From shade at openjdk.org Mon Aug 4 11:02:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 4 Aug 2025 11:02:53 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants In-Reply-To: References: Message-ID: <2u5BCO0l185IVtJXM804yxWZo_MMMb-3hrlOqhiAbQs=.f5daa3b7-6d4a-47be-83b1-f5ef749a6c9b@github.com> On Fri, 1 Aug 2025 07:03:07 GMT, Aleksey Shipilev wrote: > Thanks! @TobiHartmann, are you good with this? 
Friendly ping :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26559#issuecomment-3150123963 From shade at openjdk.org Mon Aug 4 11:48:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 4 Aug 2025 11:48:53 GMT Subject: RFR: 8364580: Test compiler/vectorization/TestSubwordTruncation.java fails on platforms without RoundF/RoundD In-Reply-To: <7xOctkQXgAftXEeP6TwNSfcc76_oLPnMMv0lS8nCwr8=.7a05fd35-b041-409e-bfb8-9798aa94acce@github.com> References: <7xOctkQXgAftXEeP6TwNSfcc76_oLPnMMv0lS8nCwr8=.7a05fd35-b041-409e-bfb8-9798aa94acce@github.com> Message-ID: On Mon, 4 Aug 2025 02:54:53 GMT, Jasmine Karthikeyan wrote: > Hi all, > This is a quick patch to fix the test bug where TestSubwordTruncation fails on platforms that don't implement RoundF and RoundD. Thanks! Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26611#pullrequestreview-3083820403 From jkarthikeyan at openjdk.org Mon Aug 4 12:13:57 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 4 Aug 2025 12:13:57 GMT Subject: RFR: 8364580: Test compiler/vectorization/TestSubwordTruncation.java fails on platforms without RoundF/RoundD In-Reply-To: <7xOctkQXgAftXEeP6TwNSfcc76_oLPnMMv0lS8nCwr8=.7a05fd35-b041-409e-bfb8-9798aa94acce@github.com> References: <7xOctkQXgAftXEeP6TwNSfcc76_oLPnMMv0lS8nCwr8=.7a05fd35-b041-409e-bfb8-9798aa94acce@github.com> Message-ID: On Mon, 4 Aug 2025 02:54:53 GMT, Jasmine Karthikeyan wrote: > Hi all, > This is a quick patch to fix the test bug where TestSubwordTruncation fails on platforms that don't implement RoundF and RoundD. Thanks! Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26611#issuecomment-3150353528 From jkarthikeyan at openjdk.org Mon Aug 4 12:13:58 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 4 Aug 2025 12:13:58 GMT Subject: Integrated: 8364580: Test compiler/vectorization/TestSubwordTruncation.java fails on platforms without RoundF/RoundD In-Reply-To: <7xOctkQXgAftXEeP6TwNSfcc76_oLPnMMv0lS8nCwr8=.7a05fd35-b041-409e-bfb8-9798aa94acce@github.com> References: <7xOctkQXgAftXEeP6TwNSfcc76_oLPnMMv0lS8nCwr8=.7a05fd35-b041-409e-bfb8-9798aa94acce@github.com> Message-ID: On Mon, 4 Aug 2025 02:54:53 GMT, Jasmine Karthikeyan wrote: > Hi all, > This is a quick patch to fix the test bug where TestSubwordTruncation fails on platforms that don't implement RoundF and RoundD. Thanks! This pull request has now been integrated. Changeset: 500462fb Author: Jasmine Karthikeyan URL: https://git.openjdk.org/jdk/commit/500462fb690c25da3816467e27fc66d25b4eb7dc Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8364580: Test compiler/vectorization/TestSubwordTruncation.java fails on platforms without RoundF/RoundD Reviewed-by: chagedorn, shade ------------- PR: https://git.openjdk.org/jdk/pull/26611 From qamai at openjdk.org Mon Aug 4 12:41:56 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 4 Aug 2025 12:41:56 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 16:20:43 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test. The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. 
I think it is safe to bail out of this transformation. > > Also, this only plugs this particular leak. I think we should really be disabling the abstract/interface encoding optimization until C2 does not expose itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. > > Additional testing: > - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` I'm not sure about the fix. Besides the creation of an illegal `ConN`, which can be caught during compilation, we also face circumstances where we try to `EncodeP` an uncompressible pointer, which seems very wrong. Maybe the type system needs to keep track of the compressibility of each pointer and disallow ones that may not be compressible from being the input of an `EncodeP`. src/hotspot/share/opto/compile.cpp line 3663: > 3661: n->subsume_by(ConNode::make(t->make_narrowoop()), this); > 3662: } else if (t->isa_klassptr()) { > 3663: ciKlass* klass = t->is_klassptr()->exact_klass(); This branch means that we are trying to compress a pointer that cannot be compressed. This seems wrong either way. ------------- PR Review: https://git.openjdk.org/jdk/pull/26559#pullrequestreview-3083998493 PR Review Comment: https://git.openjdk.org/jdk/pull/26559#discussion_r2251354535 From snatarajan at openjdk.org Mon Aug 4 12:44:21 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 4 Aug 2025 12:44:21 GMT Subject: RFR: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags [v3] In-Reply-To: References: Message-ID: > The existing test (`compiler/debug/TestStress.java`) verifies that compiler stress options produce consistent traces when using the same seed. However, there is currently no test to ensure that different seeds result in different traces.
> > ### Solution > Added a test case to assess the distinctness of traces generated from different seeds. This fix addresses the fragility concern highlighted in [JDK-8325482](https://bugs.openjdk.org/browse/JDK-8325482) by verifying that traces produced using N (in this case 10) distinct seeds are all not identical. > > ### Changes to `compiler/debug/TestStress.java` > While investigating this issue, I observed that in `compiler/debug/TestStress.java`, the stress options for macro expansion and macro elimination were not being triggered because there were fewer than 2 macro nodes. Note that the `shuffle_macro_nodes()` in` compile.cpp` is only meaningful when there are more than two macro nodes. The generated traces for macro expansion and macro elimination in `TestStress.java` were empty. I have proposed changes to address this problem. Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: addressing review comments on camelCase ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26554/files - new: https://git.openjdk.org/jdk/pull/26554/files/14617e01..bca4a0ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26554&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26554&range=01-02 Stats: 13 lines in 1 file changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/26554.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26554/head:pull/26554 PR: https://git.openjdk.org/jdk/pull/26554 From snatarajan at openjdk.org Mon Aug 4 12:44:22 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 4 Aug 2025 12:44:22 GMT Subject: RFR: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags [v2] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 08:14:28 GMT, Damon Fenacci wrote: >> Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: >> >> - changing N to 5 >> - 
Adding test for same seed --> same result for N = 10 > > test/hotspot/jtreg/compiler/debug/TestStressDistinctSeed.java line 99: > >> 97: if (args.length == 0) { >> 98: for (int s = 0; s < 5; s++) { >> 99: igvntrace = igvnTrace(s); > > Did you choose the 0-4 seeds to be sure that there are at least a couple of different traces? I guess it wouldn't be so easy to exclude that with random values, right? Thank you for the review. This comes from [test/hotspot/jtreg/compiler/debug/TestStressDistinctSeed.java](https://github.com/openjdk/jdk/pull/26554/files/14617e01a032fe05775eda36f4f3172137ccd2e8#diff-abc400ea5cb08b3f662a32173c0ee2d15306f51d1e8930bd295ab7c2d2b52980) and my reasoning is the same as what you mentioned above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26554#discussion_r2251370113 From snatarajan at openjdk.org Mon Aug 4 12:46:55 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 4 Aug 2025 12:46:55 GMT Subject: RFR: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags [v2] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 08:29:04 GMT, Damon Fenacci wrote: >> Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: >> >> - changing N to 5 >> - Adding test for same seed --> same result for N = 10 > > test/hotspot/jtreg/compiler/debug/TestStressDistinctSeed.java line 34: > >> 32: /* >> 33: * @test >> 34: * @key stress randomness > > Is the test actually "randomised"? My argument for this comes from [JDK-8270156](https://bugs.openjdk.org/browse/JDK-8270156) where `stress` and `random` keywords were added to all JTreg tests which use StressGCM, StressLCM and/or StressIGVN. This was extended to StressCCP, StressMacroExpansion, and StressMacroElimination.
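The distinctness argument can be sketched outside the VM. In the hypothetical stand-in below, shuffling a fixed worklist with a seeded `Random` plays the role of the stress trace a given `-XX:StressSeed` would produce (the class and method names are made up, not the test's real API): the same seed must reproduce the same "trace", and a handful of small seeds should not all collapse to one trace.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class DistinctSeedCheck {
    // Hypothetical stand-in for the trace produced under a given stress seed:
    // shuffle a fixed "node worklist" with a Random seeded by that seed.
    public static String traceFor(long seed) {
        List<Integer> worklist = new ArrayList<>(List.of(1, 2, 3, 4, 5, 6, 7, 8));
        Collections.shuffle(worklist, new Random(seed));
        return worklist.toString();
    }

    // Number of distinct traces among seeds 0..n-1.
    public static int distinctTraces(int n) {
        Set<String> traces = new HashSet<>();
        for (int s = 0; s < n; s++) {
            traces.add(traceFor(s));
        }
        return traces.size();
    }

    public static void main(String[] args) {
        // Same seed must reproduce the same trace (determinism) ...
        if (!traceFor(42).equals(traceFor(42))) {
            throw new AssertionError("same seed produced different traces");
        }
        // ... and it would be suspicious if all five small seeds collapsed
        // to a single trace (distinctness).
        if (distinctTraces(5) < 2) {
            throw new AssertionError("all seeds produced identical traces");
        }
        System.out.println("seed checks passed");
    }
}
```

As in the test under review, checking "not all identical" rather than "pairwise distinct" keeps the check robust against the occasional coincidental collision.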
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26554#discussion_r2251377302 From fbredberg at openjdk.org Mon Aug 4 12:52:39 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Mon, 4 Aug 2025 12:52:39 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 [v2] In-Reply-To: References: Message-ID: > Since the integration of [JDK-8359437](https://bugs.openjdk.org/browse/JDK-8359437) the `LockingMode` flag can no longer be set by the user, instead it's declared as `const int LockingMode = LM_LIGHTWEIGHT;`. This means that we can now safely remove all `LockingMode` related code from all platforms. > > This PR removes `LockingMode` related code from the **x86** platform. > > When all the `LockingMode` code has been removed from all platforms, we can go on and remove it from shared (non-platform specific) files as well. And finally remove the `LockingMode` variable itself. > > Passes tier1-tier5 with no added problems. Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: Update one after review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26552/files - new: https://git.openjdk.org/jdk/pull/26552/files/9290d4ad..2a052186 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26552&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26552&range=00-01 Stats: 18 lines in 4 files changed: 0 ins; 16 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26552.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26552/head:pull/26552 PR: https://git.openjdk.org/jdk/pull/26552 From fbredberg at openjdk.org Mon Aug 4 12:52:40 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Mon, 4 Aug 2025 12:52:40 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 [v2] In-Reply-To: References: Message-ID: On Fri, 1 Aug 2025 06:12:28 GMT, Axel Boldt-Christmas wrote: >> Fredrik Bredberg has updated the pull request incrementally with one 
additional commit since the last revision: >> >> Update one after review > > Nice cleanup! Some small initial comments. > > All the "displaced header" comments look out of place. Displacing the header word on the stack (in the box) was purely a LM_LEGACY thing. Now we only displace it in the ObjectMonitor which is only handled (inflation / deflation) in the C++ runtime. > > There are some more `BasicLock::displaced_header_offset_in_bytes()` asserts inside the x86 code. For callers of these methods, could be removed now or when the `BasicLock` is cleaned up. > > There are some unused variables because of "displaced header" code that is kept. > > `fast_lock_lightweight` and `fast_unlock_lightweight` should probably be renamed `fast_lock` and `fast_unlock` to be in sync with all the comments. (Or all the comments should be updated) (Same with C2 AD instruction) @xmas92 > All the "displaced header" comments look out of place. Displacing the header word on the stack (in the box) was purely a LM_LEGACY thing. Now we only displace it in the ObjectMonitor which is only handled (inflation / deflation) in the C++ runtime. > > There are some more `BasicLock::displaced_header_offset_in_bytes()` asserts inside the x86 code. For callers of these methods, could be removed now or when the `BasicLock` is cleaned up. I basically agree. It's a bit of a mess right now and arguments that are called `disp_hdr` should probably be changed to `basic_lock`. But I'd rather do that in a separate PR and have this (and all the similar other platform PRs) only handle removing of dead code due to removal of the `LockingMode` flag. It's far easier to review a PR that has just removed code, than one that has also refactored lots of code. > There are some unused variables because of "displaced header" code that is kept. Fixed those you saw, and some more that I found.
> `fast_lock_lightweight` and `fast_unlock_lightweight` should probably be renamed `fast_lock` and `fast_unlock` to be in sync with all the comments. (Or all the comments should be updated) (Same with C2 AD instruction) Totally agree. But that's also for a separate PR after the shared (non-platform specific) files have been fixed.

> src/hotspot/cpu/x86/interp_masm_x86.cpp line 1032:
>
>> 1030: const Register tmp_reg = rbx;
>> 1031: const Register obj_reg = c_rarg3; // Will contain the oop
>> 1032: const Register rklass_decode_tmp = rscratch1;
>
> Unused variable.

Fixed

> src/hotspot/cpu/x86/interp_masm_x86.cpp line 1037:
>
>> 1035: const int lock_offset = in_bytes(BasicObjectLock::lock_offset());
>> 1036: const int mark_offset = lock_offset +
>> 1037: BasicLock::displaced_header_offset_in_bytes();
>
> Unused variable.

Fixed

> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 2194:
>
>> 2192:
>> 2193: // Load the oop from the handle
>> 2194: __ movptr(obj_reg, Address(oop_handle_reg, 0));
>
> `mark_word_offset` and `count_mon` unused variables above.

Fixed ------------- PR Comment: https://git.openjdk.org/jdk/pull/26552#issuecomment-3150525525 PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2251387407 PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2251387954 PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2251388647 From shade at openjdk.org Mon Aug 4 13:09:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 4 Aug 2025 13:09:56 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 12:34:33 GMT, Quan Anh Mai wrote: >> See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test.
The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. I think it is safe to bail out of this transformation. >> >> Also, this only plugs this particular leak. I think we should really be disabling the abstract/interface encoding optimization until C2 no longer exposes itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. >> >> Additional testing:
>> - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore
>> - [x] Linux x86_64 server fastdebug, `tier1`
>> - [x] Linux x86_64 server fastdebug, `all`
>> - [x] Linux AArch64 server fastdebug, `all`

> src/hotspot/share/opto/compile.cpp line 3663:
>
>> 3661: n->subsume_by(ConNode::make(t->make_narrowoop()), this);
>> 3662: } else if (t->isa_klassptr()) {
>> 3663: ciKlass* klass = t->is_klassptr()->exact_klass();
>
> This branch means that we are trying to compress a pointer that cannot be compressed. This seems wrong either way.

Right! This replaces (`ConP` -> `EncodeP`) -> `ConN`. While I agree that it should theoretically be handled on the `EncodeP` path, it seems sane to gate the conversion here as well. Issues like this are why I want to disable/revert the optimization that puts us into this situation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26559#discussion_r2251425489 From duke at openjdk.org Mon Aug 4 13:40:38 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 4 Aug 2025 13:40:38 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v11] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 08:46:54 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
>> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > move vredsum_vs out of VEC_LOOP to improve performance

The results of [suggested experiment](https://github.com/openjdk/jdk/pull/17413#discussion_r2244179129) with lmul==1:

--- -XX:DisableIntrinsic=_vectorizedHashCode ---
Benchmark            (size)  Mode  Cnt    Score   Error  Units
ArraysHashCode.ints       1  avgt   10   11.271 ± 0.002  ns/op
ArraysHashCode.ints       5  avgt   10   28.811 ± 0.004  ns/op
ArraysHashCode.ints      10  avgt   10   40.720 ± 0.022  ns/op
ArraysHashCode.ints      20  avgt   10   68.195 ± 0.245  ns/op
ArraysHashCode.ints      30  avgt   10   88.203 ± 0.358  ns/op
ArraysHashCode.ints      40  avgt   10  115.552 ± 0.513  ns/op
ArraysHashCode.ints      50  avgt   10  134.724 ± 0.194  ns/op
ArraysHashCode.ints      60  avgt   10  161.800 ± 0.526  ns/op
ArraysHashCode.ints      70  avgt   10  171.443 ± 0.407  ns/op
ArraysHashCode.ints      80  avgt   10  192.710 ± 0.360  ns/op
ArraysHashCode.ints      90  avgt   10  207.956 ± 0.096  ns/op
ArraysHashCode.ints     100  avgt   10  231.261 ± 0.338  ns/op
ArraysHashCode.ints     200  avgt   10  450.309 ± 1.013  ns/op
ArraysHashCode.ints     300  avgt   10  655.367 ± 0.807  ns/op

--- -XX:-UseRVV ---
Benchmark            (size)  Mode  Cnt    Score   Error  Units
ArraysHashCode.ints       1  avgt   10   11.277 ± 0.006  ns/op
ArraysHashCode.ints       5  avgt   10   24.448 ± 0.014  ns/op
ArraysHashCode.ints      10  avgt   10   35.767 ± 0.058  ns/op
ArraysHashCode.ints      20  avgt   10   58.871 ± 0.014  ns/op
ArraysHashCode.ints      30  avgt   10   82.748 ± 0.403  ns/op
ArraysHashCode.ints      40  avgt   10  105.844 ± 0.057  ns/op
ArraysHashCode.ints      50  avgt   10  129.691 ± 0.207  ns/op
ArraysHashCode.ints      60  avgt   10  152.783 ± 0.029  ns/op
ArraysHashCode.ints      70  avgt   10  176.573 ± 0.031  ns/op
ArraysHashCode.ints      80  avgt   10  199.825 ± 0.091  ns/op
ArraysHashCode.ints      90  avgt   10  223.790 ± 0.757  ns/op
ArraysHashCode.ints     100  avgt   10  247.976 ± 0.980  ns/op
ArraysHashCode.ints     200  avgt   10  481.633 ± 0.096  ns/op
ArraysHashCode.ints     300  avgt   10  716.520 ± 0.218  ns/op

--- -XX:+UseRVV ---
Benchmark            (size)  Mode  Cnt    Score   Error  Units
ArraysHashCode.ints       1  avgt   10   11.275 ± 0.003  ns/op
ArraysHashCode.ints       5  avgt   10   21.293 ± 0.007  ns/op
ArraysHashCode.ints      10  avgt   10   80.183 ± 0.081  ns/op
ArraysHashCode.ints      20  avgt   10   92.063 ± 0.032  ns/op
ArraysHashCode.ints      30  avgt   10  103.319 ± 0.009  ns/op
ArraysHashCode.ints      40  avgt   10   98.937 ± 0.015  ns/op
ArraysHashCode.ints      50  avgt   10  120.870 ± 0.042  ns/op
ArraysHashCode.ints      60  avgt   10  128.407 ± 0.048  ns/op
ArraysHashCode.ints      70  avgt   10  145.908 ± 0.059  ns/op
ArraysHashCode.ints      80  avgt   10  134.045 ± 0.043  ns/op
ArraysHashCode.ints      90  avgt   10  154.720 ± 0.048  ns/op
ArraysHashCode.ints     100  avgt   10  173.479 ± 0.040  ns/op
ArraysHashCode.ints     200  avgt   10  261.791 ± 0.100  ns/op
ArraysHashCode.ints     300  avgt   10  353.951 ± 0.126  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3150754134 From duke at openjdk.org Mon Aug 4 13:40:38 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 4 Aug 2025 13:40:38 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v10] In-Reply-To: References: Message-ID: <18pF9Lub4DZhUqNaFVrHCVZ2kdocC4M2BYZ0CYVk-kk=.30c273bb-c04f-4427-a34c-1dde86456b3b@github.com> On Mon, 4 Aug 2025 08:43:06 GMT, Yuri Gaevsky wrote:

>> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2062:
>>
>>> 2060: vmv_s_x(v_powmax, pow31_highest);
>>> 2061:
>>> 2062: vsetvli(consumed, cnt, Assembler::e32, Assembler::m4);
>>
>> What's the performance look like with a smaller `lmul` (m1 or m2)? I am asking this because there is hardware out there (like SG2044) with a VLEN of 128 instead of 256 like on K1.

>> What's the performance look like with a smaller `lmul` (m1 or m2)? I am asking this because there is hardware out there (like SG2044) with a VLEN of 128 instead of 256 like on K1.
>
> Sure, I'll do it, thanks for the suggestion.

Please see it [here](https://github.com/openjdk/jdk/pull/17413#issuecomment-3150754134).
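[Editor's note] For context on what is being measured above: the scalar hash the `_vectorizedHashCode` intrinsic replaces is the `Arrays.hashCode`-style recurrence `h = 31*h + a[i]`. The usual vector strategy — and, as an assumption, roughly what the RVV loop does with `v_powmax` holding the highest power of 31 — is to process W lanes at once with per-lane weights 31^(W-1)..31^0 and scale the running hash by 31^W per block. A scalar model of that lane scheme shows it is exactly equivalent (int arithmetic wraps mod 2^32, so the algebra holds):

```java
import java.util.Arrays;

public class LaneHash {
    // Scalar reference: h = 31*h + a[i], as in Arrays.hashCode(int[]).
    static int scalarHash(int[] a) {
        int h = 1;
        for (int v : a) h = 31 * h + v;
        return h;
    }

    // Lane model: w "lanes" with weights 31^(w-1)..31^0; the accumulated
    // hash is multiplied by 31^w per block, plus a scalar tail loop.
    static int laneHash(int[] a, int w) {
        int[] pow = new int[w];
        pow[w - 1] = 1;
        for (int i = w - 2; i >= 0; i--) pow[i] = 31 * pow[i + 1];
        int powW = 31 * pow[0]; // 31^w, the per-block scale factor
        int h = 1, i = 0;
        for (; i + w <= a.length; i += w) {
            int block = 0;
            for (int lane = 0; lane < w; lane++) {
                block += pow[lane] * a[i + lane]; // per-lane multiply-accumulate
            }
            h = h * powW + block;
        }
        for (; i < a.length; i++) h = 31 * h + a[i]; // tail elements
        return h;
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
        System.out.println(scalarHash(a) == laneHash(a, 4));     // true
        System.out.println(scalarHash(a) == Arrays.hashCode(a)); // true
    }
}
```

The choice of `lmul` in the review only changes how many lanes W the hardware processes per step; the lane algebra is the same.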
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2251529819 From duke at openjdk.org Mon Aug 4 13:40:37 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 4 Aug 2025 13:40:37 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v12] In-Reply-To: References: Message-ID: > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: make an experiment with lmul==1 instead of lmul==4. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/c558db0b..ffaba3d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=10-11 Stats: 10 lines in 2 files changed: 4 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From mhaessig at openjdk.org Mon Aug 4 13:42:54 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 4 Aug 2025 13:42:54 GMT Subject: RFR: 8360561: PhaseIdealLoop::create_new_if_for_predicate hits "must be a uct if pattern" assert In-Reply-To: References: Message-ID: On Mon, 28 Jul 2025 12:31:49 GMT, Marc Chevalier wrote: > Did you know that ranges can be disjoint and yet not ordered?! Well, in modular arithmetic.
>
> Let's look at a simplistic example:
>
> int x;
> if (?) {
>     x = -1;
> } else {
>     x = 1;
> }
>
> if (x != 0) {
>     return;
> }
> // Unreachable
>
> With signed ranges, before the second `if`, `x` is in `[-1, 1]`. Which is enough to enter the second if, but not enough to prove you have to enter it: it wrongly seems that after the second `if` is still reachable. > > With unsigned ranges, at this point `x` is in `[1, 2^32-1]`, and then, it is clear that `x != 0`. This information is used to refine the value of `x` in the (missing) else-branch, and so, after the if. This is done with simple lattice meet (Hotspot's join): in the else-branch, the possible values of `x` are the meet of what it was worth before, and the interval in the guard, that is `[0, 0]`. Thanks to the unsigned range, this is known to be empty (that is bottom, or Hotspot's top). And with a little reduced product, the whole type of `x` is empty as well. Yet, this information is not used to kill control yet. > > This is here the center of the problem: we have a situation such as: > 2 after-CastII > After node `110 CastII` is idealized, it is found to be Top, and then the uncommon trap at `129` is replaced by `238 Halt` by being value-dead. > 1 before-CastII > Since the control is not killed, the node stays there, eventually making some predicate-related assert fail as a trap is expected under a `ParsePredicate`. > > And that's what this change proposes: when comparing integers with non-ordered ranges, let's see if the unsigned ranges overlap, by computing the meet. If the intersection is empty, then the values can't be equal, without being able to order them. This is new! Without unsigned information for signed integers, either they overlap, or we can order them. Adding modular arithmetic allows to have non-overlapping ranges that are also not ordered. > > Let's also notice that 0 is special: it is important bounds are on each side of 0 (or 2^31, the other discontinuity). For instance, if `x` can be 1 or 5, both the signed and unsigned range will agree on `[1, 5]` and not be able to prove it's, let's say, 3. > > What would there be other ways to treat this problem a bit ...
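[Editor's note] The signed-vs-unsigned range argument above can be checked directly: view the candidate values as unsigned and intersect the resulting intervals. This is only an illustrative model of the reasoning — the helper names are not C2's — using `Integer.toUnsignedLong` for the unsigned view and `Long.compareUnsigned` for the interval meet:

```java
public class UnsignedRanges {
    // The meet of two unsigned intervals is empty <=> the intervals do not overlap.
    static boolean unsignedDisjoint(long lo1, long hi1, long lo2, long hi2) {
        return Long.compareUnsigned(hi1, lo2) < 0 || Long.compareUnsigned(hi2, lo1) < 0;
    }

    public static void main(String[] args) {
        // x is either -1 or 1: the signed range [-1, 1] still contains 0...
        int lo = -1, hi = 1;
        System.out.println(lo <= 0 && 0 <= hi); // true: signed info cannot prove x != 0

        // ...but as unsigned values, -1 becomes 2^32 - 1, so the unsigned range
        // is [1, 2^32 - 1], and its meet with the guard interval [0, 0] is empty.
        long ulo = Integer.toUnsignedLong(1);  // 1
        long uhi = Integer.toUnsignedLong(-1); // 4294967295
        System.out.println(unsignedDisjoint(ulo, uhi, 0L, 0L)); // true: x != 0 is provable

        // The "0 is special" caveat: for x in {1, 5} both views agree on [1, 5],
        // so neither can rule out x == 3.
        System.out.println(unsignedDisjoint(1L, 5L, 3L, 3L)); // false
    }
}
```

Note the two ranges are disjoint without being ordered: neither interval lies entirely below the other in signed terms, which is exactly the new case the change exploits.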
Thank you for working on this, @marc-chevalier! And the nice explanation. The changes look good to me. Only the IR test needs a small fix.

test/hotspot/jtreg/compiler/igvn/CmpDisjointButNonOrderedRangesLong.java line 42:

> 40:
> 41: public static void main(String[] strArr) {
> 42: TestFramework.runWithFlags("-Xcomp", "-XX:CompileCommand=compileonly,compiler.igvn.CmpDisjointButNonOrderedRangesLong::test");

Supplying `-Xcomp` will skip IR-verification: https://github.com/openjdk/jdk/blob/fc4755535d61c2fd4d9a2c9a673da148f742f035/test/hotspot/jtreg/compiler/lib/ir_framework/README.md?plain=1#L136-L141 You can emulate `-Xcomp` behavior with `@Warmup(0)`. ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/26504#pullrequestreview-3084270108 PR Review Comment: https://git.openjdk.org/jdk/pull/26504#discussion_r2251533416 From galder at openjdk.org Mon Aug 4 13:54:01 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 4 Aug 2025 13:54:01 GMT Subject: Integrated: 8354244: Use random data in MinMaxRed_Long data arrays In-Reply-To: References: Message-ID: On Thu, 24 Jul 2025 06:45:59 GMT, Galder Zamarreño wrote: > Simplified the data used in the tests added in [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513). The data does not need to have a specific shape because this test focuses on verifying the IR when vectorization kicks in, and when it does, the data can just be random. Shaping the data to control branch taken/not-taken paths makes sense when CMov macro expansion kicks in instead of vectorization. > > When switching to random data I noticed that the test was randomly failing. This was due to potential overflows that result from taking the min/max and then multiplying it by 11, so I've adjusted that section of the test as well. > > I've run the test on both aarch64 and x64 platforms where this test would get vectorized.
To verify that, I made sure the test passed and checked the jtr output to make sure the IR conditions were matched. This pull request has now been integrated. Changeset: 567c0c93 Author: Galder Zamarreño Committer: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/567c0c9335c3ff020871115e5a58f3f40fd4b1ad Stats: 84 lines in 1 file changed: 11 ins; 62 del; 11 mod 8354244: Use random data in MinMaxRed_Long data arrays Reviewed-by: chagedorn, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/26451 From duke at openjdk.org Mon Aug 4 14:45:18 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 4 Aug 2025 14:45:18 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v13] In-Reply-To: References: Message-ID: <8-m2z2l9mzXahUE3FZeXWStvYnPGLb31Je994cIrSnc=.bceb2ea2-948c-4312-8647-8605923c13ba@github.com> > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: fixed error made for previous lmul-m1 experiment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/ffaba3d0..bc1290ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=11-12 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From duke at openjdk.org Mon Aug 4 15:45:41 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 4 Aug 2025 15:45:41 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v14] In-Reply-To: References: Message-ID: <5cDDfPjeHH84lAgZGBGYmAW5QTspzvgEDVOV_0lGa94=.e64a5cb5-9786-4fd1-930c-2d2148621f58@github.com> > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: returned lmul==m4 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/bc1290ee..6c976c0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=12-13 Stats: 5 lines in 2 files changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From galder at openjdk.org Mon Aug 4 15:50:57 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 4 Aug 2025 15:50:57 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: On Fri, 1 Aug 2025 12:17:03 GMT, Bhavana Kilambi wrote: >> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. >> >> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. 
On darwin/aarch64 it shows:

>> Benchmark                                (seed)  (size)   Mode  Cnt      Base      Patch   Units  Diff
>> VectorBitConversion.doubleToLongBits          0    2048  thrpt    8  1168.782   1157.717  ops/ms  -1%
>> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt    8  3999.387   7353.936  ops/ms  +83%
>> VectorBitConversion.floatToIntBits            0    2048  thrpt    8  1200.338   1188.206  ops/ms  -1%
>> VectorBitConversion.floatToRawIntBits         0    2048  thrpt    8  4058.248  14792.474  ops/ms  +264%
>> VectorBitConversion.intBitsToFloat            0    2048  thrpt    8  3050.313  14984.246  ops/ms  +391%
>> VectorBitConversion.longBitsToDouble          0    2048  thrpt    8  3022.691   7379.360  ops/ms  +144%

>> The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control. >> >> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.

> src/hotspot/share/opto/vectornode.cpp line 1830:
>
>> 1828: }
>> 1829:
>> 1830: bool VectorReinterpretNode::implemented(int opc, uint vlen, BasicType src_type, BasicType dst_type) {
>
> `opc` is not used in this method. Do we need this parameter here?

Yup not needed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2251882431 From jbhateja at openjdk.org Mon Aug 4 15:52:28 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 4 Aug 2025 15:52:28 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions Message-ID: Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour which does not match with already allocated neighbouring live ranges.
With the Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this in general saves additional spills for a two-address instruction whose destination is also the first source operand and whose live range surpasses the instruction. All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first-colour selection register allocation policy the demotions are rare. This patch biases the allocation of the NDD definition to the first source operand, or the second source operand for the commutative class of operations. Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive / conservative), which merges two live ranges using a union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. The patch shows around 5-20% improvement in code size by facilitating NDD demotion. For the following micro, method JIT code size is reduced from 136 to 120 bytes, which is around a 13% reduction in code size. **Micro:-** image **Baseline :-** image **With opt:-** image Thorough validation is underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). Kindly review and share your feedback.
Best Regards, Jatin ------------- Commit messages: - Some refactoring - 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions Changes: https://git.openjdk.org/jdk/pull/26283/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351016 Stats: 89 lines in 2 files changed: 72 ins; 6 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From qamai at openjdk.org Mon Aug 4 15:52:28 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 4 Aug 2025 15:52:28 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions In-Reply-To: References: Message-ID: On Mon, 14 Jul 2025 02:36:24 GMT, Jatin Bhateja wrote: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour which does not match with already allocated neighbouring live ranges. > > With the Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this in general saves additional spills for a two-address instruction whose destination is also the first source operand and whose live range surpasses the instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first-colour selection register allocation policy the demotions are rare. This patch biases the allocation of the NDD definition to the first source operand, or the second source operand for the commutative class of operations.
> > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive / conservative), which merges two live ranges using a union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, method JIT code size is reduced from 136 to 120 bytes, which is around a 13% reduction in code size. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validation is underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin This may also be applicable to non-APX instructions. For example, in the case of casting long to int, if the destination and the source are the same, then we do not need to emit any code. As a result, do you think it is better to mark operands in the ad file to preferably have the same register as the result? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3069328067 From jbhateja at openjdk.org Mon Aug 4 15:52:29 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 4 Aug 2025 15:52:29 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions In-Reply-To: References: Message-ID: On Mon, 14 Jul 2025 12:35:19 GMT, Quan Anh Mai wrote: > This may also be applicable to non-APX instructions. For example, in the case of casting long to int, if the destination and the source are the same, then we do not need to emit any code. As a result, do you think it is better to mark operands in the ad file to preferably have the same register as the result?
Yes, for now, I limited this to APX, but biasing the allocation of destination to non-interfering use(src) will enable instruction elision during assembling. Currently, while assigning a color(reg) to a live range, the allocator picks the first free aligned register. > do you think it is better to mark operands in the ad file to preferably have the same register as the result? There are existing DF attributions like USE_DEF which can be used to add such a constraint, but for APX NDD, we do not wish to up-front constrain the source to be the same as the destination, as it defeats the purpose, and the allocator may end up emitting a copy before the NDD instruction to honour this constraint. The idea here is to only bias color selection for non-interfering live ranges to facilitate EEVEX to REX/REX2 demotions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3070203095 From qamai at openjdk.org Mon Aug 4 15:54:58 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 4 Aug 2025 15:54:58 GMT Subject: RFR: 8360561: PhaseIdealLoop::create_new_if_for_predicate hits "must be a uct if pattern" assert In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 13:39:00 GMT, Manuel Hässig wrote:

>> Did you know that ranges can be disjoint and yet not ordered?! Well, in modular arithmetic.
>>
>> Let's look at a simplistic example:
>>
>> int x;
>> if (?) {
>>     x = -1;
>> } else {
>>     x = 1;
>> }
>>
>> if (x != 0) {
>>     return;
>> }
>> // Unreachable
>>
>> With signed ranges, before the second `if`, `x` is in `[-1, 1]`. Which is enough to enter the second if, but not enough to prove you have to enter it: it wrongly seems that after the second `if` is still reachable. >> >> With unsigned ranges, at this point `x` is in `[1, 2^32-1]`, and then, it is clear that `x != 0`. This information is used to refine the value of `x` in the (missing) else-branch, and so, after the if. This is done with simple lattice meet (Hotspot's join): in the else-branch, the possible values of `x` are the meet of what it was worth before, and the interval in the guard, that is `[0, 0]`. Thanks to the unsigned range, this is known to be empty (that is bottom, or Hotspot's top). And with a little reduced product, the whole type of `x` is empty as well. Yet, this information is not used to kill control yet. >> >> This is here the center of the problem: we have a situation such as: >> 2 after-CastII >> After node `110 CastII` is idealized, it is found to be Top, and then the uncommon trap at `129` is replaced by `238 Halt` by being value-dead. >> 1 before-CastII >> Since the control is not killed, the node stays there, eventually making some predicate-related assert fail as a trap is expected under a `ParsePredicate`. >> >> And that's what this change proposes: when comparing integers with non-ordered ranges, let's see if the unsigned ranges overlap, by computing the meet. If the intersection is empty, then the values can't be equal, without being able to order them. This is new! Without unsigned information for signed integers, either they overlap, or we can order them. Adding modular arithmetic allows to have non-overlapping ranges that are also not ordered. >> >> Let's also notice that 0 is special: it is important bounds are on each side of 0 (or 2^31, the other discontinuity). For instance, if `x` can be 1 or 5, both the signed and unsigned range will agree on `[1, 5]` and not be able to prove it's, let's say, 3. > ...
> > test/hotspot/jtreg/compiler/igvn/CmpDisjointButNonOrderedRangesLong.java line 42:
>
>> 40:
>> 41: public static void main(String[] strArr) {
>> 42: TestFramework.runWithFlags("-Xcomp", "-XX:CompileCommand=compileonly,compiler.igvn.CmpDisjointButNonOrderedRangesLong::test");
>
> Supplying `-Xcomp` will skip IR-verification:
>
> https://github.com/openjdk/jdk/blob/fc4755535d61c2fd4d9a2c9a673da148f742f035/test/hotspot/jtreg/compiler/lib/ir_framework/README.md?plain=1#L136-L141
>
> You can emulate `-Xcomp` behavior with `@Warmup(0)`.

No, only supplying `-Xcomp` to the parent process (the one running the `main`) disables IR verification. You can supply whatever flag to the child process and the IR verification still applies. You can see this in all Valhalla tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26504#discussion_r2251889649 From galder at openjdk.org Mon Aug 4 15:55:56 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 4 Aug 2025 15:55:56 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: <87722bascgvRzxbCm2Npiamp8TmwaGlqCB7rfkdNNFY=.6cb35d1f-2dec-4340-84e5-92ebd3d81921@github.com> On Fri, 1 Aug 2025 12:52:21 GMT, Bhavana Kilambi wrote: >> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. >> >> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes.
On darwin/aarch64 it shows: >> >> >> Benchmark (seed) (size) Mode Cnt Base Patch Units Diff >> VectorBitConversion.doubleToLongBits 0 2048 thrpt 8 1168.782 1157.717 ops/ms -1% >> VectorBitConversion.doubleToRawLongBits 0 2048 thrpt 8 3999.387 7353.936 ops/ms +83% >> VectorBitConversion.floatToIntBits 0 2048 thrpt 8 1200.338 1188.206 ops/ms -1% >> VectorBitConversion.floatToRawIntBits 0 2048 thrpt 8 4058.248 14792.474 ops/ms +264% >> VectorBitConversion.intBitsToFloat 0 2048 thrpt 8 3050.313 14984.246 ops/ms +391% >> VectorBitConversion.longBitsToDouble 0 2048 thrpt 8 3022.691 7379.360 ops/ms +144% >> >> >> The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control. >> >> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions. > > src/hotspot/share/opto/vectornode.cpp line 1831: > >> 1829: >> 1830: bool VectorReinterpretNode::implemented(int opc, uint vlen, BasicType src_type, BasicType dst_type) { >> 1831: if ((src_type == T_FLOAT && dst_type == T_INT) || > > Just a suggestion, do you feel a `switch-case` could be more readable/clear in this case? Something like this - > > > bool VectorReinterpretNode::implemented(uint vlen, BasicType src_type, BasicType dst_type) { > switch (src_type) { > case T_FLOAT: > if (dst_type != T_INT) return false; > break; > case T_INT: > if (dst_type != T_FLOAT) return false; > break; > case T_DOUBLE: > if (dst_type != T_LONG) return false; > break; > case T_LONG: > if (dst_type != T_DOUBLE) return false; > break; > default: > return false; > } > return Matcher::match_rule_supported_auto_vectorization(Op_VectorReinterpret, vlen, dst_type); > } Both options look just fine to me, but I'm happy to re-write it like that if others also feel the same way. 
> test/micro/org/openjdk/bench/java/lang/VectorBitConversion.java line 67: > >> 65: >> 66: @Benchmark >> 67: public long[] doubleToLongBits() { > > Would something like this be more concise (and maybe more readable as well) - > > @Benchmark > public long[] doubleToLongBits() { > for (int i = 0; i < doubles.length; i++) { > resultLongs[i] = Double.doubleToLongBits(doubles[i]); > } > return resultLongs; > } > > > The loop should still get vectorized (if vectorizable). > > Same for other benchmarks. Maybe but there's a reason why I wrote these benchmark methods this way. Keeping each line doing one thing makes it easier to map each line to the assembly (e.g. `perfasm`) and related IR nodes (e.g. `PrintIdeal`). That IMO is more important than the conciseness of the benchmark. What do others think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2251893141 PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2251889884 From mdoerr at openjdk.org Mon Aug 4 15:56:02 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 4 Aug 2025 15:56:02 GMT Subject: RFR: 8364580: Test compiler/vectorization/TestSubwordTruncation.java fails on platforms without RoundF/RoundD In-Reply-To: <7xOctkQXgAftXEeP6TwNSfcc76_oLPnMMv0lS8nCwr8=.7a05fd35-b041-409e-bfb8-9798aa94acce@github.com> References: <7xOctkQXgAftXEeP6TwNSfcc76_oLPnMMv0lS8nCwr8=.7a05fd35-b041-409e-bfb8-9798aa94acce@github.com> Message-ID: On Mon, 4 Aug 2025 02:54:53 GMT, Jasmine Karthikeyan wrote: > Hi all, > This is a quick patch to fix the test bug where TestSubwordTruncation fails on platforms that don't implement RoundF and RoundD. Thanks! Thanks for fixing it so quickly! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26611#issuecomment-3151323759 From jkarthikeyan at openjdk.org Mon Aug 4 16:54:09 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 4 Aug 2025 16:54:09 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v5] In-Reply-To: References: Message-ID: On Wed, 23 Jul 2025 09:31:18 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into enhance-clz-type > - Move `TestCountBitsRange` to `compiler.c2.gvn` > - Fix null checks > - Narrow type bound > - Use `BitsPerX` constant instead of `sizeof` > - Make the type of count leading/trailing zero nodes more precise This is nice! I just have a comment on the unit test. 
test/hotspot/jtreg/compiler/c2/gvn/TestCountBitsRange.java line 43: > 41: static int i = RunInfo.getRandom().nextInt(); > 42: static long l = RunInfo.getRandom().nextLong(); > 43: It would be nice to also check the return values of the functions with a non-compiled version, so that we can make sure that the constant folding results are correct as well. ------------- PR Review: https://git.openjdk.org/jdk/pull/25928#pullrequestreview-3085007943 PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2252040764 From jkarthikeyan at openjdk.org Mon Aug 4 16:54:10 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 4 Aug 2025 16:54:10 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3] In-Reply-To: References: Message-ID: On Tue, 24 Jun 2025 09:51:51 GMT, Qizheng Xing wrote: >> src/hotspot/share/opto/countbitsnode.cpp line 61: >> >>> 59: ti->_widen); >>> 60: } >>> 61: return TypeInt::INT; >> >> Just curious, when would this fallback path be used? > > When someone passes a non-integer to `CountLeadingZerosINode`, I think. Since the function filters `Type::TOP` earlier, I don't think it is possible to see non-int types here. I think it would be better to change it to `is_int()` and remove the null check, so that any broken graph constructions can be caught with the assert on the type check. You might also need to check for `Type::BOTTOM`. 
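The cross-checking idea suggested above (comparing compiled results against a reference computed before JIT compilation kicks in) can be sketched outside the IR framework. `testClz` here is a hypothetical stand-in for the test's methods, not the actual test code:

```java
class ClzRangeCheck {
    // The method under test; the JIT may constant-fold or narrow its type.
    static int testClz(int i) { return Integer.numberOfLeadingZeros(i); }

    public static void main(String[] args) {
        int x = 0xdeadbeef;
        int expected = testClz(x);          // first call runs interpreted
        for (int k = 0; k < 100_000; k++) { // warm up so the JIT may compile testClz
            if (testClz(x) != expected) {
                throw new AssertionError("compiled result differs from interpreted result");
            }
        }
        // the result is always within [0, 32], matching the narrowed node type
        if (expected < 0 || expected > 32) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Comparing against the value captured on the first (interpreted) call catches both wrong constant folding and a wrong narrowed range.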
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2252036736 From shade at openjdk.org Mon Aug 4 19:35:27 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 4 Aug 2025 19:35:27 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option [v3] In-Reply-To: References: Message-ID: <-TUkq_CGzcnVj0Rx_mKGRjXQz0ibAQ_tcfq1og_nt2U=.6dd5fc0e-4b58-48c6-81ff-b35378e09d80@github.com> On Wed, 28 May 2025 18:39:27 GMT, Zdenek Zambersky wrote: >> This change adds ` -XX:-IgnoreUnrecognizedVMOptions` to problematic tests (or `@requires vm.compiler2.enabled` in one case), to prevent failures `Unrecognized VM option` on client VM. > > Zdenek Zambersky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > Fix of compiler tests for client VM This fell through the cracks, I think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-3152089180 From bkilambi at openjdk.org Mon Aug 4 20:10:09 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 4 Aug 2025 20:10:09 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F In-Reply-To: <87722bascgvRzxbCm2Npiamp8TmwaGlqCB7rfkdNNFY=.6cb35d1f-2dec-4340-84e5-92ebd3d81921@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> <87722bascgvRzxbCm2Npiamp8TmwaGlqCB7rfkdNNFY=.6cb35d1f-2dec-4340-84e5-92ebd3d81921@github.com> Message-ID: On Mon, 4 Aug 2025 15:51:59 GMT, Galder Zamarre?o wrote: >> test/micro/org/openjdk/bench/java/lang/VectorBitConversion.java line 67: >> >>> 65: >>> 66: @Benchmark >>> 67: public long[] doubleToLongBits() { >> >> Would something like this be more concise (and maybe more readable as well) - >> >> @Benchmark >> public long[] doubleToLongBits() { >> for (int i = 0; i < doubles.length; i++) { >> resultLongs[i] = Double.doubleToLongBits(doubles[i]); >> } 
>> return resultLongs; >> } >> >> >> The loop should still get vectorized (if vectorizable). >> >> Same for other benchmarks. > > Maybe but there's a reason why I wrote these benchmark methods this way. Keeping each line doing one thing makes it easier to map each line to the assembly (e.g. `perfasm`) and related IR nodes (e.g. `PrintIdeal`). That IMO is more important than the conciseness of the benchmark. What do others think? Makes sense. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2252475557 From missa at openjdk.org Mon Aug 4 20:36:57 2025 From: missa at openjdk.org (Mohamed Issa) Date: Mon, 4 Aug 2025 20:36:57 GMT Subject: RFR: 8364666: Tier1 builds broken by JDK-8360559 Message-ID: This change corrects the stub id type declaration for x86_64 sinh that wasn't properly matched with the other intrinsic updates. ------------- Commit messages: - Fix stub id type in sinh x86_64 generator Changes: https://git.openjdk.org/jdk/pull/26629/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26629&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8364666 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26629.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26629/head:pull/26629 PR: https://git.openjdk.org/jdk/pull/26629 From sviswanathan at openjdk.org Mon Aug 4 20:50:14 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 4 Aug 2025 20:50:14 GMT Subject: RFR: 8364666: Tier1 builds broken by JDK-8360559 In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 20:29:24 GMT, Mohamed Issa wrote: > This change corrects the stub id type declaration for x86_64 sinh that wasn't properly matched with the other intrinsic updates. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26629#pullrequestreview-3085708209 From dholmes at openjdk.org Mon Aug 4 20:50:15 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 4 Aug 2025 20:50:15 GMT Subject: RFR: 8364666: Tier1 builds broken by JDK-8360559 In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 20:29:24 GMT, Mohamed Issa wrote: > This change corrects the stub id type declaration for x86_64 sinh that wasn't properly matched with the other intrinsic updates. @missa-prime the issue [JDK-8364666](https://bugs.openjdk.org/browse/JDK-8364666) is being used to back out[1] your original change due to the build breakage. A redo issue should be created and you can then apply this fix there. The fact that your change does not even build indicates it also cannot have been tested and so is not ready to be integrated. [1] https://openjdk.org/guide/#backing-out-a-change ------------- PR Comment: https://git.openjdk.org/jdk/pull/26629#issuecomment-3152365091 From dcubed at openjdk.org Mon Aug 4 20:57:03 2025 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 4 Aug 2025 20:57:03 GMT Subject: RFR: 8364666: Tier1 builds broken by JDK-8360559 In-Reply-To: References: Message-ID: <_akJJNBulZdup40h_S2XGCWInVDeycC6gXks_Ws1N2E=.4e8a1aa7-302f-4b04-8460-f32dc6140368@github.com> On Mon, 4 Aug 2025 20:29:24 GMT, Mohamed Issa wrote: > This change corrects the stub id type declaration for x86_64 sinh that wasn't properly matched with the other intrinsic updates. linux-x64-debug-nopch has PASSED with @missa-prime's fix, but obviously there are many more builds to go... According to the PR for the original fix: https://github.com/openjdk/jdk/pull/26152 That PR passed X64 builds in GHA. So what happened here? Was a change made after the original testing passed?
------------- PR Comment: https://git.openjdk.org/jdk/pull/26629#issuecomment-3152382459 From sviswanathan at openjdk.org Mon Aug 4 21:04:13 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 4 Aug 2025 21:04:13 GMT Subject: RFR: 8364666: Tier1 builds broken by JDK-8360559 In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 20:29:24 GMT, Mohamed Issa wrote: > This change corrects the stub id type declaration for x86_64 sinh that wasn't properly matched with the other intrinsic updates. > linux-x64-debug-nopch has PASSED with @missa-prime's fix, but obviously there are many more builds to go... > > According to the PR for the original fix: #26152 That PR passed X64 builds in GHA. So what happened here? Was a change made after the original testing passed? No, the type was changed on the tip due to another PR (JDK-8360707). Looks like that is what caused the issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26629#issuecomment-3152388779 From dcubed at openjdk.org Mon Aug 4 21:04:14 2025 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 4 Aug 2025 21:04:14 GMT Subject: RFR: 8364666: Tier1 builds broken by JDK-8360559 In-Reply-To: References: Message-ID: <2pDA20swxr55Ikmq7sNuKq8buDgDH9HO7I8e7Lz8FcY=.a0f1b207-e5f1-4224-bad1-91c33947ffc0@github.com> On Mon, 4 Aug 2025 20:57:36 GMT, Sandhya Viswanathan wrote: >> This change corrects the stub id type declaration for x86_64 sinh that wasn't properly matched with the other intrinsic updates. > >> linux-x64-debug-nopch has PASSED with @missa-prime's fix, but obviously there are many more builds to go... >> >> According to the PR for the original fix: #26152 That PR passed X64 builds in GHA. So what happened here? Was a change made after the original testing passed? > > No, the type was changed on the tip due to another PR (JDK-8360707). Looks like that is what caused the issue. @sviswa7 so what we have here is an indirect merge collision that caused a build breakage?
------------- PR Comment: https://git.openjdk.org/jdk/pull/26629#issuecomment-3152393362 From sviswanathan at openjdk.org Mon Aug 4 21:04:14 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 4 Aug 2025 21:04:14 GMT Subject: RFR: 8364666: Tier1 builds broken by JDK-8360559 In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 20:57:36 GMT, Sandhya Viswanathan wrote: >> This change corrects the stub id type declaration for x86_64 sinh that wasn't properly matched with the other intrinsic updates. > >> linux-x64-debug-nopch has PASSED with @missa-prime's fix, but obviously there are many more builds to go... >> >> According to the PR for the original fix: #26152 That PR passed X64 builds in GHA. So what happened here? Was a change made after the original testing passed? > > No, the type was changed on the tip due to another PR (JDK-8360707). Looks like that is what caused the issue. > @sviswa7 so what we have here is an indirect merge collision that caused a build breakage? Yes, unfortunately. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26629#issuecomment-3152394729 From dcubed at openjdk.org Mon Aug 4 21:13:02 2025 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 4 Aug 2025 21:13:02 GMT Subject: RFR: 8364666: Tier1 builds broken by JDK-8360559 In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 20:29:24 GMT, Mohamed Issa wrote: > This change corrects the stub id type declaration for x86_64 sinh that wasn't properly matched with the other intrinsic updates. @missa-prime - can you please do an "/integrate delegate" so that other folks can integrate your fix if that's the way we decide to go (and all the builds pass)?
------------- PR Comment: https://git.openjdk.org/jdk/pull/26629#issuecomment-3152416385 From dlong at openjdk.org Mon Aug 4 21:22:57 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 4 Aug 2025 21:22:57 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v4] In-Reply-To: References: Message-ID: > This PR removes the recently added lock around set_guard_value, using instead Atomic::cmpxchg to atomically update bit-fields of the guard value. Further, it takes a fast-path that uses the previous direct store when at a safepoint. Combined, these changes should get us back to almost where we were before in terms of overhead. If necessary, we could go even further and allow make_not_entrant() to perform a direct byte store, leaving 24 bits for the guard value. Dean Long has updated the pull request incrementally with one additional commit since the last revision: skip icache flush if nothing changed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26399/files - new: https://git.openjdk.org/jdk/pull/26399/files/a06b3446..840750ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26399&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26399&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26399.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26399/head:pull/26399 PR: https://git.openjdk.org/jdk/pull/26399 From dcubed at openjdk.org Mon Aug 4 21:23:07 2025 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 4 Aug 2025 21:23:07 GMT Subject: RFR: 8364666: Tier1 builds broken by JDK-8360559 In-Reply-To: References: Message-ID: <7RQGrtqkqcfSiT8Uin5nBlEe1nj6dHNa76obcf5JEDA=.c48c1e27-084c-4a38-b406-026f438c8402@github.com> On Mon, 4 Aug 2025 20:29:24 GMT, Mohamed Issa wrote: > This change corrects the stub id type declaration for x86_64 sinh that wasn't properly matched with the other intrinsic updates.
@missa-prime - Thanks for delegation. One windows-x64 build is still running. There's a linux-x64 docs build still scheduled, but I don't think that one was affected. The Linux doc build did not run in the original failing Mach5 job sets due to a failed dependency, so I should probably wait for that one just to be sure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26629#issuecomment-3152436990 PR Comment: https://git.openjdk.org/jdk/pull/26629#issuecomment-3152440517 From dlong at openjdk.org Mon Aug 4 21:26:22 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 4 Aug 2025 21:26:22 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v5] In-Reply-To: References: Message-ID: > This PR removes the recently added lock around set_guard_value, using instead Atomic::cmpxchg to atomically update bit-fields of the guard value. Further, it takes a fast-path that uses the previous direct store when at a safepoint. Combined, these changes should get us back to almost where we were before in terms of overhead. If necessary, we could go even further and allow make_not_entrant() to perform a direct byte store, leaving 24 bits for the guard value.
Dean Long has updated the pull request incrementally with one additional commit since the last revision: one unconditional release should be enough ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26399/files - new: https://git.openjdk.org/jdk/pull/26399/files/840750ab..d9e93db3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26399&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26399&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26399.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26399/head:pull/26399 PR: https://git.openjdk.org/jdk/pull/26399 From dlong at openjdk.org Mon Aug 4 21:26:23 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 4 Aug 2025 21:26:23 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v3] In-Reply-To: References: <_8Z-PXaFqGay1pHqcJmeWXrOFv4QQVqnJG2RuZ7rzTk=.34cc6ecb-e189-461c-971b-f59f899372f5@github.com> Message-ID: <31qxrkcScTxkk9gjEGGuEZEyCnpLe9VapvbgIpOTpow=.5a8777ab-35bf-48e2-8900-3d6ed5c19cf7@github.com> On Fri, 1 Aug 2025 21:38:11 GMT, Martin Doerr wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix PPC64 > src/hotspot/cpu/ppc/gc/shared/barrierSetNMethod_ppc.cpp line 84: > >> 82: nativeMovRegMem_at(new_mov_instr.buf)->set_offset(new_value, false /* no icache flush */); >> 83: // Swap in the new value >> 84: uint64_t v = Atomic::cmpxchg(instr, old_mov_instr.u64, new_mov_instr.u64, memory_order_release); > > We have `OrderAccess::release()` above, so `memory_order_release` looks redundant. Shouldn't we use `memory_order_relaxed`, here? I think you are right. But your question about release is making me wonder if we need acquire as well. For example if two threads are racing to disarm, is there a memory visibility problem if we do not use acquire for the CAS, or if we do the release only on a successful CAS?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26399#discussion_r2252604613 From dcubed at openjdk.org Mon Aug 4 21:34:08 2025 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 4 Aug 2025 21:34:08 GMT Subject: RFR: 8364666: Tier1 builds broken by JDK-8360559 In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 20:29:24 GMT, Mohamed Issa wrote: > This change corrects the stub id type declaration for x86_64 sinh that wasn't properly matched with the other intrinsic updates. It's taking forever for the linux-x64 build task to get scheduled. I'm going to take a risk and go ahead and integrate this one-line fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26629#issuecomment-3152459674 From missa at openjdk.org Mon Aug 4 21:34:09 2025 From: missa at openjdk.org (Mohamed Issa) Date: Mon, 4 Aug 2025 21:34:09 GMT Subject: Integrated: 8364666: Tier1 builds broken by JDK-8360559 In-Reply-To: References: Message-ID: <7rfQtVzDg2zCN4QxQu7ZdJKaKbmVjUdnIwbJq4MQjZQ=.a7e6ef9e-9562-4410-b0d3-d49f0eb8c6f1@github.com> On Mon, 4 Aug 2025 20:29:24 GMT, Mohamed Issa wrote: > This change corrects the stub id type declaration for x86_64 sinh that wasn't properly matched with the other intrinsic updates. This pull request has now been integrated. Changeset: f96b6bcd Author: Mohamed Issa Committer: Daniel D.
Daugherty URL: https://git.openjdk.org/jdk/commit/f96b6bcd4ddbb1d0e0a76d9f4e3b43bec20dcb7a Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8364666: Tier1 builds broken by JDK-8360559 Reviewed-by: sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/26629 From mdoerr at openjdk.org Mon Aug 4 21:37:03 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 4 Aug 2025 21:37:03 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 13:04:23 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/opto/compile.cpp line 3663: >> >>> 3661: n->subsume_by(ConNode::make(t->make_narrowoop()), this); >>> 3662: } else if (t->isa_klassptr()) { >>> 3663: ciKlass* klass = t->is_klassptr()->exact_klass(); >> >> This branch means that we are trying to compress a pointer that cannot be compressed. This seems wrong either way. > > Right! This replaces (`ConP` -> `EncodeP`) -> `ConN`. While I agree that it should theoretically be handled on the `EncodeP` path, it seems sane to gate the conversion here as well. Issues like this are why I want to disable/revert the optimization that puts us into this situation. I think it's always bad if we reach here with an abstract or interface class. `ConP` + `EncodeP` will still be wrong. Usage of the result can cause strange issues. Can we assert that `klass->is_in_encoding_range()`, here?
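For illustration, a sketch of the encode/decode round trip behind an `is_in_encoding_range()`-style check, showing why a klass address outside the range cannot be turned into a narrow constant. The `BASE`/`SHIFT` values and method names here are hypothetical, not HotSpot's actual compressed-klass implementation:

```java
class NarrowKlassSketch {
    // Hypothetical encoding parameters: a narrow value n decodes to
    // BASE + ((long) n << SHIFT), so only aligned addresses within
    // a 32-bit (shifted) window above BASE are representable.
    static final long BASE = 0x0000_0008_0000_0000L;
    static final int SHIFT = 3;

    static boolean inEncodingRange(long klassAddr) {
        long offset = klassAddr - BASE;
        return offset >= 0
                && (offset & ((1L << SHIFT) - 1)) == 0   // must be aligned
                && (offset >>> SHIFT) <= 0xFFFF_FFFFL;   // must fit in 32 bits
    }

    static int encode(long klassAddr) {
        if (!inEncodingRange(klassAddr)) {
            throw new IllegalArgumentException("unencodeable klass pointer");
        }
        return (int) ((klassAddr - BASE) >>> SHIFT);
    }

    static long decode(int narrow) {
        return BASE + (Integer.toUnsignedLong(narrow) << SHIFT);
    }

    public static void main(String[] args) {
        long k = BASE + 0x1000;
        System.out.println(decode(encode(k)) == k);    // true: round-trips
        System.out.println(inEncodingRange(BASE - 8)); // false: below the base
    }
}
```

An address that fails the range check has no narrow representation at all, which is why bailing out (or asserting) before emitting a `ConN`/`ConNKlass` is the safe move.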
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26559#discussion_r2252624471 From mdoerr at openjdk.org Mon Aug 4 21:50:09 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 4 Aug 2025 21:50:09 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v3] In-Reply-To: <31qxrkcScTxkk9gjEGGuEZEyCnpLe9VapvbgIpOTpow=.5a8777ab-35bf-48e2-8900-3d6ed5c19cf7@github.com> References: <_8Z-PXaFqGay1pHqcJmeWXrOFv4QQVqnJG2RuZ7rzTk=.34cc6ecb-e189-461c-971b-f59f899372f5@github.com> <31qxrkcScTxkk9gjEGGuEZEyCnpLe9VapvbgIpOTpow=.5a8777ab-35bf-48e2-8900-3d6ed5c19cf7@github.com> Message-ID: On Mon, 4 Aug 2025 21:22:12 GMT, Dean Long wrote: >> src/hotspot/cpu/ppc/gc/shared/barrierSetNMethod_ppc.cpp line 84: >> >>> 82: nativeMovRegMem_at(new_mov_instr.buf)->set_offset(new_value, false /* no icache flush */); >>> 83: // Swap in the new value >>> 84: uint64_t v = Atomic::cmpxchg(instr, old_mov_instr.u64, new_mov_instr.u64, memory_order_release); >> >> We have `OrderAccess::release()` above, so `memory_order_release` looks redundant. Shouldn't we use `memory_order_relaxed`, here? > > I think you are right. But your question about release is making me wonder if we need acquire as well. For example if two threads are racing to disarm, is there a memory visibility problem if we do not use acquire for the CAS, or if we do the release only on a successful CAS on the other platforms? Correct. The acquire barrier is at the end of the nmethod entry barrier: https://github.com/openjdk/jdk/blob/f96b6bcd4ddbb1d0e0a76d9f4e3b43bec20dcb7a/src/hotspot/cpu/ppc/gc/shared/barrierSetAssembler_ppc.cpp#L203 It's not needed if we use a GC with `stw_instruction_and_data_patch`.
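The disarm race discussed above can be sketched at the JDK level with `java.util.concurrent.atomic.AtomicInteger`, as an analogue of HotSpot's `Atomic::cmpxchg` with `memory_order_release` rather than the actual barrier code; the guard values and method name are hypothetical:

```java
import java.util.concurrent.atomic.AtomicInteger;

class DisarmSketch {
    // CAS the guard from the armed to the disarmed value with release semantics.
    // Returns true iff this call performed the transition (the witness was the
    // armed value). On success, release ordering publishes the writes made
    // before the CAS (e.g. a patched instruction) together with the new guard
    // value; a failed exchange has only plain read semantics.
    static boolean disarm(AtomicInteger guard, int armed, int disarmed) {
        return guard.compareAndExchangeRelease(armed, disarmed) == armed;
    }

    public static void main(String[] args) {
        AtomicInteger guard = new AtomicInteger(0xA); // 0xA = hypothetical "armed" value
        System.out.println(disarm(guard, 0xA, 0xB));  // true: this caller wins the race
        System.out.println(disarm(guard, 0xA, 0xB));  // false: already disarmed
    }
}
```

Whichever thread loses the race sees a non-matching witness and knows another thread already disarmed; the matching acquire would then be on the reader side (here, the entry-barrier check), not necessarily on the losing CAS.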
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26399#discussion_r2252641681 From thartmann at openjdk.org Mon Aug 4 22:10:02 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 4 Aug 2025 22:10:02 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 16:20:43 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test. The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. I think it is safe to bail out of this transformation. > > Also, this only plugs this particular leak. I think we should really be disabling the abstract/interface encoding optimization until C2 does not expose itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. > > Additional testing: > - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > Friendly ping :) Sorry for the delay. Public holiday, traveling and JVMLS now. > [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for ConNKlass, which somehow does not trigger That's probably because the ConNKlass is not added on the `nstack` and therefore not visited by the final graph reshaping code anymore. 
------------- PR Review: https://git.openjdk.org/jdk/pull/26559#pullrequestreview-3085897801 From thartmann at openjdk.org Mon Aug 4 22:10:03 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 4 Aug 2025 22:10:03 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 12:34:33 GMT, Quan Anh Mai wrote: >> See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test. The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. I think it is safe to bail out of this transformation. >> >> Also, this only plugs this particular leak. I think we should really be disabling the abstract/interface encoding optimization until C2 does not expose itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > src/hotspot/share/opto/compile.cpp line 3663: > >> 3661: n->subsume_by(ConNode::make(t->make_narrowoop()), this); >> 3662: } else if (t->isa_klassptr()) { >> 3663: ciKlass* klass = t->is_klassptr()->exact_klass(); > > This branch means that we are trying to compress a pointer that cannot be compressed. This seems wrong either way. I agree with @merykitty and @TheRealMDoerr. The code is wrong before and after. We should add an assert + compilation bailout in product or a guarantee. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26559#discussion_r2252668929 From thartmann at openjdk.org Mon Aug 4 22:40:08 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 4 Aug 2025 22:40:08 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 16:20:43 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test. The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. I think it is safe to bail out of this transformation. > > Also, this only plugs this particular leak. I think we should really be disabling the abstract/interface encoding optimization until C2 does not expose itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. > > Additional testing: > - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` I just discussed this with @coleenp and as I understand the plan is to revert [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526) in JDK 26 (and probably backport to JDK 25u). Therefore, I'd say this should be closed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26559#issuecomment-3152633221 From dlong at openjdk.org Tue Aug 5 04:00:09 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 5 Aug 2025 04:00:09 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v2] In-Reply-To: <_Ye19u_7PlqlsoRSuR0dNeAGbeuHyN_oqD1ZS4q9Nvk=.b94fd29d-d43e-4561-9926-7f5a46434d8e@github.com> References: <_Ye19u_7PlqlsoRSuR0dNeAGbeuHyN_oqD1ZS4q9Nvk=.b94fd29d-d43e-4561-9926-7f5a46434d8e@github.com> Message-ID: On Tue, 1 Jul 2025 09:11:32 GMT, Manuel H?ssig wrote: >> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. >> >> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. >> >> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. >> >> Testing: >> - [x] Github Actions >> - [x] tier1, tier2 on all platforms >> - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug >> - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) > > Manuel H?ssig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8308094-timeout > - Fix SIGALRM test > - Add timeout functionality to compiler threads src/hotspot/share/compiler/compileBroker.cpp line 236: > 234: { > 235: MutexLocker notifier(thread, CompileTaskWait_lock); > 236: thread->timeout_disarm(); Is holding the lock above important for disarming? If not, can we move the disarms from the if/else branches and do it unconditionally before the if? src/hotspot/share/compiler/compilerThread.cpp line 97: > 95: switch (signo) { > 96: case TIMEOUT_SIGNAL: { > 97: assert(!Atomic::load_acquire(&_timeout_armed), "compile task timed out"); Why do we need acquire? Only the current thread is ever going to be looking at this value, right? src/hotspot/share/compiler/compilerThread.cpp line 157: > 155: // Start the timer. > 156: timer_settime(_timeout_timer, 0, &its, nullptr); > 157: Atomic::release_store(&_timeout_armed, (bool) true); Same questions about release. Are other threads reading/writing this value? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2253044899 PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2253045931 PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2253046604 From dlong at openjdk.org Tue Aug 5 04:07:04 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 5 Aug 2025 04:07:04 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v2] In-Reply-To: <_Ye19u_7PlqlsoRSuR0dNeAGbeuHyN_oqD1ZS4q9Nvk=.b94fd29d-d43e-4561-9926-7f5a46434d8e@github.com> References: <_Ye19u_7PlqlsoRSuR0dNeAGbeuHyN_oqD1ZS4q9Nvk=.b94fd29d-d43e-4561-9926-7f5a46434d8e@github.com> Message-ID: On Tue, 1 Jul 2025 09:11:32 GMT, Manuel Hässig wrote: >> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. 
The goal of this is initially to be able to find and investigate long-running compilations. >> >> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. >> >> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. >> >> Testing: >> - [x] Github Actions >> - [x] tier1, tier2 on all platforms >> - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug >> - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) > > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8308094-timeout > - Fix SIGALRM test > - Add timeout functionality to compiler threads This looks correct, but would it be possible to move the Linux-specific code out of src/hotspot/share? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26023#issuecomment-3153200663 From galder at openjdk.org Tue Aug 5 06:08:05 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 5 Aug 2025 06:08:05 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F In-Reply-To: <7UqSdBPWH0SbdkhAUvF_qM10rK0oFsJXhUKWA3VlL14=.0c35e297-7276-468b-98c6-046e84897625@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> <7UqSdBPWH0SbdkhAUvF_qM10rK0oFsJXhUKWA3VlL14=.0c35e297-7276-468b-98c6-046e84897625@github.com> Message-ID: On Fri, 1 Aug 2025 11:58:03 GMT, Quan Anh Mai wrote: > VectorNode::is_reinterpret_opcode returns true for Op_ReinterpretHF2S and Op_ReinterpretS2HF, which are very similar to the nodes in this PR, can you add these nodes to that method instead? You're suggesting to modify `is_reinterpret_opcode` to be like this, and call that instead of `is_move_opcode`, right? bool VectorNode::is_reinterpret_opcode(int opc) { switch (opc) { case Op_ReinterpretHF2S: case Op_ReinterpretS2HF: case Op_MoveF2I: case Op_MoveD2L: case Op_MoveL2D: case Op_MoveI2F: return true; default: return false; } } ------------- PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3153512422 From galder at openjdk.org Tue Aug 5 06:08:06 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 5 Aug 2025 06:08:06 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: On Fri, 1 Aug 2025 12:46:01 GMT, Bhavana Kilambi wrote: > Although this is not in the scope of this patch, but I wonder if we could rename `ReinterpretS2HF` and `ReinterpretHF2S` to `MoveHF2S` and `MoveS2HF` to keep naming consistent with other types? 
WDYT @jatin-bhateja That sounds reasonable to me ------------- PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3153514145 From chagedorn at openjdk.org Tue Aug 5 06:25:04 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 5 Aug 2025 06:25:04 GMT Subject: RFR: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags [v3] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 12:44:21 GMT, Saranya Natarajan wrote: >> The existing test (`compiler/debug/TestStress.java`) verifies that compiler stress options produce consistent traces when using the same seed. However, there is currently no test to ensure that different seeds result in different traces. >> >> ### Solution >> Added a test case to assess the distinctness of traces generated from different seeds. This fix addresses the fragility concern highlighted in [JDK-8325482](https://bugs.openjdk.org/browse/JDK-8325482) by verifying that traces produced using N (in this case 10) distinct seeds are all not identical. >> >> ### Changes to `compiler/debug/TestStress.java` >> While investigating this issue, I observed that in `compiler/debug/TestStress.java`, the stress options for macro expansion and macro elimination were not being triggered because there were fewer than 2 macro nodes. Note that the `shuffle_macro_nodes()` in` compile.cpp` is only meaningful when there are more than two macro nodes. The generated traces for macro expansion and macro elimination in `TestStress.java` were empty. I have proposed changes to address this problem. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments on camelCase Thanks for the update! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26554#pullrequestreview-3086763158 From dholmes at openjdk.org Tue Aug 5 06:32:12 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 5 Aug 2025 06:32:12 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 [v2] In-Reply-To: References: Message-ID: <6GG7DTClrrPBBfGhO0OFwolXrssuVPjxbK8KkJ7uafk=.382a9e23-88b7-4fae-9e5e-00c88d309af8@github.com> On Mon, 4 Aug 2025 12:52:39 GMT, Fredrik Bredberg wrote: >> Since the integration of [JDK-8359437](https://bugs.openjdk.org/browse/JDK-8359437) the `LockingMode` flag can no longer be set by the user, instead it's declared as `const int LockingMode = LM_LIGHTWEIGHT;`. This means that we can now safely remove all `LockingMode` related code from all platforms. >> >> This PR removes `LockingMode` related code from the **x86** platform. >> >> When all the `LockingMode` code has been removed from all platforms, we can go on and remove it from shared (non-platform specific) files as well. And finally remove the `LockingMode` variable itself. >> >> Passes tier1-tier5 with no added problems. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update one after review Looks great! I suspect there may be more cleanup possible down the track but for now (with whitespace disabled) this PR clearly shows the eradication of the LockingMode. Some minor nits in pre-existing code, and a couple of queries. Thanks src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 392: > 390: // Some commentary on balanced locking: > 391: // > 392: // fast_lock and fast_unlock are emitted only for provably balanced lock sites. I assume this is also correct for `lightweight_lock` and `lightweight_unlock`? 
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 397: > 395: // The interpreter provides two properties: > 396: // I1: At return-time the interpreter automatically and quietly unlocks any > 397: // objects acquired the current activation (frame). Recall that the Suggestion: // objects acquired in the current activation (frame). Recall that the src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 401: > 399: // a frame. > 400: // I2: If a method attempts to unlock an object that is not held by the > 401: // the frame the interpreter throws IMSX. Suggestion: // frame the interpreter throws IMSX. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 409: > 407: // > 408: // The only other source of unbalanced locking would be JNI. The "Java Native Interface: > 409: // Programmer's Guide and Specification" claims that an object locked by jni_monitorenter Suggestion: // The only other source of unbalanced locking would be JNI. The "Java Native Interface // Specification" states that an object locked by JNI's_MonitorEnter src/hotspot/cpu/x86/interp_masm_x86.cpp line 1034: > 1032: > 1033: // Load object pointer into obj_reg > 1034: movptr(obj_reg, Address(lock_reg, BasicObjectLock::obj_offset())); Do you not still need the `in_bytes()` around `obj_offset()`? ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26552#pullrequestreview-3086744966 PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2253257552 PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2253258321 PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2253259156 PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2253263689 PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2253276071 From qamai at openjdk.org Tue Aug 5 06:35:03 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 5 Aug 2025 06:35:03 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> <7UqSdBPWH0SbdkhAUvF_qM10rK0oFsJXhUKWA3VlL14=.0c35e297-7276-468b-98c6-046e84897625@github.com> Message-ID: On Tue, 5 Aug 2025 06:05:22 GMT, Galder Zamarreño wrote: > You're suggesting to modify `is_reinterpret_opcode` to be like this, and call that instead of `is_move_opcode`, right? Yes, that's right. I believe `VectorReinterpret` should be implemented for all pairs of vector species where both the input and output species are implemented. So, `VectorReinterpretNode::implemented` is unnecessary. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3153680952 From dfenacci at openjdk.org Tue Aug 5 08:10:09 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 5 Aug 2025 08:10:09 GMT Subject: RFR: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags [v3] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 12:44:21 GMT, Saranya Natarajan wrote: >> The existing test (`compiler/debug/TestStress.java`) verifies that compiler stress options produce consistent traces when using the same seed. However, there is currently no test to ensure that different seeds result in different traces. 
>> >> ### Solution >> Added a test case to assess the distinctness of traces generated from different seeds. This fix addresses the fragility concern highlighted in [JDK-8325482](https://bugs.openjdk.org/browse/JDK-8325482) by verifying that traces produced using N (in this case 10) distinct seeds are all not identical. >> >> ### Changes to `compiler/debug/TestStress.java` >> While investigating this issue, I observed that in `compiler/debug/TestStress.java`, the stress options for macro expansion and macro elimination were not being triggered because there were fewer than 2 macro nodes. Note that the `shuffle_macro_nodes()` in` compile.cpp` is only meaningful when there are more than two macro nodes. The generated traces for macro expansion and macro elimination in `TestStress.java` were empty. I have proposed changes to address this problem. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments on camelCase test/hotspot/jtreg/compiler/debug/TestStressDistinctSeed.java line 83: > 81: int[] arr1 = new int[n]; > 82: for (int i = 0; i < n; i++) { > 83: synchronized (TestStressDistinctSeed.class) { Was the synchronisation added to create a more "interesting" trace (the tests seem to be running sequentially anyway)? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26554#discussion_r2253498424 From snatarajan at openjdk.org Tue Aug 5 08:24:10 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 5 Aug 2025 08:24:10 GMT Subject: RFR: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags [v3] In-Reply-To: References: Message-ID: On Tue, 5 Aug 2025 08:07:13 GMT, Damon Fenacci wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> addressing review comments on camelCase > > test/hotspot/jtreg/compiler/debug/TestStressDistinctSeed.java line 83: > >> 81: int[] arr1 = new int[n]; >> 82: for (int i = 0; i < n; i++) { >> 83: synchronized (TestStressDistinctSeed.class) { > > Was the synchronisation added to create a more "interesting" trace (the tests seem to be running sequentially anyway)? Testing the stress options for macro expansion and macro elimination requires at least two macro nodes; otherwise, an empty trace is produced. Synchronisation was added solely to increase the number of macro nodes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26554#discussion_r2253530430 From dfenacci at openjdk.org Tue Aug 5 08:28:09 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 5 Aug 2025 08:28:09 GMT Subject: RFR: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags [v3] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 12:44:21 GMT, Saranya Natarajan wrote: >> The existing test (`compiler/debug/TestStress.java`) verifies that compiler stress options produce consistent traces when using the same seed. However, there is currently no test to ensure that different seeds result in different traces. >> >> ### Solution >> Added a test case to assess the distinctness of traces generated from different seeds. 
This fix addresses the fragility concern highlighted in [JDK-8325482](https://bugs.openjdk.org/browse/JDK-8325482) by verifying that traces produced using N (in this case 10) distinct seeds are all not identical. >> >> ### Changes to `compiler/debug/TestStress.java` >> While investigating this issue, I observed that in `compiler/debug/TestStress.java`, the stress options for macro expansion and macro elimination were not being triggered because there were fewer than 2 macro nodes. Note that the `shuffle_macro_nodes()` in` compile.cpp` is only meaningful when there are more than two macro nodes. The generated traces for macro expansion and macro elimination in `TestStress.java` were empty. I have proposed changes to address this problem. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments on camelCase Thanks for looking at this and for the clarifications @sarannat. LGTM ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/26554#pullrequestreview-3087141514 From snatarajan at openjdk.org Tue Aug 5 08:36:06 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 5 Aug 2025 08:36:06 GMT Subject: RFR: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags [v3] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 12:44:21 GMT, Saranya Natarajan wrote: >> The existing test (`compiler/debug/TestStress.java`) verifies that compiler stress options produce consistent traces when using the same seed. However, there is currently no test to ensure that different seeds result in different traces. >> >> ### Solution >> Added a test case to assess the distinctness of traces generated from different seeds. 
This fix addresses the fragility concern highlighted in [JDK-8325482](https://bugs.openjdk.org/browse/JDK-8325482) by verifying that traces produced using N (in this case 10) distinct seeds are all not identical. >> >> ### Changes to `compiler/debug/TestStress.java` >> While investigating this issue, I observed that in `compiler/debug/TestStress.java`, the stress options for macro expansion and macro elimination were not being triggered because there were fewer than 2 macro nodes. Note that the `shuffle_macro_nodes()` in` compile.cpp` is only meaningful when there are more than two macro nodes. The generated traces for macro expansion and macro elimination in `TestStress.java` were empty. I have proposed changes to address this problem. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments on camelCase Thank you for the reviews. Please sponsor. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26554#issuecomment-3154097426 From duke at openjdk.org Tue Aug 5 08:36:07 2025 From: duke at openjdk.org (duke) Date: Tue, 5 Aug 2025 08:36:07 GMT Subject: RFR: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags [v3] In-Reply-To: References: Message-ID: <7z_PqEGezGgxVWR9gbS2tW_lwKG4J5nPcpUjXT8E6zI=.33b20829-4dc8-483c-82e9-1d349d02cbb7@github.com> On Mon, 4 Aug 2025 12:44:21 GMT, Saranya Natarajan wrote: >> The existing test (`compiler/debug/TestStress.java`) verifies that compiler stress options produce consistent traces when using the same seed. However, there is currently no test to ensure that different seeds result in different traces. >> >> ### Solution >> Added a test case to assess the distinctness of traces generated from different seeds. 
This fix addresses the fragility concern highlighted in [JDK-8325482](https://bugs.openjdk.org/browse/JDK-8325482) by verifying that traces produced using N (in this case 10) distinct seeds are all not identical. >> >> ### Changes to `compiler/debug/TestStress.java` >> While investigating this issue, I observed that in `compiler/debug/TestStress.java`, the stress options for macro expansion and macro elimination were not being triggered because there were fewer than 2 macro nodes. Note that the `shuffle_macro_nodes()` in` compile.cpp` is only meaningful when there are more than two macro nodes. The generated traces for macro expansion and macro elimination in `TestStress.java` were empty. I have proposed changes to address this problem. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments on camelCase @sarannat Your change (at version bca4a0ece39b8a75859a9267222a023ed5429720) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26554#issuecomment-3154104355 From snatarajan at openjdk.org Tue Aug 5 08:43:12 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 5 Aug 2025 08:43:12 GMT Subject: Integrated: 8325482: Test that distinct seeds produce distinct traces for compiler stress flags In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 13:38:37 GMT, Saranya Natarajan wrote: > The existing test (`compiler/debug/TestStress.java`) verifies that compiler stress options produce consistent traces when using the same seed. However, there is currently no test to ensure that different seeds result in different traces. > > ### Solution > Added a test case to assess the distinctness of traces generated from different seeds. 
This fix addresses the fragility concern highlighted in [JDK-8325482](https://bugs.openjdk.org/browse/JDK-8325482) by verifying that traces produced using N (in this case 10) distinct seeds are all not identical. > > ### Changes to `compiler/debug/TestStress.java` > While investigating this issue, I observed that in `compiler/debug/TestStress.java`, the stress options for macro expansion and macro elimination were not being triggered because there were fewer than 2 macro nodes. Note that the `shuffle_macro_nodes()` in` compile.cpp` is only meaningful when there are more than two macro nodes. The generated traces for macro expansion and macro elimination in `TestStress.java` were empty. I have proposed changes to address this problem. This pull request has now been integrated. Changeset: d25b9bef Author: Saranya Natarajan Committer: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/d25b9befe0a462b9785502806ad14e0a5f6b4320 Stats: 142 lines in 2 files changed: 141 ins; 0 del; 1 mod 8325482: Test that distinct seeds produce distinct traces for compiler stress flags Reviewed-by: chagedorn, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/26554 From mhaessig at openjdk.org Tue Aug 5 08:55:06 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 5 Aug 2025 08:55:06 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v2] In-Reply-To: References: <_Ye19u_7PlqlsoRSuR0dNeAGbeuHyN_oqD1ZS4q9Nvk=.b94fd29d-d43e-4561-9926-7f5a46434d8e@github.com> Message-ID: On Tue, 5 Aug 2025 03:56:41 GMT, Dean Long wrote: >> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8308094-timeout >> - Fix SIGALRM test >> - Add timeout functionality to compiler threads > > src/hotspot/share/compiler/compilerThread.cpp line 97: > >> 95: switch (signo) { >> 96: case TIMEOUT_SIGNAL: { >> 97: assert(!Atomic::load_acquire(&_timeout_armed), "compile task timed out"); > > Why do we need acquire? Only the current thread is ever going to be looking at this value, right? The compiler thread setting and unsetting the flag and the signal handler reading the flag are racing each other as soon as the timer is set, since signals are preemptive. This prevents a few false positive timeouts on architectures with weak memory models, but does not have any effect on x86 for example. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2253611427 From mhaessig at openjdk.org Tue Aug 5 08:58:06 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 5 Aug 2025 08:58:06 GMT Subject: RFR: 8360561: PhaseIdealLoop::create_new_if_for_predicate hits "must be a uct if pattern" assert In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 15:51:52 GMT, Quan Anh Mai wrote: >> test/hotspot/jtreg/compiler/igvn/CmpDisjointButNonOrderedRangesLong.java line 42: >> >>> 40: >>> 41: public static void main(String[] strArr) { >>> 42: TestFramework.runWithFlags("-Xcomp", "-XX:CompileCommand=compileonly,compiler.igvn.CmpDisjointButNonOrderedRangesLong::test"); >> >> Supplying `-Xcomp` will skip IR-verification: >> >> https://github.com/openjdk/jdk/blob/fc4755535d61c2fd4d9a2c9a673da148f742f035/test/hotspot/jtreg/compiler/lib/ir_framework/README.md?plain=1#L136-L141 >> >> You can emulate `-Xcomp` behavior with `@Warmup(0)`. > > No, only supplying `Xcomp` to the parent process (the one running the `main`) disables IR verification. You can supply whatever flag to the child process and the IR verification still applies. 
You can see this in all Valhalla tests. Good to know. Thank you for clearing that up for me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26504#discussion_r2253619901 From mhaessig at openjdk.org Tue Aug 5 09:46:11 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 5 Aug 2025 09:46:11 GMT Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit, limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v3] In-Reply-To: References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com> Message-ID: On Fri, 25 Jul 2025 14:58:47 GMT, Roland Westrelin wrote: >> A node in a pre loop only has uses out of the loop dominated by the >> loop exit. `PhaseIdealLoop::try_sink_out_of_loop()` sets its control >> to the loop exit projection. A range check in the main loop has this >> node as input (through a chain of some other nodes). Range check >> elimination needs to update the exit condition of the pre loop with an >> expression that depends on the node pinned on its exit: that's >> impossible and the assert fires. This is a variant of 8314024 (this >> one was for a node with uses out of the pre loop on multiple paths). I >> propose the same fix: leave the node with control in the pre loop in >> this case. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopopts.cpp > > Co-authored-by: Christian Hagedorn Thank you for working on this, @rwestrel. It looks good to me. ------------- Marked as reviewed by mhaessig (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/26424#pullrequestreview-3087455344 From mdoerr at openjdk.org Tue Aug 5 10:16:06 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 5 Aug 2025 10:16:06 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v5] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 21:26:22 GMT, Dean Long wrote: >> This PR removes the recently added lock around set_guard_value, using instead Atomic::cmpxchg to atomically update bit-fields of the guard value. Further, it takes a fast-path that uses the previous direct store when at a safepoint. Combined, these changes should get us back to almost where we were before in terms of overhead. If necessary, we could go even further and allow make_not_entrant() to perform a direct byte store, leaving 24 bits for the guard value. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > one unconditional release should be enough Thanks for implementing nice code for PPC64! I appreciate it! The shared code and the other platforms look fine, too. Maybe atomic bitwise operations could be used, but I'm happy with your current solution. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26399#pullrequestreview-3087610323 From shade at openjdk.org Tue Aug 5 10:25:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 5 Aug 2025 10:25:45 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants [v2] In-Reply-To: References: Message-ID: <-fnYZkHE8e3Xrg1M8DdUdfuHMWh-2YoLyMRKPqKWeZU=.33956235-27c1-4776-99ad-a9245ce55eea@github.com> > See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test. 
The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. I think it is safe to bail out of this transformation. > > Also, this only plugs this particular leak. I think we should really be disabling the abstract/interface encoding optimization until C2 does not expose itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. > > Additional testing: > - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Assert and bailout for ConP -> EncodeP path - Merge branch 'master' into JDK-8361211-c2-encodeable - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26559/files - new: https://git.openjdk.org/jdk/pull/26559/files/a309eb59..d5dd4d8d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26559&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26559&range=00-01 Stats: 13253 lines in 302 files changed: 9051 ins; 3517 del; 685 mod Patch: https://git.openjdk.org/jdk/pull/26559.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26559/head:pull/26559 PR: https://git.openjdk.org/jdk/pull/26559 From shade at openjdk.org Tue Aug 5 10:25:46 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 5 Aug 2025 10:25:46 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants [v2] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 22:07:12 GMT, Tobias Hartmann wrote: >> src/hotspot/share/opto/compile.cpp line 3663: >> >>> 3661: n->subsume_by(ConNode::make(t->make_narrowoop()), this); >>> 3662: } else if (t->isa_klassptr()) { >>> 3663: ciKlass* klass = t->is_klassptr()->exact_klass(); >> >> This branch means that we are trying to compress a pointer that cannot be compressed. This seems wrong either way. > > I agree with @merykitty and @TheRealMDoerr. The code is wrong before and after. We should add an assert + compilation bailout in product or a guarantee. All right, I agree with all three of you. New commit adds the assert and bailout when we detect that we encounter the unencodable class in this `ConP` -> `EncodeP` path. Does that make more sense to you? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26559#discussion_r2253903082 From duke at openjdk.org Tue Aug 5 10:27:11 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Tue, 5 Aug 2025 10:27:11 GMT Subject: Integrated: 8364618: Sort share/code includes In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 09:15:54 GMT, Francesco Andreuzzi wrote: > This PR sorts the includes in `hotspot/share/code` using `SortIncludes.java`. I'm also adding the directory to `TestIncludesAreSorted`. > > Passes tier1. This pull request has now been integrated. Changeset: df736eb5 Author: Francesco Andreuzzi Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/df736eb5822de2e2465df04972b1afb90334db5e Stats: 18 lines in 7 files changed: 9 ins; 9 del; 0 mod 8364618: Sort share/code includes Reviewed-by: shade, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/26616 From shade at openjdk.org Tue Aug 5 10:28:06 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 5 Aug 2025 10:28:06 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 22:37:45 GMT, Tobias Hartmann wrote: > I just discussed this with @coleenp and as I understand the plan is to revert [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526) in JDK 26 (and probably backport to JDK 25u). Therefore, I'd say this should be closed. Yes, I think that is the plan. However, I believe it sane and prudent to fix final graph reshaping as well with this PR, as it would fix the path that we _know_ is broken with abstract/interface classes optimizations turned on. We can skip backporting this fix to 25u, if we end up disabling the optimization altogether. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26559#issuecomment-3154609310 From mdoerr at openjdk.org Tue Aug 5 10:32:06 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 5 Aug 2025 10:32:06 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants [v2] In-Reply-To: <-fnYZkHE8e3Xrg1M8DdUdfuHMWh-2YoLyMRKPqKWeZU=.33956235-27c1-4776-99ad-a9245ce55eea@github.com> References: <-fnYZkHE8e3Xrg1M8DdUdfuHMWh-2YoLyMRKPqKWeZU=.33956235-27c1-4776-99ad-a9245ce55eea@github.com> Message-ID: On Tue, 5 Aug 2025 10:25:45 GMT, Aleksey Shipilev wrote: >> See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test. The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. I think it is safe to bail out of this transformation. >> >> Also, this only plugs this particular leak. I think we should really be disabling the abstract/interface encoding optimization until C2 does not expose itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Assert and bailout for ConP -> EncodeP path > - Merge branch 'master' into JDK-8361211-c2-encodeable > - Fix Marked as reviewed by mdoerr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26559#pullrequestreview-3087672869 From mhaessig at openjdk.org Tue Aug 5 10:32:07 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 5 Aug 2025 10:32:07 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v2] In-Reply-To: References: <_Ye19u_7PlqlsoRSuR0dNeAGbeuHyN_oqD1ZS4q9Nvk=.b94fd29d-d43e-4561-9926-7f5a46434d8e@github.com> Message-ID: On Tue, 5 Aug 2025 03:55:35 GMT, Dean Long wrote: >> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8308094-timeout >> - Fix SIGALRM test >> - Add timeout functionality to compiler threads
> src/hotspot/share/compiler/compileBroker.cpp line 236:
>
>> 234: {
>> 235: MutexLocker notifier(thread, CompileTaskWait_lock);
>> 236: thread->timeout_disarm();
>
> Is holding the lock above important for disarming? If not, can we move the disarms from the if/else branches and do it unconditionally before the if?

It is not. I'll move it above the `if`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2253919668 From galder at openjdk.org Tue Aug 5 11:20:47 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 5 Aug 2025 11:20:47 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v2] In-Reply-To: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: > I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations.
>
> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows:
>
> Benchmark                                (seed)  (size)  Mode  Cnt      Base      Patch   Units  Diff
> VectorBitConversion.doubleToLongBits          0    2048  thrpt   8  1168.782   1157.717  ops/ms    -1%
> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt   8  3999.387   7353.936  ops/ms   +83%
> VectorBitConversion.floatToIntBits            0    2048  thrpt   8  1200.338   1188.206  ops/ms    -1%
> VectorBitConversion.floatToRawIntBits         0    2048  thrpt   8  4058.248  14792.474  ops/ms  +264%
> VectorBitConversion.intBitsToFloat            0    2048  thrpt   8  3050.313  14984.246  ops/ms  +391%
> VectorBitConversion.longBitsToDouble          0    2048  thrpt   8  3022.691   7379.360  ops/ms  +144%
>
> The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control.
>
> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.
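The flow-control point is visible in the two methods' contracts: `Float.floatToIntBits` collapses every NaN to the canonical bit pattern, which implies a branch per element, while `floatToRawIntBits` is a pure bit move. A small stand-alone illustration (plain Java, not the JMH benchmark from the patch; the class and method names are made up for the example):

```java
public class BitMoveDemo {
    // Pure bit move: corresponds to a MoveF2I per element, no control flow,
    // so a loop like this is a vectorization candidate.
    static void rawBits(float[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Float.floatToRawIntBits(src[i]);
        }
    }

    // Canonicalizing version: the NaN check adds a branch per element,
    // which is the flow control that prevents vectorization.
    static void canonicalBits(float[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Float.floatToIntBits(src[i]); // any NaN -> 0x7fc00000
        }
    }

    public static void main(String[] args) {
        float nan = Float.intBitsToFloat(0x7fc00001); // non-canonical quiet NaN
        float[] src = { 1.0f, nan };
        int[] raw = new int[2], canon = new int[2];
        rawBits(src, raw);
        canonicalBits(src, canon);
        if (raw[1] != 0x7fc00001) throw new AssertionError();   // bits preserved
        if (canon[1] != 0x7fc00000) throw new AssertionError(); // NaN collapsed
    }
}
```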
Galder Zamarreño has updated the pull request incrementally with three additional commits since the last revision: - Avoid VectorReinterpret::implemented - Refactor and add copyright header - Rephrase comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26457/files - new: https://git.openjdk.org/jdk/pull/26457/files/b6ec784e..dde8699b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26457&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26457&range=00-01 Stats: 307 lines in 8 files changed: 152 ins; 151 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26457.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26457/head:pull/26457 PR: https://git.openjdk.org/jdk/pull/26457 From galder at openjdk.org Tue Aug 5 11:39:43 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 5 Aug 2025 11:39:43 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3] In-Reply-To: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: > I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. > > Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes.
On darwin/aarch64 it shows:
>
> Benchmark                                (seed)  (size)  Mode  Cnt      Base      Patch   Units  Diff
> VectorBitConversion.doubleToLongBits          0    2048  thrpt   8  1168.782   1157.717  ops/ms    -1%
> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt   8  3999.387   7353.936  ops/ms   +83%
> VectorBitConversion.floatToIntBits            0    2048  thrpt   8  1200.338   1188.206  ops/ms    -1%
> VectorBitConversion.floatToRawIntBits         0    2048  thrpt   8  4058.248  14792.474  ops/ms  +264%
> VectorBitConversion.intBitsToFloat            0    2048  thrpt   8  3050.313  14984.246  ops/ms  +391%
> VectorBitConversion.longBitsToDouble          0    2048  thrpt   8  3022.691   7379.360  ops/ms  +144%
>
> The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control.
>
> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.

Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Check at the very least that auto vectorization is supported ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26457/files - new: https://git.openjdk.org/jdk/pull/26457/files/dde8699b..147633f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26457&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26457&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26457.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26457/head:pull/26457 PR: https://git.openjdk.org/jdk/pull/26457 From fbredberg at openjdk.org Tue Aug 5 12:16:08 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 5 Aug 2025 12:16:08 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 [v2] In-Reply-To: <6GG7DTClrrPBBfGhO0OFwolXrssuVPjxbK8KkJ7uafk=.382a9e23-88b7-4fae-9e5e-00c88d309af8@github.com> References:
<6GG7DTClrrPBBfGhO0OFwolXrssuVPjxbK8KkJ7uafk=.382a9e23-88b7-4fae-9e5e-00c88d309af8@github.com> Message-ID: On Tue, 5 Aug 2025 06:25:13 GMT, David Holmes wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update one after review > > src/hotspot/cpu/x86/interp_masm_x86.cpp line 1034: > >> 1032: >> 1033: // Load object pointer into obj_reg >> 1034: movptr(obj_reg, Address(lock_reg, BasicObjectLock::obj_offset())); > > Do you not still need the `in_bytes()` around `obj_offset()`? I don't think so. Or at least there are lots of examples that seems to do just fine without `in_bytes()` around `obj_offset()`, like [this line](https://github.com/openjdk/jdk/blob/2a0521863ba7d9df9b4039e61b2ce6932960cd22/src/hotspot/cpu/x86/interp_masm_x86.cpp#L1073). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2254163684 From fbredberg at openjdk.org Tue Aug 5 12:23:04 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 5 Aug 2025 12:23:04 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 [v2] In-Reply-To: <6GG7DTClrrPBBfGhO0OFwolXrssuVPjxbK8KkJ7uafk=.382a9e23-88b7-4fae-9e5e-00c88d309af8@github.com> References: <6GG7DTClrrPBBfGhO0OFwolXrssuVPjxbK8KkJ7uafk=.382a9e23-88b7-4fae-9e5e-00c88d309af8@github.com> Message-ID: On Tue, 5 Aug 2025 06:14:57 GMT, David Holmes wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update one after review > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 392: > >> 390: // Some commentary on balanced locking: >> 391: // >> 392: // fast_lock and fast_unlock are emitted only for provably balanced lock sites. > > I assume this is also correct for `lightweight_lock` and `lightweight_unlock`? I assume the same. 
But I don't want to change the comments too much, since I plan to do a clean up in which all "lightweight" prefixes will be gone and we will begin talking about it as the normal locking mode. And no, I will not just rename it `normal_lock()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2254178104 From galder at openjdk.org Tue Aug 5 12:32:02 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 5 Aug 2025 12:32:02 GMT Subject: RFR: 8358598: PhaseIterGVN::PhaseIterGVN(PhaseGVN* gvn) doesn't use its parameter In-Reply-To: <2S2UiCOxUCiSAlQrrVCaL4S6MYlqdRcabqniskhg6XI=.c4ec617e-da35-48df-911c-9c0b4dca0126@github.com> References: <2S2UiCOxUCiSAlQrrVCaL4S6MYlqdRcabqniskhg6XI=.c4ec617e-da35-48df-911c-9c0b4dca0126@github.com> Message-ID: On Mon, 4 Aug 2025 09:47:23 GMT, Francesco Andreuzzi wrote: > As noted in the ticket, I propose a small cleanup of `PhaseIterGVN` since one of the constructors does not use its parameter. Looks good. What testing did you do? ------------- PR Review: https://git.openjdk.org/jdk/pull/26617#pullrequestreview-3088072511 From fbredberg at openjdk.org Tue Aug 5 12:36:00 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 5 Aug 2025 12:36:00 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 [v3] In-Reply-To: References: Message-ID: <-ncfIHskHiKnUbJ3nRR8rp678hInGalmZW4CnS5QJp0=.baabffb7-5f4f-4f06-9b23-315f8e9372a7@github.com> > Since the integration of [JDK-8359437](https://bugs.openjdk.org/browse/JDK-8359437) the `LockingMode` flag can no longer be set by the user, instead it's declared as `const int LockingMode = LM_LIGHTWEIGHT;`. This means that we can now safely remove all `LockingMode` related code from all platforms. > > This PR removes `LockingMode` related code from the **x86** platform. > > When all the `LockingMode` code has been removed from all platforms, we can go on and remove it from shared (non-platform specific) files as well. 
And finally remove the `LockingMode` variable itself. > > Passes tier1-tier5 with no added problems. Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: Update two after review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26552/files - new: https://git.openjdk.org/jdk/pull/26552/files/2a052186..9fa0c947 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26552&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26552&range=01-02 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/26552.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26552/head:pull/26552 PR: https://git.openjdk.org/jdk/pull/26552 From fbredberg at openjdk.org Tue Aug 5 12:36:02 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 5 Aug 2025 12:36:02 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 [v2] In-Reply-To: <6GG7DTClrrPBBfGhO0OFwolXrssuVPjxbK8KkJ7uafk=.382a9e23-88b7-4fae-9e5e-00c88d309af8@github.com> References: <6GG7DTClrrPBBfGhO0OFwolXrssuVPjxbK8KkJ7uafk=.382a9e23-88b7-4fae-9e5e-00c88d309af8@github.com> Message-ID: <4oH-UY40hMDHJ8JRjizYotMhfO6-zhMnw0onu_JvJo0=.5be9f40b-b384-4e78-a35b-df416d4b74e5@github.com> On Tue, 5 Aug 2025 06:15:21 GMT, David Holmes wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update one after review > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 397: > >> 395: // The interpreter provides two properties: >> 396: // I1: At return-time the interpreter automatically and quietly unlocks any >> 397: // objects acquired the current activation (frame). Recall that the > > Suggestion: > > // objects acquired in the current activation (frame). Recall that the Fixed > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 401: > >> 399: // a frame. 
>> 400: // I2: If a method attempts to unlock an object that is not held by the >> 401: // the frame the interpreter throws IMSX. > > Suggestion: > > // frame the interpreter throws IMSX. Fixed > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 409: > >> 407: // >> 408: // The only other source of unbalanced locking would be JNI. The "Java Native Interface: >> 409: // Programmer's Guide and Specification" claims that an object locked by jni_monitorenter > > Suggestion: > > // The only other source of unbalanced locking would be JNI. The "Java Native Interface > // Specification" states that an object locked by JNI's_MonitorEnter Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2254201947 PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2254202571 PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2254203510 From duke at openjdk.org Tue Aug 5 12:44:26 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Tue, 5 Aug 2025 12:44:26 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v15] In-Reply-To: References: Message-ID: <3q5KpEAfEcG0eJAa2Ip9lXFHnkGUFs1r6PhBIyLQoUI=.261bfa98-93ad-46c8-ac6f-29801adc5a6d@github.com> > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. 
Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: replaced vmul_vv + vadd_vv by vmadd_vv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/6c976c0c..da6644b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=13-14 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From duke at openjdk.org Tue Aug 5 12:52:01 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Tue, 5 Aug 2025 12:52:01 GMT Subject: RFR: 8358598: PhaseIterGVN::PhaseIterGVN(PhaseGVN* gvn) doesn't use its parameter In-Reply-To: References: <2S2UiCOxUCiSAlQrrVCaL4S6MYlqdRcabqniskhg6XI=.c4ec617e-da35-48df-911c-9c0b4dca0126@github.com> Message-ID: On Tue, 5 Aug 2025 12:29:14 GMT, Galder Zamarreño wrote: >> As noted in the ticket, I propose a small cleanup of `PhaseIterGVN` since one of the constructors does not use its parameter. >> >> Passes tier1 and tier2. > > Looks good. What testing did you do? Hi @galderz, I ran tier1 and tier2, forgot to mention that in the PR description. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26617#issuecomment-3155082104 From duke at openjdk.org Tue Aug 5 12:53:24 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Tue, 5 Aug 2025 12:53:24 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v16] In-Reply-To: References: Message-ID: <3-IiyzLSiPSYIIYsvzPbMGlvudzupXlbBiG739MC-4E=.d58d0da6-3003-42e8-8012-71bfe84d1cd7@github.com> > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
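For context, `vectorizedHashCode` computes the standard `Arrays.hashCode` polynomial, one multiply-add (h = 31*h + a[i]) per element, which a vector implementation can map to a fused multiply-add (e.g. RVV's `vmadd.vv`) per lane. A scalar reference version of that polynomial (the textbook definition, not the intrinsic's actual code; the class name is illustrative):

```java
import java.util.Arrays;

public class HashRef {
    // Reference implementation of the polynomial the intrinsic vectorizes:
    // h starts at 1, then h = 31*h + a[i] for each element.
    static int intsHashCode(int[] a) {
        int h = 1;
        for (int x : a) {
            h = 31 * h + x; // one multiply-add per element
        }
        return h;
    }

    public static void main(String[] args) {
        int[] a = { 1, 2, 3, 4, 5 };
        // Must agree with the library definition it mirrors.
        if (intsHashCode(a) != Arrays.hashCode(a)) throw new AssertionError();
        if (intsHashCode(new int[0]) != 1) throw new AssertionError();
    }
}
```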
Yuri Gaevsky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: - Merge master - replaced vmul_vv + vadd_vv by vmadd_vv - returned lmul==m4 - fixed error made for prevoius lmul-m1 experiment - make an experiment with lmul==1 instead of lmul==4. - move vredsum_vs out of VEC_LOOP to improve performance - - removed tail processing with RVV instructions as simple scalar loop provides in general better results - simplified arrays_hashcode_v() to be closer to VLA and use less general-purpose registers; minor cosmetic changes - change slli+add sequence to shadd - reorder instructions to make RVV instructions contiguous - ... and 7 more: https://git.openjdk.org/jdk/compare/ba0ae4cb...e7fac6c7 ------------- Changes: https://git.openjdk.org/jdk/pull/17413/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=15 Stats: 443 lines in 6 files changed: 441 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From mhaessig at openjdk.org Tue Aug 5 13:46:06 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 5 Aug 2025 13:46:06 GMT Subject: RFR: 8358598: PhaseIterGVN::PhaseIterGVN(PhaseGVN* gvn) doesn't use its parameter In-Reply-To: <2S2UiCOxUCiSAlQrrVCaL4S6MYlqdRcabqniskhg6XI=.c4ec617e-da35-48df-911c-9c0b4dca0126@github.com> References: <2S2UiCOxUCiSAlQrrVCaL4S6MYlqdRcabqniskhg6XI=.c4ec617e-da35-48df-911c-9c0b4dca0126@github.com> Message-ID: On Mon, 4 Aug 2025 09:47:23 GMT, Francesco Andreuzzi wrote: > As noted in the ticket, I propose a small cleanup of `PhaseIterGVN` since one of the constructors does not use its parameter. > > Passes tier1 and tier2. Thank you for working on this cleanup, @fandreuz! The changes look good to me. I only have one nit. I also kicked off testing on our side and will keep you posted on the results. 
src/hotspot/share/opto/phaseX.cpp line 812:

> 810: // Initialize from scratch
> 811: PhaseIterGVN::PhaseIterGVN() : _delay_transform(false),
> 812: _worklist(*C->igvn_worklist())

Suggestion:

    PhaseIterGVN::PhaseIterGVN() : _delay_transform(false),
                                   _worklist(*C->igvn_worklist())

Nit: align with line above ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/26617#pullrequestreview-3088340502 PR Review Comment: https://git.openjdk.org/jdk/pull/26617#discussion_r2254382103 From duke at openjdk.org Tue Aug 5 14:09:46 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Tue, 5 Aug 2025 14:09:46 GMT Subject: RFR: 8358598: PhaseIterGVN::PhaseIterGVN(PhaseGVN* gvn) doesn't use its parameter [v2] In-Reply-To: <2S2UiCOxUCiSAlQrrVCaL4S6MYlqdRcabqniskhg6XI=.c4ec617e-da35-48df-911c-9c0b4dca0126@github.com> References: <2S2UiCOxUCiSAlQrrVCaL4S6MYlqdRcabqniskhg6XI=.c4ec617e-da35-48df-911c-9c0b4dca0126@github.com> Message-ID: > As noted in the ticket, I propose a small cleanup of `PhaseIterGVN` since one of the constructors does not use its parameter. > > Passes tier1 and tier2.
Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: align with line above Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26617/files - new: https://git.openjdk.org/jdk/pull/26617/files/9ba28dc7..a5553aa0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26617&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26617&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26617/head:pull/26617 PR: https://git.openjdk.org/jdk/pull/26617 From duke at openjdk.org Tue Aug 5 14:49:10 2025 From: duke at openjdk.org (Samuel Chee) Date: Tue, 5 Aug 2025 14:49:10 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v2] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: <9wl7zcDsUD5im2gwdm-jtmLrgDl8oxxj3obx5VtDw90=.a7ca2423-d87f-4db2-9d0d-523a0d58c90f@github.com> On Fri, 25 Jul 2025 08:03:22 GMT, Andrew Haley wrote: >> @theRealAph coincidentally, I have been looking at `MacroAssembler::cmpxchgw` and `MacroAssembler::cmpxchgptr` recently, and it appears their trailing DMBs may also be unnecessary. >> >> I have been unable to find any particular use patterns which rely on the existence of these trailing dmbs, so it does not seem necessary to add the trailingDMB option. Although I would like to hear your thoughts on the issue. > >> I have been unable to find any particular use patterns which rely on the existence of these trailing dmbs, so it does not seem necessary to add the trailingDMB option. Although I would like to hear your thoughts on the issue. > > Maybe simply move the `dmb` after the non-LSE ldxr/stxr logic, then. My proposal is: 1.
For `cmpxchg`, we add a trailingDMB option, and emit if `!useLSE && trailingDMB`, moving the dmbs from outside to inside the method. Have default value for trailingDMB be false so other call sites won't emit this dmb hence won't be affected. 2. In a separate ticket, `cmpxchgptr` and `cmpxchgw` already have DMBs inside their method definitions, so add extra trailingDMB parameter defaulted to true. And emit dmb if true. 3. In a separate ticket, apply same logic to `atomic_##NAME` to move DMB inside function and default trailingDMB to false to not affect other call sites. Does this sound good to you? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2254584804 From thartmann at openjdk.org Tue Aug 5 19:21:06 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 5 Aug 2025 19:21:06 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants [v2] In-Reply-To: <-fnYZkHE8e3Xrg1M8DdUdfuHMWh-2YoLyMRKPqKWeZU=.33956235-27c1-4776-99ad-a9245ce55eea@github.com> References: <-fnYZkHE8e3Xrg1M8DdUdfuHMWh-2YoLyMRKPqKWeZU=.33956235-27c1-4776-99ad-a9245ce55eea@github.com> Message-ID: On Tue, 5 Aug 2025 10:25:45 GMT, Aleksey Shipilev wrote: >> See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test. The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. I think it is safe to bail out of this transformation. >> >> Also, this only plugs this particular leak. I think we should really be disabling the abstract/interface encoding optimization until C2 does not expose itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Assert and bailout for ConP -> EncodeP path > - Merge branch 'master' into JDK-8361211-c2-encodeable > - Fix Fair enough. Looks good to me! ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26559#pullrequestreview-3089417501 From shade at openjdk.org Tue Aug 5 19:21:06 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 5 Aug 2025 19:21:06 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants [v2] In-Reply-To: <-fnYZkHE8e3Xrg1M8DdUdfuHMWh-2YoLyMRKPqKWeZU=.33956235-27c1-4776-99ad-a9245ce55eea@github.com> References: <-fnYZkHE8e3Xrg1M8DdUdfuHMWh-2YoLyMRKPqKWeZU=.33956235-27c1-4776-99ad-a9245ce55eea@github.com> Message-ID: <998Hrokj7nusFkJncr2AxX7o4TXWdZgQ08ElVY09Xec=.e91b7f29-ccea-4abf-b59f-bde78cbc6c84@github.com> On Tue, 5 Aug 2025 10:25:45 GMT, Aleksey Shipilev wrote: >> See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test. The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. I think it is safe to bail out of this transformation. >> >> Also, this only plugs this particular leak. 
I think we should really be disabling the abstract/interface encoding optimization until C2 does not expose itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Assert and bailout for ConP -> EncodeP path > - Merge branch 'master' into JDK-8361211-c2-encodeable > - Fix Linux x86_64 server fastdebug, `make test TEST=all` still passes well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26559#issuecomment-3156128455 From duke at openjdk.org Tue Aug 5 19:38:12 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Tue, 5 Aug 2025 19:38:12 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v16] In-Reply-To: <3-IiyzLSiPSYIIYsvzPbMGlvudzupXlbBiG739MC-4E=.d58d0da6-3003-42e8-8012-71bfe84d1cd7@github.com> References: <3-IiyzLSiPSYIIYsvzPbMGlvudzupXlbBiG739MC-4E=.d58d0da6-3003-42e8-8012-71bfe84d1cd7@github.com> Message-ID: On Tue, 5 Aug 2025 12:53:24 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 17 commits:
>
> - Merge master
> - replaced vmul_vv + vadd_vv by vmadd_vv
> - returned lmul==m4
> - fixed error made for prevoius lmul-m1 experiment
> - make an experiment with lmul==1 instead of lmul==4.
> - move vredsum_vs out of VEC_LOOP to improve performance
> - - removed tail processing with RVV instructions as simple scalar loop provides in general better results
> - simplified arrays_hashcode_v() to be closer to VLA and use less general-purpose registers; minor cosmetic changes
> - change slli+add sequence to shadd
> - reorder instructions to make RVV instructions contiguous
> - ... and 7 more: https://git.openjdk.org/jdk/compare/ba0ae4cb...e7fac6c7

Updated data after previous merge (`e7fac6c`) which includes [JDK-8362596](https://github.com/openjdk/jdk/commit/4189fcbac40943f3b26c3a01938837b4e4762285):

bpif3-16g% ( for i in "-XX:DisableIntrinsic=_vectorizedHashCode" "-XX:-UseRVV" "-XX:+UseRVV" ; do ( echo "--- ${i} ---" && jdk/bin/java -jar benchmarks.jar --jvmArgs="-XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions ${i}" org.openjdk.bench.java.lang.ArraysHashCode.ints -p size=1,5,10,20,30,40,50,60,70,80,90,100,200,300 -f 1 -r 1 -w 1 -wi 5 -i 10 2>&1 | tail -15 ) done )

--- -XX:DisableIntrinsic=_vectorizedHashCode ---
Benchmark            (size)  Mode  Cnt    Score   Error  Units
ArraysHashCode.ints       1  avgt   10   11.274 ± 0.004  ns/op
ArraysHashCode.ints       5  avgt   10   28.837 ± 0.115  ns/op
ArraysHashCode.ints      10  avgt   10   43.109 ± 0.091  ns/op
ArraysHashCode.ints      20  avgt   10   68.190 ± 0.317  ns/op
ArraysHashCode.ints      30  avgt   10   88.075 ± 0.490  ns/op
ArraysHashCode.ints      40  avgt   10  115.032 ± 0.230  ns/op
ArraysHashCode.ints      50  avgt   10  136.004 ± 0.474  ns/op
ArraysHashCode.ints      60  avgt   10  161.900 ± 0.358  ns/op
ArraysHashCode.ints      70  avgt   10  169.663 ± 0.419  ns/op
ArraysHashCode.ints      80  avgt   10  193.207 ± 0.317  ns/op
ArraysHashCode.ints      90  avgt   10  208.696 ± 0.595  ns/op
ArraysHashCode.ints     100  avgt   10  232.698 ± 0.291  ns/op
ArraysHashCode.ints     200  avgt   10  447.169 ± 0.791  ns/op
ArraysHashCode.ints     300  avgt   10  655.249 ± 0.520  ns/op

--- -XX:-UseRVV ---
Benchmark            (size)  Mode  Cnt    Score   Error  Units
ArraysHashCode.ints       1  avgt   10   11.273 ± 0.003  ns/op
ArraysHashCode.ints       5  avgt   10   23.180 ± 0.008  ns/op
ArraysHashCode.ints      10  avgt   10   32.735 ± 0.076  ns/op
ArraysHashCode.ints      20  avgt   10   50.745 ± 0.056  ns/op
ArraysHashCode.ints      30  avgt   10   71.264 ± 0.148  ns/op
ArraysHashCode.ints      40  avgt   10   88.367 ± 0.034  ns/op
ArraysHashCode.ints      50  avgt   10  108.355 ± 0.058  ns/op
ArraysHashCode.ints      60  avgt   10  125.885 ± 0.055  ns/op
ArraysHashCode.ints      70  avgt   10  146.049 ± 0.213  ns/op
ArraysHashCode.ints      80  avgt   10  163.479 ± 0.049  ns/op
ArraysHashCode.ints      90  avgt   10  183.507 ± 0.170  ns/op
ArraysHashCode.ints     100  avgt   10  201.041 ± 0.032  ns/op
ArraysHashCode.ints     200  avgt   10  389.416 ± 0.517  ns/op
ArraysHashCode.ints     300  avgt   10  576.795 ± 0.364  ns/op

--- -XX:+UseRVV ---
Benchmark            (size)  Mode  Cnt    Score   Error  Units
ArraysHashCode.ints       1  avgt   10   11.283 ± 0.005  ns/op
ArraysHashCode.ints       5  avgt   10   23.197 ± 0.023  ns/op
ArraysHashCode.ints      10  avgt   10   38.824 ± 0.007  ns/op
ArraysHashCode.ints      20  avgt   10   70.612 ± 0.372  ns/op
ArraysHashCode.ints      30  avgt   10  101.474 ± 0.027  ns/op
ArraysHashCode.ints      40  avgt   10  108.357 ± 0.034  ns/op
ArraysHashCode.ints      50  avgt   10  139.659 ± 0.061  ns/op
ArraysHashCode.ints      60  avgt   10  171.644 ± 0.047  ns/op
ArraysHashCode.ints      70  avgt   10  112.136 ± 0.051  ns/op
ArraysHashCode.ints      80  avgt   10  146.094 ± 0.289  ns/op
ArraysHashCode.ints      90  avgt   10  177.230 ± 0.032  ns/op
ArraysHashCode.ints     100  avgt   10  119.787 ± 0.270  ns/op
ArraysHashCode.ints     200  avgt   10  161.705 ± 0.086  ns/op
ArraysHashCode.ints     300  avgt   10  216.808 ± 0.364  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3156355237 From dlong at openjdk.org Tue Aug 5 21:36:05 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 5 Aug 2025 21:36:05 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v2] In-Reply-To: References: <_Ye19u_7PlqlsoRSuR0dNeAGbeuHyN_oqD1ZS4q9Nvk=.b94fd29d-d43e-4561-9926-7f5a46434d8e@github.com> Message-ID: <1YTjfZh0C7UWqTCXWcNS-Q68kBOnhlRJA_-sS_vZsio=.6389ba30-8f3a-48c2-b0e2-40f5877f4a36@github.com> On Tue, 5 Aug 2025 08:52:20 GMT, Manuel Hässig wrote:
>> src/hotspot/share/compiler/compilerThread.cpp line 97:
>>
>>> 95: switch (signo) {
>>> 96: case TIMEOUT_SIGNAL: {
>>> 97: assert(!Atomic::load_acquire(&_timeout_armed), "compile task timed out");
>>
>> Why do we need acquire? Only the current thread is ever going to be looking at this value, right?
>
> The compiler thread setting and unsetting the flag and the signal handler reading the flag are racing each other as soon as the timer is set, since signals are preemptive. This prevents a few false positive timeouts on architectures with weak memory models, but does not have any effect on x86 for example.

The signal is delivered to the same compiler thread, so I don't think acquire and release help here. There shouldn't be a weak memory model issue between a thread and its signal handler on the same thread. Can you explain the sequence of events that could cause a false positive?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2255395595 From kvn at openjdk.org Tue Aug 5 22:59:08 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Aug 2025 22:59:08 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants [v2] In-Reply-To: <-fnYZkHE8e3Xrg1M8DdUdfuHMWh-2YoLyMRKPqKWeZU=.33956235-27c1-4776-99ad-a9245ce55eea@github.com> References: <-fnYZkHE8e3Xrg1M8DdUdfuHMWh-2YoLyMRKPqKWeZU=.33956235-27c1-4776-99ad-a9245ce55eea@github.com> Message-ID: On Tue, 5 Aug 2025 10:25:45 GMT, Aleksey Shipilev wrote: >> See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test. The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. I think it is safe to bail out of this transformation. >> >> Also, this only plugs this particular leak. I think we should really be disabling the abstract/interface encoding optimization until C2 does not expose itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Assert and bailout for ConP -> EncodeP path > - Merge branch 'master' into JDK-8361211-c2-encodeable > - Fix Okay ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26559#pullrequestreview-3089927416 From dlong at openjdk.org Wed Aug 6 00:17:26 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Aug 2025 00:17:26 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v5] In-Reply-To: References: Message-ID: On Tue, 5 Aug 2025 10:13:03 GMT, Martin Doerr wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> one unconditional release should be enough > > Thanks for implementing nice code for PPC64! I appreciate it! The shared code and the other platforms look fine, too. > Maybe atomic bitwise operations could be used, but I'm happy with your current solution. Thanks @TheRealMDoerr . I didn't even consider atomic bitwise operations, but that's a good idea. I'm not in a hurry to push this, so if you could provide an atomic bitwise patch for ppc64, I would be happy to include it. In the mean time, I'm still investigating the ZGC regression. If I can figure it out, I might want to include a fix for ZGC in this PR as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26399#issuecomment-3157006874 From dlong at openjdk.org Wed Aug 6 01:29:58 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Aug 2025 01:29:58 GMT Subject: RFR: 8278874: tighten VerifyStack constraints [v10] In-Reply-To: References: Message-ID: > The VerifyStack logic in Deoptimization::unpack_frames() attempts to check the expression stack size of the interpreter frame against what GenerateOopMap computes. 
To do this, it needs to know if the state at the current bci represents the "before" state, meaning the bytecode will be reexecuted, or the "after" state, meaning we will advance to the next bytecode. The old code didn't know how to determine exactly what state we were in, so it checked both. This PR cleans that up, so we only have to compute the oopmap once. It also removes old SPARC support. Dean Long has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Merge branch 'openjdk:master' into 8278874-verifystack - Merge branch 'openjdk:master' into 8278874-verifystack - more cleanup - simplify is_top_frame - readability suggestion - reviewer suggestions - Update src/hotspot/share/runtime/vframeArray.cpp Co-authored-by: Manuel H?ssig - Update src/hotspot/share/runtime/vframeArray.cpp Co-authored-by: Manuel H?ssig - better name for frame index - Update src/hotspot/share/runtime/deoptimization.cpp Co-authored-by: Manuel H?ssig - ... 
and 3 more: https://git.openjdk.org/jdk/compare/2689ae8e...e04fc720 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26121/files - new: https://git.openjdk.org/jdk/pull/26121/files/6bfda158..e04fc720 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26121&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26121&range=08-09 Stats: 7351 lines in 213 files changed: 4752 ins; 2089 del; 510 mod Patch: https://git.openjdk.org/jdk/pull/26121.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26121/head:pull/26121 PR: https://git.openjdk.org/jdk/pull/26121 From kbarrett at openjdk.org Wed Aug 6 02:57:16 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 6 Aug 2025 02:57:16 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v5] In-Reply-To: References: Message-ID: On Tue, 5 Aug 2025 10:13:03 GMT, Martin Doerr wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> one unconditional release should be enough > > Thanks for implementing nice code for PPC64! I appreciate it! The shared code and the other platforms look fine, too. > Maybe atomic bitwise operations could be used, but I'm happy with your current solution. > Thanks @TheRealMDoerr . I didn't even consider atomic bitwise operations, but that's a good idea. I'm not in a hurry to push this, so if you could provide an atomic bitwise patch for ppc64, I would be happy to include it. In the mean time, I'm still investigating the ZGC regression. If I can figure it out, I might want to include a fix for ZGC in this PR as well. Not a review, just a drive-by comment. We've had Atomic bitops for a while now. Atomic::fetch_then_{and,or,xor}(ptr, bits [, order]) Atomic::{and,or,xor}_then_fetch(ptr, bits [, order]) They haven't been optimized for most (any?) platforms, being based on cmpxchg. (See all the "Specialize atomic bitset functions for ..." 
related to https://bugs.openjdk.org/browse/JDK-8293117.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26399#issuecomment-3157237736 From qamai at openjdk.org Wed Aug 6 03:11:12 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Aug 2025 03:11:12 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants [v2] In-Reply-To: <-fnYZkHE8e3Xrg1M8DdUdfuHMWh-2YoLyMRKPqKWeZU=.33956235-27c1-4776-99ad-a9245ce55eea@github.com> References: <-fnYZkHE8e3Xrg1M8DdUdfuHMWh-2YoLyMRKPqKWeZU=.33956235-27c1-4776-99ad-a9245ce55eea@github.com> Message-ID: On Tue, 5 Aug 2025 10:25:45 GMT, Aleksey Shipilev wrote: >> See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test. The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. I think it is safe to bail out of this transformation. >> >> Also, this only plugs this particular leak. I think we should really be disabling the abstract/interface encoding optimization until C2 does not expose itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Assert and bailout for ConP -> EncodeP path > - Merge branch 'master' into JDK-8361211-c2-encodeable > - Fix Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26559#pullrequestreview-3090283443 From qamai at openjdk.org Wed Aug 6 03:13:02 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Aug 2025 03:13:02 GMT Subject: RFR: 8360561: PhaseIdealLoop::create_new_if_for_predicate hits "must be a uct if pattern" assert In-Reply-To: References: Message-ID: On Mon, 28 Jul 2025 12:31:49 GMT, Marc Chevalier wrote: > Did you know that ranges can be disjoints and yet not ordered?! Well, in modular arithmetic. > > Let's look at a simplistic example: > > int x; > if (?) { > x = -1; > } else { > x = 1; > } > > if (x != 0) { > return; > } > // Unreachable > > > With signed ranges, before the second `if`, `x` is in `[-1, 1]`. Which is enough to enter to second if, but not enough to prove you have to enter it: it wrongly seems that after the second `if` is still reachable. Twaddle! > > With unsigned ranges, at this point `x` is in `[1, 2^32-1]`, and then, it is clear that `x != 0`. This information is used to refine the value of `x` in the (missing) else-branch, and so, after the if. This is done with simple lattice meet (Hotspot's join): in the else-branch, the possible values of `x` are the meet of what is was worth before, and the interval in the guard, that is `[0, 0]`. Thanks to the unsigned range, this is known to be empty (that is bottom, or Hotspot's top). And with a little reduced product, the whole type of `x` is empty as well. Yet, this information is not used to kill control yet. > > This is here the center of the problem: we have a situation such as: > 2 after-CastII > After node `110 CastII` is idealized, it is found to be Top, and then the uncommon trap at `129` is replaced by `238 Halt` by being value-dead. 
> 1 before-CastII > Since the control is not killed, the node stay there, eventually making some predicate-related assert fail as a trap is expected under a `ParsePredicate`. > > And that's what this change proposes: when comparing integers with non-ordered ranges, let's see if the unsigned ranges overlap, by computing the meet. If the intersection is empty, then the values can't be equals, without being able to order them. This is new! Without unsigned information for signed integer, either they overlap, or we can order them. Adding modular arithmetic allows to have non-overlapping ranges that are also not ordered. > > Let's also notice that 0 is special: it is important bounds are on each side of 0 (or 2^31, the other discontinuity). For instance if `x` can be 1 or 5, for instance, both the signed and unsigned range will agree on `[1, 5]` and not be able to prove it's, let's say, 3. > > What would there be other ways to treat this problem a bit ... Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26504#pullrequestreview-3090286220 From qamai at openjdk.org Wed Aug 6 03:14:09 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Aug 2025 03:14:09 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3] In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: On Tue, 5 Aug 2025 11:39:43 GMT, Galder Zamarre?o wrote: >> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. >> >> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. 
On darwin/aarch64 it shows:
>>
>> Benchmark                                Mode  Cnt      Base      Patch   Units  Diff
>> VectorBitConversion.doubleToLongBits     thrpt   8  1168.782   1157.717  ops/ms   -1%  (seed=0, size=2048)
>> VectorBitConversion.doubleToRawLongBits  thrpt   8  3999.387   7353.936  ops/ms  +83%  (seed=0, size=2048)
>> VectorBitConversion.floatToIntBits       thrpt   8  1200.338   1188.206  ops/ms   -1%  (seed=0, size=2048)
>> VectorBitConversion.floatToRawIntBits    thrpt   8  4058.248  14792.474  ops/ms  +264% (seed=0, size=2048)
>> VectorBitConversion.intBitsToFloat       thrpt   8  3050.313  14984.246  ops/ms  +391% (seed=0, size=2048)
>> VectorBitConversion.longBitsToDouble     thrpt   8  3022.691   7379.360  ops/ms  +144% (seed=0, size=2048)
>>
>> The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control.
>>
>> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.
>
> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
>
>   Check at the very least that auto vectorization is supported

Marked as reviewed by qamai (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/26457#pullrequestreview-3090286501

From dholmes at openjdk.org Wed Aug 6 05:46:06 2025
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 6 Aug 2025 05:46:06 GMT
Subject: RFR: 8364141: Remove LockingMode related code from x86 [v3]
In-Reply-To: <-ncfIHskHiKnUbJ3nRR8rp678hInGalmZW4CnS5QJp0=.baabffb7-5f4f-4f06-9b23-315f8e9372a7@github.com>
References: <-ncfIHskHiKnUbJ3nRR8rp678hInGalmZW4CnS5QJp0=.baabffb7-5f4f-4f06-9b23-315f8e9372a7@github.com>
Message-ID: 

On Tue, 5 Aug 2025 12:36:00 GMT, Fredrik Bredberg wrote:

>> Since the integration of [JDK-8359437](https://bugs.openjdk.org/browse/JDK-8359437) the `LockingMode` flag can no longer be set by the user, instead it's declared as `const int LockingMode = LM_LIGHTWEIGHT;`.
This means that we can now safely remove all `LockingMode` related code from all platforms. >> >> This PR removes `LockingMode` related code from the **x86** platform. >> >> When all the `LockingMode` code has been removed from all platforms, we can go on and remove it from shared (non-platform specific) files as well. And finally remove the `LockingMode` variable itself. >> >> Passes tier1-tier5 with no added problems. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update two after review Marked as reviewed by dholmes (Reviewer). src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 409: > 407: // > 408: // The only other source of unbalanced locking would be JNI. The "Java Native Interface > 409: // Specification" states that an object locked by JNI's_MonitorEnter should not be Suggestion: // Specification" states that an object locked by JNI's MonitorEnter should not be Sorry missed the misplaced underscore due to the red-wavy-line spelling error ------------- PR Review: https://git.openjdk.org/jdk/pull/26552#pullrequestreview-3090489187 PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2255920037 From qxing at openjdk.org Wed Aug 6 05:57:47 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Wed, 6 Aug 2025 05:57:47 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v6] In-Reply-To: References: Message-ID: > The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. > > This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. 
For example, the following implementation runs ~115% faster on x86-64 with this patch: > > > public static int numberOfNibbles(int i) { > int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); > return Math.max((mag + 3) / 4, 1); > } > > > Testing: tier1, IR test Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: - Add checks for results of all test methods - Replace `isa_*` with `is_*` and add checks for `Type::BOTTOM` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25928/files - new: https://git.openjdk.org/jdk/pull/25928/files/2f9bca68..da805b03 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=04-05 Stats: 118 lines in 2 files changed: 68 ins; 14 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/25928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928 PR: https://git.openjdk.org/jdk/pull/25928 From qxing at openjdk.org Wed Aug 6 06:12:03 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Wed, 6 Aug 2025 06:12:03 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 16:49:02 GMT, Jasmine Karthikeyan wrote: >> When someone passes a non-integer to `CountLeadingZerosINode`, I think. > > Since the function filters `Type::TOP` earlier, I don't think it is possible to see non-int types here. I think it would be better to change it to `is_int()` and remove the null check, so that any broken graph constructions can be caught with the assert on the type check. You might also need to check for `Type::BOTTOM`. @jaskarth Thanks for review, I've updated this file. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2255956549 From qxing at openjdk.org Wed Aug 6 06:12:07 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Wed, 6 Aug 2025 06:12:07 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v5] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 16:50:09 GMT, Jasmine Karthikeyan wrote: >> Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into enhance-clz-type >> - Move `TestCountBitsRange` to `compiler.c2.gvn` >> - Fix null checks >> - Narrow type bound >> - Use `BitsPerX` constant instead of `sizeof` >> - Make the type of count leading/trailing zero nodes more precise > > test/hotspot/jtreg/compiler/c2/gvn/TestCountBitsRange.java line 43: > >> 41: static int i = RunInfo.getRandom().nextInt(); >> 42: static long l = RunInfo.getRandom().nextLong(); >> 43: > > It would be nice to also check the return values of the functions with a non-compiled version, so that we can make sure that the constant folding results are correct as well. Updated, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2255958271 From mhaessig at openjdk.org Wed Aug 6 07:53:05 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 6 Aug 2025 07:53:05 GMT Subject: RFR: 8358598: PhaseIterGVN::PhaseIterGVN(PhaseGVN* gvn) doesn't use its parameter [v2] In-Reply-To: References: <2S2UiCOxUCiSAlQrrVCaL4S6MYlqdRcabqniskhg6XI=.c4ec617e-da35-48df-911c-9c0b4dca0126@github.com> Message-ID: On Tue, 5 Aug 2025 14:09:46 GMT, Francesco Andreuzzi wrote: >> As noted in the ticket, I propose a small cleanup of `PhaseIterGVN` since one of the constructors does not use its parameter. 
>> Passes tier1 and tier2.
>
> Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision:
>
>   align with line above
>
>   Co-authored-by: Manuel Hässig

Testing is all green.

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/26617#pullrequestreview-3090993897

From qxing at openjdk.org Wed Aug 6 08:24:45 2025
From: qxing at openjdk.org (Qizheng Xing)
Date: Wed, 6 Aug 2025 08:24:45 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v7]
In-Reply-To: 
References: 
Message-ID: 

> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases.
>
> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch:
>
> public static int numberOfNibbles(int i) {
>     int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
>     return Math.max((mag + 3) / 4, 1);
> }
>
> Testing: tier1, IR test

Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains nine additional commits since the last revision: - Merge branch 'master' into enhance-clz-type - Add checks for results of all test methods - Replace `isa_*` with `is_*` and add checks for `Type::BOTTOM` - Merge branch 'master' into enhance-clz-type - Move `TestCountBitsRange` to `compiler.c2.gvn` - Fix null checks - Narrow type bound - Use `BitsPerX` constant instead of `sizeof` - Make the type of count leading/trailing zero nodes more precise ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25928/files - new: https://git.openjdk.org/jdk/pull/25928/files/da805b03..c9051a0b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=05-06 Stats: 25654 lines in 616 files changed: 13742 ins; 9960 del; 1952 mod Patch: https://git.openjdk.org/jdk/pull/25928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928 PR: https://git.openjdk.org/jdk/pull/25928 From dfenacci at openjdk.org Wed Aug 6 08:28:07 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 6 Aug 2025 08:28:07 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v4] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> <_preMnRE0tqL476Pb8bPPfkixInRa-ZH5Qom7W70AW4=.a71e36da-d0e1-44e4-a3fe-9091460b813f@github.com> <51aYnCiXel-vz4Zu40K08E1lyBtX5JXD8PXoCr5wWUE=.15def8e4-f7c3-42ae-976e-f79ed7415bfa@github.com> Message-ID: <2ejqHNlL-UYAWzOUnFmNax-kpwY7G4EPl_aHZVvYslE=.28c7ae7e-f085-49fa-812f-16c8e46fec5f@github.com> On Fri, 25 Jul 2025 09:12:35 GMT, Saranya Natarajan wrote: >> src/hotspot/share/runtime/globals.hpp line 1356: >> >>> 1354: develop(int, BciProfileWidth, 2, \ >>> 1355: "Number of return bci's to record in ret profile") \ >>> 1356: range(0, AARCH64_ONLY(1000) NOT_AARCH64(5000)) \ >> >> I'm not too sure of the usual number of returns but even just 1000 sounds quite 
big as maximum. Do you think we could use that for all architectures? > > Thank you for the review. I have tested 1000 by reducing the `InterpreterCodeSize` to the smallest value in all the specified architecture in the source code on both AArch64 and x86. It works for 1000. Hence, I think it should work on all architectures. Do you propose I make it 1000 (or a lesser value) for all architecture ? Yes, that was more or less what I was thinking (btw we might want to check with other architectures as well). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26139#discussion_r2256350579 From dfenacci at openjdk.org Wed Aug 6 08:32:08 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 6 Aug 2025 08:32:08 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v5] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: On Fri, 25 Jul 2025 09:01:12 GMT, Saranya Natarajan wrote: >> **Issue** >> Extreme values for BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version `results in assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. This is observed in a x86 machine. >> >> **Analysis** >> On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above mentioned assert from failing for large values of BciProfileWidth. >> >> **Proposal** >> Considering the fact that larger BciProfileWidth results in slower profiling, I have proposed a range between 0 to 5000 to restrict the value for BciProfileWidth for x86 machines. 
This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in ret profile. >> >> **Issue in AArch64** >> Additionally running the command `java -XX:BciProfileWidth= 10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine.This is an issue of maximum offset for `ldr/str` in AArch64 which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on AArch64 machine. I had to modify 3 `ldr` and 1 `str` instruction (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line number 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for maximum of 5000 after which it crashes with`assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. Without this fix `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested to restrict the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. >> >> **Question to reviewers** >> Do you think this is a reasonable fix ? For AArch64 do you suggest fixing using `form_address` ? If yes, do I fix it under this PR or create another one ? > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comment by adding intx flag Thanks for the changes @sarannat. Do you think we could add a small regression test (or add a couple of simple tests to an existing one)? 
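If a unified cross-platform bound were adopted as suggested, the declaration quoted earlier from `src/hotspot/share/runtime/globals.hpp` might end up looking roughly like this. This is a sketch only: whether the flag stays `develop`, whether it becomes `intx` as in the latest commit, and which upper bound is chosen are all still open in the thread (1000 shown as discussed).

```cpp
// Sketch, not the final patch: mirrors the snippet quoted in the review,
// with a single upper bound for all architectures.
develop(intx, BciProfileWidth, 2,                                         \
        "Number of return bci's to record in ret profile")                \
        range(0, 1000)                                                    \
```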
------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3158201493 From mhaessig at openjdk.org Wed Aug 6 08:34:05 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 6 Aug 2025 08:34:05 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v2] In-Reply-To: <1YTjfZh0C7UWqTCXWcNS-Q68kBOnhlRJA_-sS_vZsio=.6389ba30-8f3a-48c2-b0e2-40f5877f4a36@github.com> References: <_Ye19u_7PlqlsoRSuR0dNeAGbeuHyN_oqD1ZS4q9Nvk=.b94fd29d-d43e-4561-9926-7f5a46434d8e@github.com> <1YTjfZh0C7UWqTCXWcNS-Q68kBOnhlRJA_-sS_vZsio=.6389ba30-8f3a-48c2-b0e2-40f5877f4a36@github.com> Message-ID: On Tue, 5 Aug 2025 21:33:52 GMT, Dean Long wrote: >> The compiler thread setting and unsetting the flag and the signal handler reading the flag are racing each other as soon as the timer is set, since signals are preemptive. This prevents a few false positive timeouts on architectures with weak memory models, but does not have any effect on x86 for example. > > The signal is delivered to the same compiler thread, so I don't think acquire and release help here. There shouldn't be a weak memory model issue between a thread and it's signal handler on the same thread. Can you explain the sequence of events that could cause a false positive? As far as I understand weak memory models, the concept of threads, which the CPU does not know of, is irrelevant. The CPU is merely free to reorder a subset of memory operations. The reason I added the acquire/release semantics is to prevent the reordering of a write to `_timeout_armed` that happened before a read from the same in the signal handler. This is only relevant when a compilation is done and the timeout is disarmed just before the timer fires. Granted, this is an edge case that does not create a huge amount of false positives. Also, it might be entirely unnecessary due to the OS doing some work to call the signal handler in-between the write and the read of the value. 
It is merely conservative programming, so I am fine with dropping the barriers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2256370628 From shade at openjdk.org Wed Aug 6 08:35:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 6 Aug 2025 08:35:16 GMT Subject: RFR: 8361211: C2: Final graph reshaping generates unencodeable klass constants [v2] In-Reply-To: <-fnYZkHE8e3Xrg1M8DdUdfuHMWh-2YoLyMRKPqKWeZU=.33956235-27c1-4776-99ad-a9245ce55eea@github.com> References: <-fnYZkHE8e3Xrg1M8DdUdfuHMWh-2YoLyMRKPqKWeZU=.33956235-27c1-4776-99ad-a9245ce55eea@github.com> Message-ID: On Tue, 5 Aug 2025 10:25:45 GMT, Aleksey Shipilev wrote: >> See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test. The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. I think it is safe to bail out of this transformation. >> >> Also, this only plugs this particular leak. I think we should really be disabling the abstract/interface encoding optimization until C2 does not expose itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision:
>
> - Assert and bailout for ConP -> EncodeP path
> - Merge branch 'master' into JDK-8361211-c2-encodeable
> - Fix

Thank you for reviews! Here goes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26559#issuecomment-3158217080 From shade at openjdk.org Wed Aug 6 08:35:17 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 6 Aug 2025 08:35:17 GMT Subject: Integrated: 8361211: C2: Final graph reshaping generates unencodeable klass constants In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 16:20:43 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. I have tried to come up with an isolated test, but failed. So I am doing this change somewhat blindly, without a clear regression test. The investigation on the CTW points directly to this code, and I believe we should be more conservative in final graph reshaping. [JDK-8343206](https://bugs.openjdk.org/browse/JDK-8343206) added the assert for `ConNKlass`, which somehow does not trigger. I think it is safe to bail out of this transformation. > > Also, this only plugs this particular leak. I think we should really be disabling the abstract/interface encoding optimization until C2 does not expose itself to this issue on more paths. There is [JDK-8343218](https://bugs.openjdk.org/browse/JDK-8343218) that we can re-open. > > Additional testing: > - [x] Linux x86_64 server fastdebug, a rare CTW failure does not reproduce anymore > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` This pull request has now been integrated.
Changeset: e304d379
Author: Aleksey Shipilev
URL: https://git.openjdk.org/jdk/commit/e304d37996b075b8b2b44b5762d7d242169add49
Stats: 11 lines in 1 file changed: 9 ins; 0 del; 2 mod

8361211: C2: Final graph reshaping generates unencodeable klass constants

Reviewed-by: kvn, qamai, thartmann, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/26559 From xgong at openjdk.org Wed Aug 6 09:42:08 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 6 Aug 2025 09:42:08 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation In-Reply-To: References: <38f2bvFqiVNQGGpMif0iflVFD8wXnyw4SwtKxwi_Dmo=.276fb2fb-b80c-4ea7-a32f-c326294f442a@github.com> <1xAfD3mz5cbQpYtCYxoHqRQcOLadLKNHrvMUtFtFbGo=.34e5780a-e37a-427c-b745-1ed422c7a008@github.com> <4tejg5hp-eHBmAEvKbpTg_mv_TUYU5kg0HIccmWyac8=.3638758e-5000-4d1f-924f-abb4a21952c6@github.com> Message-ID: On Thu, 17 Jul 2025 11:28:18 GMT, Fei Gao wrote: >>> I like this idea! The first one looks better, in which `concate` would provide lower-level and more fine-grained semantics, allowing us to define fewer IR node types while supporting more scenarios. >> >> Yes, I agree with you. I'm now working on refactoring the IR based on the first idea. I will update the patch as soon as possible. Thanks for your valuable suggestion! > >> >> Yes, I agree with you. I'm now working on refactoring the IR based on the first idea. I will update the patch as soon as possible. Thanks for your valuable suggestion! > > Thanks! I'd suggest also highlighting `aarch64` in the JBS title, so others who are interested won't miss it. Hi @fg1417 , I've addressed your comments in the latest commit. Would you mind taking another look?
Thanks~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3158708886 From galder at openjdk.org Wed Aug 6 09:45:17 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Wed, 6 Aug 2025 09:45:17 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option [v2] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 10:59:57 GMT, Zdenek Zambersky wrote: >>> (I have not changed JIRA as there is no info about fix. Should I add it there?) >> >> Yes please, that is generally what we should do :) > > @eme64 thank you for the review Possibly, I've pinged @zzambers ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-3158748048 From chagedorn at openjdk.org Wed Aug 6 09:49:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 6 Aug 2025 09:49:06 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3] In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: <0PSLJlyGUw_lGVqliV6ABJrwoNa9TNcwVuv6flDyAV8=.1f0993be-f97f-4aea-b619-81a096076c3e@github.com> On Tue, 5 Aug 2025 11:39:43 GMT, Galder Zamarreño wrote: >> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. >> >> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes.
On darwin/aarch64 it shows:
>>
>>
>> Benchmark                                (seed)  (size)   Mode  Cnt      Base      Patch   Units  Diff
>> VectorBitConversion.doubleToLongBits          0    2048  thrpt    8  1168.782   1157.717  ops/ms   -1%
>> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt    8  3999.387   7353.936  ops/ms  +83%
>> VectorBitConversion.floatToIntBits            0    2048  thrpt    8  1200.338   1188.206  ops/ms   -1%
>> VectorBitConversion.floatToRawIntBits         0    2048  thrpt    8  4058.248  14792.474  ops/ms +264%
>> VectorBitConversion.intBitsToFloat            0    2048  thrpt    8  3050.313  14984.246  ops/ms +391%
>> VectorBitConversion.longBitsToDouble          0    2048  thrpt    8  3022.691   7379.360  ops/ms +144%
>>
>> The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control.
>>
>> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.
> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
>
> Check at the very least that auto vectorization is supported

Internal testing of the latest commit looked good (not a review). ------------- PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3158174352 From galder at openjdk.org Wed Aug 6 09:49:08 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Wed, 6 Aug 2025 09:49:08 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> <7UqSdBPWH0SbdkhAUvF_qM10rK0oFsJXhUKWA3VlL14=.0c35e297-7276-468b-98c6-046e84897625@github.com> Message-ID: On Tue, 5 Aug 2025 06:32:23 GMT, Quan Anh Mai wrote: >>> VectorNode::is_reinterpret_opcode returns true for Op_ReinterpretHF2S and Op_ReinterpretS2HF, which are very similar to the nodes in this PR, can you add these nodes to that method instead?
>>
>> You're suggesting to modify `is_reinterpret_opcode` to be like this, and call that instead of `is_move_opcode`, right?
>>
>>
>> bool VectorNode::is_reinterpret_opcode(int opc) {
>>   switch (opc) {
>>     case Op_ReinterpretHF2S:
>>     case Op_ReinterpretS2HF:
>>     case Op_MoveF2I:
>>     case Op_MoveD2L:
>>     case Op_MoveL2D:
>>     case Op_MoveI2F:
>>       return true;
>>     default:
>>       return false;
>>   }
>> }
>
>> You're suggesting to modify `is_reinterpret_opcode` to be like this, and call that instead of `is_move_opcode`, right?
>
> Yes, that's right. I believe `VectorReinterpret` should be implemented for all pairs of vector species where both the input and output species are implemented. So, `VectorReinterpretNode::implemented` is unnecessary.

@merykitty thanks for the approval. I've run tier1-3 tests for 147633f and they all passed, and the benchmark results are the same as in the description. Thanks @chhagedorn for running the tests! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3158785367 From galder at openjdk.org Wed Aug 6 09:53:06 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Wed, 6 Aug 2025 09:53:06 GMT Subject: RFR: 8358598: PhaseIterGVN::PhaseIterGVN(PhaseGVN* gvn) doesn't use its parameter [v2] In-Reply-To: References: <2S2UiCOxUCiSAlQrrVCaL4S6MYlqdRcabqniskhg6XI=.c4ec617e-da35-48df-911c-9c0b4dca0126@github.com> Message-ID: On Tue, 5 Aug 2025 14:09:46 GMT, Francesco Andreuzzi wrote: >> As noted in the ticket, I propose a small cleanup of `PhaseIterGVN` since one of the constructors does not use its parameter.
>>
>> Passes tier1 and tier2.
> Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision:
>
> align with line above
>
> Co-authored-by: Manuel Hässig

Marked as reviewed by galder (Author).
------------- PR Review: https://git.openjdk.org/jdk/pull/26617#pullrequestreview-3091774224 From jsjolen at openjdk.org Wed Aug 6 10:00:04 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 6 Aug 2025 10:00:04 GMT Subject: RFR: 8352067: Remove the NMT treap and replace its uses with the utilities red-black tree In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 09:29:05 GMT, Casper Norrbin wrote:
> Hi everyone,
>
> The utilities red-black tree and the NMT treap serve similar functions. Given the red-black tree's versatility and stricter time complexity, the treap can be removed in favour of it.
>
> I made some modifications to the red-black tree to make it compatible with previous treap usages:
> - Updated the `visit_in_order` and `visit_range_in_order` functions to require the supplied callback to return a bool, which allows us to stop traversing early.
> - Improved const-correctness by ensuring that invoking these functions on a const reference provides const pointers to nodes, while non-const references provide mutable pointers. Previously the two functions behaved differently.
>
> Changes to NMT include:
> - Modified components to align with the updated const-correctness of the red-black tree functions
> - Renamed structures and variables to remove "treap" from their names to reflect the new tree
>
> The treap was also used in one place in C2. I changed this to use the red-black tree and its cursor interface, which I felt was most fitting for the use case.

This LGTM. What testing did you run?

src/hotspot/share/nmt/vmatree.hpp line 196:

> 194:
> 195: public:
> 196: using VMARBTree = RBTreeCHeap;

You should be able to just write 'mtNMT', same for all other `MemTag::` prefixes.

src/hotspot/share/opto/printinlining.cpp line 90:

> 88:
> 89: return node->val();
> 90: }

Could you expand the `auto`s? It should be a code action in VSCode if you use clangd. We tend to only use auto when it's a lambda.
------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26655#pullrequestreview-3091786865 PR Review Comment: https://git.openjdk.org/jdk/pull/26655#discussion_r2256636655 PR Review Comment: https://git.openjdk.org/jdk/pull/26655#discussion_r2256634808 From duke at openjdk.org Wed Aug 6 10:13:46 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 6 Aug 2025 10:13:46 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v17] In-Reply-To: References: Message-ID: > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: make 'result' calculations scalar; clear vector registers only when necessary. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/e7fac6c7..bbcd1ec4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=15-16 Stats: 15 lines in 1 file changed: 3 ins; 7 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From duke at openjdk.org Wed Aug 6 10:13:51 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 6 Aug 2025 10:13:51 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v16] In-Reply-To: <3-IiyzLSiPSYIIYsvzPbMGlvudzupXlbBiG739MC-4E=.d58d0da6-3003-42e8-8012-71bfe84d1cd7@github.com> References: <3-IiyzLSiPSYIIYsvzPbMGlvudzupXlbBiG739MC-4E=.d58d0da6-3003-42e8-8012-71bfe84d1cd7@github.com> Message-ID: On Tue, 5 Aug 2025 12:53:24 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations 
on RVV v1.0.0 capable hardware.
>>
>> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
> Yuri Gaevsky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>
> - Merge master
> - replaced vmul_vv + vadd_vv by vmadd_vv
> - returned lmul==m4
> - fixed error made for prevoius lmul-m1 experiment
> - make an experiment with lmul==1 instead of lmul==4.
> - move vredsum_vs out of VEC_LOOP to improve performance
> - - removed tail processing with RVV instructions as simple scalar loop provides in general better results
> - simplified arrays_hashcode_v() to be closer to VLA and use less general-purpose registers; minor cosmetic changes
> - change slli+add sequence to shadd
> - reorder instructions to make RVV instructions contiguous
> - ... and 7 more: https://git.openjdk.org/jdk/compare/ba0ae4cb...e7fac6c7

`bbcd1ec`:

--- -XX:DisableIntrinsic=_vectorizedHashCode ---
Benchmark            (size)  Mode  Cnt      Score    Error  Units
ArraysHashCode.ints       1  avgt   30     11.293 ±  0.019  ns/op
ArraysHashCode.ints       5  avgt   30     28.861 ±  0.052  ns/op
ArraysHashCode.ints      10  avgt   30     41.105 ±  0.170  ns/op
ArraysHashCode.ints      20  avgt   30     68.275 ±  0.028  ns/op
ArraysHashCode.ints      30  avgt   30     89.379 ±  0.659  ns/op
ArraysHashCode.ints      40  avgt   30    115.111 ±  0.157  ns/op
ArraysHashCode.ints      50  avgt   30    136.252 ±  0.522  ns/op
ArraysHashCode.ints      60  avgt   30    161.913 ±  0.276  ns/op
ArraysHashCode.ints      70  avgt   30    170.858 ±  0.367  ns/op
ArraysHashCode.ints      80  avgt   30    195.566 ±  0.288  ns/op
ArraysHashCode.ints      90  avgt   30    208.160 ±  0.527  ns/op
ArraysHashCode.ints     100  avgt   30    232.718 ±  0.688  ns/op
ArraysHashCode.ints     200  avgt   30    448.118 ±  1.079  ns/op
ArraysHashCode.ints     300  avgt   30    656.164 ±  0.647  ns/op
ArraysHashCode.ints    1000  avgt   30   2139.643 ±  0.729  ns/op
ArraysHashCode.ints   10000  avgt   30  23584.704 ± 40.276  ns/op

--- -XX:-UseRVV ---
Benchmark            (size)  Mode  Cnt      Score     Error  Units
ArraysHashCode.ints       1  avgt   30     11.282 ±   0.010  ns/op
ArraysHashCode.ints       5  avgt   30     23.216 ±   0.042  ns/op
ArraysHashCode.ints      10  avgt   30     33.251 ±   0.055  ns/op
ArraysHashCode.ints      20  avgt   30     50.740 ±   0.010  ns/op
ArraysHashCode.ints      30  avgt   30     70.871 ±   0.100  ns/op
ArraysHashCode.ints      40  avgt   30     88.330 ±   0.026  ns/op
ArraysHashCode.ints      50  avgt   30    108.937 ±   0.097  ns/op
ArraysHashCode.ints      60  avgt   30    125.877 ±   0.024  ns/op
ArraysHashCode.ints      70  avgt   30    146.213 ±   0.166  ns/op
ArraysHashCode.ints      80  avgt   30    163.545 ±   0.072  ns/op
ArraysHashCode.ints      90  avgt   30    183.957 ±   0.371  ns/op
ArraysHashCode.ints     100  avgt   30    201.147 ±   0.322  ns/op
ArraysHashCode.ints     200  avgt   30    389.061 ±   0.168  ns/op
ArraysHashCode.ints     300  avgt   30    576.751 ±   0.091  ns/op
ArraysHashCode.ints    1000  avgt   30   1994.798 ± 116.508  ns/op
ArraysHashCode.ints   10000  avgt   30  20482.232 ±  40.742  ns/op

--- -XX:+UseRVV ---
Benchmark            (size)  Mode  Cnt     Score    Error  Units
ArraysHashCode.ints       1  avgt   30    11.291 ±  0.019  ns/op
ArraysHashCode.ints       5  avgt   30    23.245 ±  0.104  ns/op
ArraysHashCode.ints      10  avgt   30    38.872 ±  0.061  ns/op
ArraysHashCode.ints      20  avgt   30    70.267 ±  0.127  ns/op
ArraysHashCode.ints      30  avgt   30   102.054 ±  0.478  ns/op
ArraysHashCode.ints      40  avgt   30    71.729 ±  0.890  ns/op
ArraysHashCode.ints      50  avgt   30   105.017 ±  0.726  ns/op
ArraysHashCode.ints      60  avgt   30   136.776 ±  0.407  ns/op
ArraysHashCode.ints      70  avgt   30    77.465 ±  0.204  ns/op
ArraysHashCode.ints      80  avgt   30   106.172 ±  1.303  ns/op
ArraysHashCode.ints      90  avgt   30   137.791 ±  1.098  ns/op
ArraysHashCode.ints     100  avgt   30    74.734 ±  0.115  ns/op
ArraysHashCode.ints     200  avgt   30   116.477 ±  0.015  ns/op
ArraysHashCode.ints     300  avgt   30   161.045 ±  1.155  ns/op
ArraysHashCode.ints    1000  avgt   30   336.027 ±  0.953  ns/op
ArraysHashCode.ints   10000  avgt   30  5427.369 ± 37.820  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3159077967 From jbhateja at openjdk.org Wed Aug 6 10:46:07 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Aug 2025 10:46:07 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v7] In-Reply-To: References: Message-ID: <_tJxPUHoJrERm4Yxb3ituFLhPv9PKVQ3K6ufXQWCk2Y=.de11d41b-f2d6-4c3b-88a9-e79013815aeb@github.com> On Wed, 6 Aug 2025 08:24:45 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases.
>>
>> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch:
>>
>>
>> public static int numberOfNibbles(int i) {
>>     int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
>>     return Math.max((mag + 3) / 4, 1);
>> }
>>
>>
>> Testing: tier1, IR test
> Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase.
The pull request contains nine additional commits since the last revision:
>
> - Merge branch 'master' into enhance-clz-type
> - Add checks for results of all test methods
> - Replace `isa_*` with `is_*` and add checks for `Type::BOTTOM`
> - Merge branch 'master' into enhance-clz-type
> - Move `TestCountBitsRange` to `compiler.c2.gvn`
> - Fix null checks
> - Narrow type bound
> - Use `BitsPerX` constant instead of `sizeof`
> - Make the type of count leading/trailing zero nodes more precise

src/hotspot/share/opto/countbitsnode.cpp line 89:

> 87:   return TypeInt::make(count_leading_zeros_long(~tl->_bits._zeros),
> 88:                        count_leading_zeros_long(tl->_bits._ones),
> 89:                        tl->_widen);

Hi @MaxXSoft, Taking the liberty to add some comments related to the proof of the above assumptions, hope you won't mind :-)

**Z3 proof for [KnownBits.ONES, ~KnownBits.ZEROS] >= [LB, UB] where >= is a superset relation.**

    from z3 import *
    from functools import reduce

    # 64-bit symbolic unsigned integers
    UB = BitVec('UB', 64)
    LB = BitVec('LB', 64)

    # XOR for detecting differing bits
    xor_val = UB ^ LB

    # COUNT_LEADING_ZEROS
    def count_leading_zeros(x):
        """Returns number of leading zeros in a 64-bit word."""
        clauses = []
        for i in range(64):
            bit_is_one = LShR(x, 63 - i) & 1 == 1
            bits_before_zero = And([LShR(x, 63 - j) & 1 == 0 for j in range(i)])
            clauses.append(If(And(bits_before_zero, bit_is_one),
                              BitVecVal(i, 64), BitVecVal(64, 64)))
        return reduce(lambda a, b: If(a == 64, b, a), clauses)

    # Step 1: Compute common prefix length
    CPL = count_leading_zeros(xor_val)

    # Step 2: COMMON_PREFIX_MASK = ((1 << CPL) - 1) << (64 - CPL)
    one_shifted = (BitVecVal(1, 64) << CPL)
    mask = one_shifted - 1
    COMMON_PREFIX_MASK = mask << (64 - CPL)

    # Step 3: COMMON_PREFIX
    COMMON_PREFIX = UB & COMMON_PREFIX_MASK

    # Step 4: ZEROS and ONES
    ZEROS = COMMON_PREFIX_MASK & (~COMMON_PREFIX)
    ONES = COMMON_PREFIX

    # Step 5: Prove that [ONES, ~ZEROS] covers [LB, UB]
    prop = And(ULE(ONES, LB), ULE(UB, ~ZEROS))

    # Step 6: Try to disprove (i.e., check if any UB, LB violates the above)
    s = Solver()
    s.add(Not(prop))  # Look for counterexamples

    # Check the result
    if s.check() == sat:
        m = s.model()
        ub_val = m.eval(UB).as_long()
        lb_val = m.eval(LB).as_long()
        ones_val = m.eval(ONES).as_long()
        zeros_val = m.eval(ZEROS).as_long()
        not_zeros_val = (~zeros_val) & 0xFFFFFFFFFFFFFFFF
        print("Property does NOT hold. Counterexample found:")
        print(f"LB     = {lb_val:#018x}")
        print(f"UB     = {ub_val:#018x}")
        print(f"ONES   = {ones_val:#018x}")
        print(f"~ZEROS = {not_zeros_val:#018x}")
        print(f"ONES <= LB?   {ones_val <= lb_val}")
        print(f"UB <= ~ZEROS? {ub_val <= not_zeros_val}")
    else:
        print("Property holds: [ONES, ~ZEROS] always covers [LB, UB] (UNSAT)")

Manually worked out Example 1:-

    UB = 0xFFFF00FF
    LB = 0xFFFF0000
    COMMON_PREFIX = 0xFFFF0000
    ONES  = 0xFFFF0000
    ZEROS = 0x00000000
    ---------------------------
    MAX = ~ZEROS = 0xFFFFFFFF
    MIN = ONES   = 0xFFFF0000

**Thus, it's evident that MAX >= UB AND MIN <= LB**

Manually worked out Example 2:-

    UB = 0xFF0F00FF
    LB = 0xFF0F0000
    COMMON_PREFIX = 0xFF0F0000
    ONES  = 0xFF0F0000
    ZEROS = 0x00F00000

The logical OR b/w ONES and ZEROS forms the COMMON_PREFIX_MASK, i.e., both ZEROS and ONES are derived from COMMON_PREFIX, therefore COMMON_PREFIX_MASK & ~ZEROS == COMMON_PREFIX_MASK & ONES. Apart from the COMMON_PREFIX there are other lower order bits which are not included in it; when we flip ZEROS we get a value which equals ONES with all other lower order bits set, thus this value is guaranteed to be no less than UB since it also sets the non-set bits of UB. Thus,

    UB     = COMMON_PREFIX + OTHER_SET_LOWER_ORDER_BITS_IN_UB
    ~ZEROS = COMMON_PREFIX + REMAINING_BITS_SET_TO_ONE

While ONES is just the common prefix, which is guaranteed to be no greater than UB since it does not account for the set bits of UB which are not part of the common prefix. Thus, the following inequality holds good.
**~ZEROS >= UB >= LB >= ONES**

**Proof for the KnownBits application to CTZ / CLZ**

- In both these cases, we are only concerned with the number of ZEROs present either at the start or towards the end of the integral bit sequence.
- KnownBits.ZEROS and KnownBits.ONES capture the known bits in the value range, i.e., if a bit is set in ONES it is guaranteed to be one in all the values in the value range; similarly, a set bit in ZEROS signifies that the corresponding bit remains unset in all the values in the range. All unset bits in ONES/ZEROS signify bits not known at compile time.

Since the flipped value of ZEROS gives us the maximum value in the range, and given that the number of leading zeros of the maximum value will never be greater than the number of leading zeros of the minimum value of the range, the value range of CLZ varies b/w CLZ(~ZEROS) and CLZ(ONES) as its lower and upper bounds. For counting leading zeros this is relatively straightforward to prove, given that each integral value is the sum of 2^position over its set bits and that TypeInt/TypeLong comply with the invariant (LB <= UB); hence the highest set bit position of UB can never be less than that of LB.

For CTZ, consider the following example:

    UB = 0xFF8FF0F0
    LB = 0xFF800000
    COMMON_PREFIX = 0xFF800000
    ONES  = 0xFF800000
    ZEROS = 0x00700000

Again, both ONES and ZEROS bits are extracted from the common prefix of the delimiting bounds of the value range.

    MAX = ~ZEROS = 0xFF8FFFFF
    MIN = ONES   = 0xFF800000

Thus, the number of trailing zeros will be [lb:0, ub:20], given that the lower order bits of KnownBits.ZEROS which are not part of the common prefix are always zero, hence CTZ(~ZEROS) will always be 0. This is also in line with the fact that the number of leading/trailing zeros will never be a negative value. Therefore, the delimiting lower and upper bounds of CTZ are CTZ(~ZEROS) and CTZ(ONES).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2256750507 From cnorrbin at openjdk.org Wed Aug 6 11:07:42 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 6 Aug 2025 11:07:42 GMT Subject: RFR: 8352067: Remove the NMT treap and replace its uses with the utilities red-black tree [v2] In-Reply-To: References: Message-ID:
> Hi everyone,
>
> The utilities red-black tree and the NMT treap serve similar functions. Given the red-black tree's versatility and stricter time complexity, the treap can be removed in favour of it.
>
> I made some modifications to the red-black tree to make it compatible with previous treap usages:
> - Updated the `visit_in_order` and `visit_range_in_order` functions to require the supplied callback to return a bool, which allows us to stop traversing early.
> - Improved const-correctness by ensuring that invoking these functions on a const reference provides const pointers to nodes, while non-const references provide mutable pointers. Previously the two functions behaved differently.
>
> Changes to NMT include:
> - Modified components to align with the updated const-correctness of the red-black tree functions
> - Renamed structures and variables to remove "treap" from their names to reflect the new tree
>
> The treap was also used in one place in C2. I changed this to use the red-black tree and its cursor interface, which I felt was most fitting for the use case.
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision:

feedback fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26655/files - new: https://git.openjdk.org/jdk/pull/26655/files/97107f5b..a495e291 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26655&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26655&range=00-01 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26655.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26655/head:pull/26655 PR: https://git.openjdk.org/jdk/pull/26655 From dholmes at openjdk.org Wed Aug 6 12:01:04 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Aug 2025 12:01:04 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v2] In-Reply-To: References: <_Ye19u_7PlqlsoRSuR0dNeAGbeuHyN_oqD1ZS4q9Nvk=.b94fd29d-d43e-4561-9926-7f5a46434d8e@github.com> <1YTjfZh0C7UWqTCXWcNS-Q68kBOnhlRJA_-sS_vZsio=.6389ba30-8f3a-48c2-b0e2-40f5877f4a36@github.com> Message-ID: On Wed, 6 Aug 2025 08:31:26 GMT, Manuel Hässig wrote: >> The signal is delivered to the same compiler thread, so I don't think acquire and release help here. There shouldn't be a weak memory model issue between a thread and its signal handler on the same thread. Can you explain the sequence of events that could cause a false positive? > As far as I understand weak memory models, the concept of threads, which the CPU does not know of, is irrelevant. The CPU is merely free to reorder a subset of memory operations. The reason I added the acquire/release semantics is to prevent the reordering of a write to `_timeout_armed` that happened before a read of the same flag in the signal handler. This is only relevant when a compilation is done and the timeout is disarmed just before the timer fires. > > Granted, this is an edge case that does not create a huge amount of false positives. Also, it might be entirely unnecessary due to the OS doing some work to call the signal handler in-between the write and the read of the value. It is merely conservative programming, so I am fine with dropping the barriers. That is not an accurate characterisation IMO - a memory model defines the consistency of the views of memory by different threads of execution. Any given thread of execution must always have a self-consistent view of memory, and neither the compiler nor the CPU is allowed to violate that if code is actually going to be guaranteed to work correctly. If the signal is being handled in the same thread that is setting the flag then you cannot have a memory ordering issue. And acquire/release is not the right tool just to establish an ordering: you would generally want a storestore barrier on the writer side, and a loadload barrier on the reader side. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2256920642 From mhaessig at openjdk.org Wed Aug 6 12:26:04 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 6 Aug 2025 12:26:04 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v2] In-Reply-To: References: <_Ye19u_7PlqlsoRSuR0dNeAGbeuHyN_oqD1ZS4q9Nvk=.b94fd29d-d43e-4561-9926-7f5a46434d8e@github.com> <1YTjfZh0C7UWqTCXWcNS-Q68kBOnhlRJA_-sS_vZsio=.6389ba30-8f3a-48c2-b0e2-40f5877f4a36@github.com> Message-ID: On Wed, 6 Aug 2025 11:58:22 GMT, David Holmes wrote: >> As far as I understand weak memory models, the concept of threads, which the CPU does not know of, is irrelevant. The CPU is merely free to reorder a subset of memory operations. The reason I added the acquire/release semantics is to prevent the reordering of a write to `_timeout_armed` that happened before a read of the same flag in the signal handler. This is only relevant when a compilation is done and the timeout is disarmed just before the timer fires.
>> >> Granted, this is an edge case that does not create a huge amount of false positives. Also, it might be entirely unnecessary due to the OS doing some work to call the signal handler in-between the write and the read of the value. It is merely conservative programming, so I am fine with dropping the barriers. > > That is not an accurate characterisation IMO - a memory model defines the consistency of the views of memory by different threads of execution. Any given thread of execution must always have a self-consistent view of memory, and neither the compiler or the CPU is allowed to violate that if code is actually going to be guaranteed to work correctly. If the signal is being handled in the same thread that is setting the flag then you cannot have a memory ordering issue. > > And acquire/release is not the right tool just to establish an ordering: you would generally want a storestore barrier on the writer side, and a loadload barrier on the reader side. Ah, I see my mistake. 1) I did not think about the C++ memory model, and 2) I thought of the signal handler as a different thread, which made me a bit paranoid. Thank you both for pointing out the mistakes in my thinking. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2256979751 From duke at openjdk.org Wed Aug 6 12:31:26 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 6 Aug 2025 12:31:26 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v18] In-Reply-To: References: Message-ID: > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: make powmax calculations scalar; re-use v_tmp for sum reduction operation. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/bbcd1ec4..7c5f24aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=16-17 Stats: 7 lines in 1 file changed: 1 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From duke at openjdk.org Wed Aug 6 12:31:26 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 6 Aug 2025 12:31:26 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v17] In-Reply-To: References: Message-ID: <-HMb5x-omKstNHudAYz9_fZlDfXCCYOiBYyc27yp4PI=.7e47fd21-5124-45e3-a4c4-4433b412c587@github.com> On Wed, 6 Aug 2025 10:13:46 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
>>
>> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>
> make 'result' calculations scalar; clear vector registers only when necessary.

`7c5f24a`:

--- -XX:+UseRVV ---
Benchmark            (size)  Mode  Cnt     Score    Error  Units
ArraysHashCode.ints       1  avgt   30    11.277 ±  0.006  ns/op
ArraysHashCode.ints       5  avgt   30    23.300 ±  0.147  ns/op
ArraysHashCode.ints      10  avgt   30    38.838 ±  0.017  ns/op
ArraysHashCode.ints      20  avgt   30    70.163 ±  0.025  ns/op
ArraysHashCode.ints      30  avgt   30   101.495 ±  0.082  ns/op
ArraysHashCode.ints      40  avgt   30    69.880 ±  0.390  ns/op
ArraysHashCode.ints      50  avgt   30   103.447 ±  0.519  ns/op
ArraysHashCode.ints      60  avgt   30   134.626 ±  0.302  ns/op
ArraysHashCode.ints      70  avgt   30    74.650 ±  0.142  ns/op
ArraysHashCode.ints      80  avgt   30   104.745 ±  0.458  ns/op
ArraysHashCode.ints      90  avgt   30   136.548 ±  0.090  ns/op
ArraysHashCode.ints     100  avgt   30    74.526 ±  0.010  ns/op
ArraysHashCode.ints     200  avgt   30   115.240 ±  1.812  ns/op
ArraysHashCode.ints     300  avgt   30   157.611 ±  2.621  ns/op
ArraysHashCode.ints    1000  avgt   30   343.443 ±  9.225  ns/op
ArraysHashCode.ints   10000  avgt   30  5426.387 ± 30.774  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3159965574 From qamai at openjdk.org Wed Aug 6 12:42:12 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Aug 2025 12:42:12 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v7] In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 08:24:45 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases.
>>
>> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch:
>>
>>
>> public static int numberOfNibbles(int i) {
>>     int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
>>     return Math.max((mag + 3) / 4, 1);
>> }
>>
>>
>> Testing: tier1, IR test
> Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase.
> The pull request contains nine additional commits since the last revision:
>
> - Merge branch 'master' into enhance-clz-type
> - Add checks for results of all test methods
> - Replace `isa_*` with `is_*` and add checks for `Type::BOTTOM`
> - Merge branch 'master' into enhance-clz-type
> - Move `TestCountBitsRange` to `compiler.c2.gvn`
> - Fix null checks
> - Narrow type bound
> - Use `BitsPerX` constant instead of `sizeof`
> - Make the type of count leading/trailing zero nodes more precise

src/hotspot/share/opto/countbitsnode.cpp line 50:

> 48: //------------------------------Value------------------------------------------
> 49: const Type* CountLeadingZerosINode::Value(PhaseGVN* phase) const {
> 50:   // If the input is TOP, the result is also TOP.

IMO this comment is unnecessary; it is trivial to infer from the code below.

src/hotspot/share/opto/countbitsnode.cpp line 57:

> 55:
> 56:   // If the input is BOTTOM, the result is the local BOTTOM.
> 57:   if (t == Type::BOTTOM) {

`t` should not be `Type::BOTTOM`, please remove this case.

src/hotspot/share/opto/countbitsnode.cpp line 62:

> 60:
> 61:   const TypeInt* ti = t->is_int();
> 62:   if (ti->is_con()) {

This is unnecessary: if `ti` is a constant then `~ti->_bits._zeros == ti->_bits._ones` and the below case will return a constant anyway.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2257016904
PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2257017845
PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2257019909

From mhaessig at openjdk.org Wed Aug 6 13:19:33 2025
From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=)
Date: Wed, 6 Aug 2025 13:19:33 GMT
Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v3]
In-Reply-To: 
References: 
Message-ID: 

> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run.
> The goal of this is initially to be able to find and investigate long-running compilations.
>
> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out.
>
> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers.
>
> Testing:
> - [ ] Github Actions
> - [ ] tier1, tier2 on all platforms
> - [ ] tier3, tier4 and Oracle internal testing on Linux fastdebug
> - [ ] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail)

Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase.
The pull request contains seven additional commits since the last revision:

 - Merge branch 'master' into JDK-8308094-timeout
 - No acquire release semantics
 - Factor Linux specific timeout functionality out of share/
 - Move timeout disarm above if
 - Merge branch 'master' into JDK-8308094-timeout
 - Fix SIGALRM test
 - Add timeout functionality to compiler threads

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26023/files
  - new: https://git.openjdk.org/jdk/pull/26023/files/5840cc2e..d50231f9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=01-02

  Stats: 67038 lines in 1707 files changed: 39117 ins; 18664 del; 9257 mod
  Patch: https://git.openjdk.org/jdk/pull/26023.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26023/head:pull/26023

PR: https://git.openjdk.org/jdk/pull/26023

From mhaessig at openjdk.org Wed Aug 6 13:19:34 2025
From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=)
Date: Wed, 6 Aug 2025 13:19:34 GMT
Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v2]
In-Reply-To: 
References: <_Ye19u_7PlqlsoRSuR0dNeAGbeuHyN_oqD1ZS4q9Nvk=.b94fd29d-d43e-4561-9926-7f5a46434d8e@github.com>
Message-ID: 

On Tue, 5 Aug 2025 04:04:01 GMT, Dean Long wrote:

>> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>> - Merge branch 'master' into JDK-8308094-timeout
>> - Fix SIGALRM test
>> - Add timeout functionality to compiler threads
>
> This looks correct, but would it be possible to move the Linux-specific code out of src/hotspot/share?

Thank you for reviewing this PR, @dean-long. For v2 I factored the Linux-specific code out of hotspot/share, moved the disarm above the if, and removed the acquire/release.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/26023#issuecomment-3160140787

From aph at openjdk.org Wed Aug 6 14:10:10 2025
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 6 Aug 2025 14:10:10 GMT
Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v2]
In-Reply-To: <9wl7zcDsUD5im2gwdm-jtmLrgDl8oxxj3obx5VtDw90=.a7ca2423-d87f-4db2-9d0d-523a0d58c90f@github.com>
References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> <9wl7zcDsUD5im2gwdm-jtmLrgDl8oxxj3obx5VtDw90=.a7ca2423-d87f-4db2-9d0d-523a0d58c90f@github.com>
Message-ID: 

On Tue, 5 Aug 2025 14:45:58 GMT, Samuel Chee wrote:

> My proposal is:
>
> 1. For `cmpxchg`, we add a trailingDMB option, and emit if `!useLSE && trailingDMB`, moving the dmbs from outside to inside the method. Have default value for trailingDMB be false so other call sites won't emit this dmb and hence won't be affected.

I think it would be better to refactor things so that the intent is clear: better to have `cmpxchg_barrier` and use that for C1.

> 2. In a separate ticket, `cmpxchgptr` and `cmpxchgw` already have DMBs inside their method definitions, so add an extra trailingDMB parameter defaulted to true, and emit the dmb if true.

Likewise.

> 3. In a separate ticket, apply the same logic to `atomic_##NAME` to move the DMB inside the function and default trailingDMB to false to not affect other call sites.

Likewise.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2257304936

From duke at openjdk.org Wed Aug 6 14:10:29 2025
From: duke at openjdk.org (Yuri Gaevsky)
Date: Wed, 6 Aug 2025 14:10:29 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v19]
In-Reply-To: 
References: 
Message-ID: <2Z6xyZxiD2iYkSwlvyO0HTm_TnAnb466MEtV0FrEPoc=.a583df35-654b-49cd-855a-d47fa124139a@github.com>

> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
>
> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.

Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:

  try m8 for grouping.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17413/files
  - new: https://git.openjdk.org/jdk/pull/17413/files/7c5f24aa..a85abe7f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=18
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=17-18

  Stats: 6 lines in 2 files changed: 0 ins; 1 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/17413.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413

PR: https://git.openjdk.org/jdk/pull/17413

From duke at openjdk.org Wed Aug 6 14:10:29 2025
From: duke at openjdk.org (Yuri Gaevsky)
Date: Wed, 6 Aug 2025 14:10:29 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v18]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 6 Aug 2025 12:31:26 GMT, Yuri Gaevsky wrote:

>> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
>>
>> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>
>   make powmax calculations scalar; re-use v_tmp for sum reduction operation.

`a85abe7`:

--- -XX:+UseRVV ---
Benchmark            (size)  Mode  Cnt     Score    Error  Units
ArraysHashCode.ints       1  avgt   30    11.285 ±  0.011  ns/op
ArraysHashCode.ints       5  avgt   30    21.297 ±  0.005  ns/op
ArraysHashCode.ints      10  avgt   30    33.821 ±  0.007  ns/op
ArraysHashCode.ints      20  avgt   30    58.884 ±  0.024  ns/op
ArraysHashCode.ints      30  avgt   30    84.013 ±  0.094  ns/op
ArraysHashCode.ints      40  avgt   30   109.038 ±  0.075  ns/op
ArraysHashCode.ints      50  avgt   30   134.041 ±  0.038  ns/op
ArraysHashCode.ints      60  avgt   30   159.348 ±  0.252  ns/op
ArraysHashCode.ints      70  avgt   30    87.701 ±  0.049  ns/op
ArraysHashCode.ints      80  avgt   30   109.611 ±  0.032  ns/op
ArraysHashCode.ints      90  avgt   30   134.703 ±  0.078  ns/op
ArraysHashCode.ints     100  avgt   30   159.367 ±  0.232  ns/op
ArraysHashCode.ints     200  avgt   30   120.708 ±  0.223  ns/op
ArraysHashCode.ints     300  avgt   30   229.257 ±  0.037  ns/op
ArraysHashCode.ints    1000  avgt   30   397.151 ±  4.697  ns/op
ArraysHashCode.ints   10000  avgt   30  5362.472 ± 19.957  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3160331579

From liach at openjdk.org Wed Aug 6 14:14:12 2025
From: liach at openjdk.org (Chen Liang)
Date: Wed, 6 Aug 2025 14:14:12 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v7]
In-Reply-To: 
References: 
Message-ID: <2Ex_m7rBZe1VrTerB0Q7dkL_kPdEtm-u0Y-IfC1CkCE=.65a6e1e3-f922-42ce-b0c4-4be4c36f0b63@github.com>

On Wed, 6 Aug 2025 12:38:28 GMT, Quan Anh Mai wrote:

>> Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision:
>>
>> - Merge branch 'master' into enhance-clz-type
>> - Add checks for results of all test methods
>> - Replace `isa_*` with `is_*` and add checks for `Type::BOTTOM`
>> - Merge branch 'master' into enhance-clz-type
>> - Move `TestCountBitsRange` to `compiler.c2.gvn`
>> - Fix null checks
>> - Narrow type bound
>> - Use `BitsPerX` constant instead of `sizeof`
>> - Make the type of count leading/trailing zero nodes more precise
>
> src/hotspot/share/opto/countbitsnode.cpp line 57:
>
>> 55:
>> 56:   // If the input is BOTTOM, the result is the local BOTTOM.
>> 57:   if (t == Type::BOTTOM) {
>
> `t` should not be `Type::BOTTOM`, please remove this case.

Maybe convert to an assert?
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2257319939

From duke at openjdk.org Wed Aug 6 15:08:50 2025
From: duke at openjdk.org (Yuri Gaevsky)
Date: Wed, 6 Aug 2025 15:08:50 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v20]
In-Reply-To: 
References: 
Message-ID: 

> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
>
> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.

Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:

  try m4 for grouping

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17413/files
  - new: https://git.openjdk.org/jdk/pull/17413/files/a85abe7f..223e0a3d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=19
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=18-19

  Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/17413.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413

PR: https://git.openjdk.org/jdk/pull/17413

From duke at openjdk.org Wed Aug 6 15:08:51 2025
From: duke at openjdk.org (Yuri Gaevsky)
Date: Wed, 6 Aug 2025 15:08:51 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v19]
In-Reply-To: <2Z6xyZxiD2iYkSwlvyO0HTm_TnAnb466MEtV0FrEPoc=.a583df35-654b-49cd-855a-d47fa124139a@github.com>
References: <2Z6xyZxiD2iYkSwlvyO0HTm_TnAnb466MEtV0FrEPoc=.a583df35-654b-49cd-855a-d47fa124139a@github.com>
Message-ID: 

On Wed, 6 Aug 2025 14:10:29 GMT, Yuri Gaevsky wrote:

>> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
>>
>> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>
>   try m8 for grouping.
`223e0a3`:

--- -XX:+UseRVV ---
Benchmark            (size)  Mode  Cnt     Score    Error  Units
ArraysHashCode.ints       1  avgt   30    11.289 ±  0.017  ns/op
ArraysHashCode.ints       5  avgt   30    21.295 ±  0.003  ns/op
ArraysHashCode.ints      10  avgt   30    33.885 ±  0.051  ns/op
ArraysHashCode.ints      20  avgt   30    58.866 ±  0.007  ns/op
ArraysHashCode.ints      30  avgt   30    84.259 ±  0.120  ns/op
ArraysHashCode.ints      40  avgt   30    65.178 ±  0.043  ns/op
ArraysHashCode.ints      50  avgt   30    92.872 ±  0.170  ns/op
ArraysHashCode.ints      60  avgt   30   116.742 ±  0.684  ns/op
ArraysHashCode.ints      70  avgt   30    71.224 ±  0.225  ns/op
ArraysHashCode.ints      80  avgt   30    95.184 ±  0.603  ns/op
ArraysHashCode.ints      90  avgt   30   120.781 ±  0.079  ns/op
ArraysHashCode.ints     100  avgt   30    72.659 ±  0.032  ns/op
ArraysHashCode.ints     200  avgt   30   108.988 ±  0.036  ns/op
ArraysHashCode.ints     300  avgt   30   150.753 ±  2.586  ns/op
ArraysHashCode.ints    1000  avgt   30   330.159 ±  0.658  ns/op
ArraysHashCode.ints   10000  avgt   30  5555.054 ± 45.951  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3160555632

From bmaillard at openjdk.org Wed Aug 6 15:45:00 2025
From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard)
Date: Wed, 6 Aug 2025 15:45:00 GMT
Subject: RFR: 8349191: Test compiler/ciReplay/TestIncrementalInlining.java failed
Message-ID: 

This PR fixes a bug caused by synchronization issues in the print inlining system. Individual segments of a single line of output are interleaved with output from other compile threads, causing tests that parse replay files to fail.
A snippet of a problematic replay file is shown below:

    @ 0 compiler.ciReplay.IncrementalInliningTest::level0 (4 bytes) force inline by annotation
    @ 0 compiler.ciReplay.IncrementalInliningTest::level1 (4 bytes) inline (hot)
    @ 0 compiler.ciReplay.IncrementalInliningTest::level2 (4 bytes) force inline by annotation
    @ 0 compiler.ciReplay.IncrementalInliningTest::late (4 bytes) force inline by annotation late inline succeeded
    @ 0 compiler.ciReplay.IncrementalInliningTest::level4 (6 bytes) failed to inline: inlining too deep

This makes the output impossible to parse for tests like `compiler/ciReplay/TestIncrementalInlining.java`, as they rely on regular expressions to parse individual lines. Because it is a synchronization issue, the bug is quite intermittent and I was only able to reproduce it with mach5 in tier 7.

This bug was caused by [JDK-8319850](https://bugs.openjdk.org/browse/JDK-8319850), as it introduced important changes in the print inlining system. With these changes, individual segments of the output are printed directly to tty, and this risks causing problematic interleavings with multiple compile threads.

My proposed solution is to simply print everything to a `stringStream` first, and then dump it to `tty`. The PR also removes the relevant tests from `ProblemList.txt`.
### Testing
- [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8349191)
- [x] tier1-3, plus some internal testing
- [x] tier7 for the relevant tests (`TestIncrementalInlining.java` and `TestInliningProtectionDomain.java`)

-------------

Commit messages:
 - 8349191: Remove relevant tests from ProblemList
 - 8349191: Use stringStream to dump output and avoid interleaving

Changes: https://git.openjdk.org/jdk/pull/26654/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26654&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8349191
Stats: 7 lines in 2 files changed: 3 ins; 3 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/26654.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26654/head:pull/26654

PR: https://git.openjdk.org/jdk/pull/26654

From duke at openjdk.org Wed Aug 6 16:17:02 2025
From: duke at openjdk.org (Yuri Gaevsky)
Date: Wed, 6 Aug 2025 16:17:02 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v21]
In-Reply-To: 
References: 
Message-ID: <_ioi85CKYT5cMLcOTVu-N2EPoiXU8GMjr-DfDeTZ9Ak=.3e6a9dfc-33e4-445f-8c1c-08a8f6828046@github.com>

> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
>
> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:

  try m2 for grouping

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17413/files
  - new: https://git.openjdk.org/jdk/pull/17413/files/223e0a3d..424a453c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=20
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=19-20

  Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/17413.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413

PR: https://git.openjdk.org/jdk/pull/17413

From duke at openjdk.org Wed Aug 6 16:17:04 2025
From: duke at openjdk.org (Yuri Gaevsky)
Date: Wed, 6 Aug 2025 16:17:04 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v20]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 6 Aug 2025 15:08:50 GMT, Yuri Gaevsky wrote:

>> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
>>
>> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>
>   try m4 for grouping

`424a453`:

--- -XX:+UseRVV ---
Benchmark            (size)  Mode  Cnt     Score    Error  Units
ArraysHashCode.ints       1  avgt   30    11.277 ±  0.005  ns/op
ArraysHashCode.ints       5  avgt   30    21.333 ±  0.032  ns/op
ArraysHashCode.ints      10  avgt   30    33.850 ±  0.019  ns/op
ArraysHashCode.ints      20  avgt   30    44.479 ±  0.015  ns/op
ArraysHashCode.ints      30  avgt   30    69.189 ±  0.149  ns/op
ArraysHashCode.ints      40  avgt   30    60.135 ±  0.049  ns/op
ArraysHashCode.ints      50  avgt   30    53.870 ±  0.007  ns/op
ArraysHashCode.ints      60  avgt   30    76.410 ±  0.015  ns/op
ArraysHashCode.ints      70  avgt   30    67.745 ±  0.495  ns/op
ArraysHashCode.ints      80  avgt   30    58.244 ±  0.008  ns/op
ArraysHashCode.ints      90  avgt   30    79.313 ±  0.063  ns/op
ArraysHashCode.ints     100  avgt   30    74.461 ±  1.249  ns/op
ArraysHashCode.ints     200  avgt   30   122.614 ±  1.878  ns/op
ArraysHashCode.ints     300  avgt   30   160.973 ±  0.069  ns/op
ArraysHashCode.ints    1000  avgt   30   423.633 ± 11.864  ns/op
ArraysHashCode.ints   10000  avgt   30  5938.320 ± 56.340  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3160779885

From duke at openjdk.org Wed Aug 6 17:02:50 2025
From: duke at openjdk.org (Yuri Gaevsky)
Date: Wed, 6 Aug 2025 17:02:50 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v22]
In-Reply-To: 
References: 
Message-ID: 

> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
>
> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.

Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:

  try m1 for grouping

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17413/files
  - new: https://git.openjdk.org/jdk/pull/17413/files/424a453c..60b2d815

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=21
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=20-21

  Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/17413.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413

PR: https://git.openjdk.org/jdk/pull/17413

From duke at openjdk.org Wed Aug 6 17:07:14 2025
From: duke at openjdk.org (Yuri Gaevsky)
Date: Wed, 6 Aug 2025 17:07:14 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v22]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 6 Aug 2025 17:02:50 GMT, Yuri Gaevsky wrote:

>> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
>>
>> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>
>   try m1 for grouping

`60b2d81`:

--- -XX:+UseRVV ---
Benchmark            (size)  Mode  Cnt     Score    Error  Units
ArraysHashCode.ints       1  avgt   30    11.298 ±  0.021  ns/op
ArraysHashCode.ints       5  avgt   30    21.475 ±  0.148  ns/op
ArraysHashCode.ints      10  avgt   30    37.011 ±  0.057  ns/op
ArraysHashCode.ints      20  avgt   30    47.146 ±  0.091  ns/op
ArraysHashCode.ints      30  avgt   30    56.377 ±  0.021  ns/op
ArraysHashCode.ints      40  avgt   30    55.116 ±  0.010  ns/op
ArraysHashCode.ints      50  avgt   30    66.537 ±  0.457  ns/op
ArraysHashCode.ints      60  avgt   30    72.110 ±  0.886  ns/op
ArraysHashCode.ints      70  avgt   30    88.029 ±  0.897  ns/op
ArraysHashCode.ints      80  avgt   30    77.739 ±  1.458  ns/op
ArraysHashCode.ints      90  avgt   30    92.164 ±  1.118  ns/op
ArraysHashCode.ints     100  avgt   30   104.852 ±  4.030  ns/op
ArraysHashCode.ints     200  avgt   30   180.402 ±  0.037  ns/op
ArraysHashCode.ints     300  avgt   30   234.598 ±  7.007  ns/op
ArraysHashCode.ints    1000  avgt   30   806.796 ±  0.126  ns/op
ArraysHashCode.ints   10000  avgt   30  8152.396 ± 79.158  ns/op

Based on the above experiments it looks reasonable to use `m2` grouping.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3160916148
PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3160919289

From duke at openjdk.org Wed Aug 6 19:18:06 2025
From: duke at openjdk.org (Tobias Hotz)
Date: Wed, 6 Aug 2025 19:18:06 GMT
Subject: RFR: 8364766: Improve Value() of DivI and DivL for non-constant inputs
Message-ID: 

This PR improves the Value() of integer division nodes. Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max.
Some special logic is required for MIN_INT / -1, though, as this is a special case. We also need some special logic to handle ranges that cross zero, but in this case, we just need to check the negative and the positive range once each. This also cleans up and unifies the code paths for DivINode and DivLNode.

I've added some tests to validate the optimization. Without the changes, some of these tests fail.

-------------

Commit messages:
 - Merge branch 'master' of https://github.com/openjdk/jdk into better_interger_div_type
 - Adjust bug number
 - Use LF line endings
 - Add verification to tests
 - Add tests and comments
 - Merge branch 'master' of https://github.com/openjdk/jdk into better_interger_div_type
 - Move more logic into generic computation
 - Return TOP if no value is possible
 - Merge remote-tracking branch 'refs/remotes/upstream/master' into better_interger_div_type
 - Merge remote-tracking branch 'origin/master' into better_interger_div_type
 - ... and 4 more: https://git.openjdk.org/jdk/compare/f95af744...dacaddac

Changes: https://git.openjdk.org/jdk/pull/26143/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8364766
Stats: 450 lines in 2 files changed: 381 ins; 58 del; 11 mod
Patch: https://git.openjdk.org/jdk/pull/26143.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26143/head:pull/26143

PR: https://git.openjdk.org/jdk/pull/26143

From duke at openjdk.org Wed Aug 6 20:12:58 2025
From: duke at openjdk.org (Yuri Gaevsky)
Date: Wed, 6 Aug 2025 20:12:58 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v23]
In-Reply-To: 
References: 
Message-ID: <-R7nfMcAVsCpaqdgAPEXf_ZQMa8ePRR3PufS-8Qt3qA=.3309bf6e-dfdd-41ec-8e17-c07abf60ebad@github.com>

> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: choose m2 as fastest per experiments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/60b2d815..e14cc8e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=21-22 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From duke at openjdk.org Wed Aug 6 20:12:58 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 6 Aug 2025 20:12:58 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v22] In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 17:02:50 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > try m1 for grouping --- -XX:DisableIntrinsic=_vectorizedHashCode --- Benchmark (size) Mode Cnt Score Error Units ArraysHashCode.ints 1 avgt 100 11.276 ? 0.002 ns/op ArraysHashCode.ints 5 avgt 100 28.939 ? 0.059 ns/op ArraysHashCode.ints 10 avgt 100 41.163 ? 0.156 ns/op ArraysHashCode.ints 20 avgt 100 68.272 ? 0.086 ns/op ArraysHashCode.ints 30 avgt 100 88.366 ? 0.225 ns/op ArraysHashCode.ints 40 avgt 100 115.092 ? 0.168 ns/op ArraysHashCode.ints 50 avgt 100 135.669 ? 0.271 ns/op ArraysHashCode.ints 60 avgt 100 162.028 ? 0.119 ns/op ArraysHashCode.ints 70 avgt 100 170.395 ? 0.251 ns/op ArraysHashCode.ints 80 avgt 100 194.108 ? 0.249 ns/op ArraysHashCode.ints 90 avgt 100 208.031 ? 
0.147 ns/op ArraysHashCode.ints 100 avgt 100 232.727 ? 0.305 ns/op ArraysHashCode.ints 200 avgt 100 447.927 ? 0.512 ns/op ArraysHashCode.ints 300 avgt 100 655.105 ? 0.577 ns/op ArraysHashCode.ints 1000 avgt 100 2143.301 ? 1.763 ns/op ArraysHashCode.ints 10000 avgt 100 24249.479 ? 143.276 ns/op --- -XX:-UseRVV --- Benchmark (size) Mode Cnt Score Error Units ArraysHashCode.ints 1 avgt 100 11.289 ? 0.010 ns/op ArraysHashCode.ints 5 avgt 100 23.176 ? 0.005 ns/op ArraysHashCode.ints 10 avgt 100 33.200 ? 0.030 ns/op ArraysHashCode.ints 20 avgt 100 50.746 ? 0.024 ns/op ArraysHashCode.ints 30 avgt 100 71.043 ? 0.132 ns/op ArraysHashCode.ints 40 avgt 100 88.473 ? 0.080 ns/op ArraysHashCode.ints 50 avgt 100 108.628 ? 0.100 ns/op ArraysHashCode.ints 60 avgt 100 126.217 ? 0.263 ns/op ArraysHashCode.ints 70 avgt 100 146.110 ? 0.087 ns/op ArraysHashCode.ints 80 avgt 100 163.683 ? 0.111 ns/op ArraysHashCode.ints 90 avgt 100 183.994 ? 0.324 ns/op ArraysHashCode.ints 100 avgt 100 201.342 ? 0.162 ns/op ArraysHashCode.ints 200 avgt 100 389.290 ? 0.254 ns/op ArraysHashCode.ints 300 avgt 100 576.945 ? 0.114 ns/op ArraysHashCode.ints 1000 avgt 100 1964.220 ? 50.893 ns/op ArraysHashCode.ints 10000 avgt 100 21526.738 ? 1176.571 ns/op --- -XX:+UseRVV --- Benchmark (size) Mode Cnt Score Error Units ArraysHashCode.ints 1 avgt 100 11.290 ? 0.010 ns/op ArraysHashCode.ints 5 avgt 100 21.305 ? 0.011 ns/op ArraysHashCode.ints 10 avgt 100 33.819 ? 0.004 ns/op ArraysHashCode.ints 20 avgt 100 44.204 ? 0.117 ns/op ArraysHashCode.ints 30 avgt 100 69.280 ? 0.121 ns/op ArraysHashCode.ints 40 avgt 100 59.412 ? 0.222 ns/op ArraysHashCode.ints 50 avgt 100 52.665 ? 0.360 ns/op ArraysHashCode.ints 60 avgt 100 76.718 ? 0.382 ns/op ArraysHashCode.ints 70 avgt 100 66.035 ? 0.516 ns/op ArraysHashCode.ints 80 avgt 100 57.979 ? 0.440 ns/op ArraysHashCode.ints 90 avgt 100 83.142 ? 0.447 ns/op ArraysHashCode.ints 100 avgt 100 73.849 ? 0.438 ns/op ArraysHashCode.ints 200 avgt 100 116.300 ? 
1.336 ns/op ArraysHashCode.ints 300 avgt 100 167.849 ? 2.394 ns/op ArraysHashCode.ints 1000 avgt 100 402.079 ? 9.159 ns/op ArraysHashCode.ints 10000 avgt 100 6200.543 ? 87.597 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3161440566 From duke at openjdk.org Wed Aug 6 20:13:16 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 6 Aug 2025 20:13:16 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v10] In-Reply-To: <18pF9Lub4DZhUqNaFVrHCVZ2kdocC4M2BYZ0CYVk-kk=.30c273bb-c04f-4427-a34c-1dde86456b3b@github.com> References: <18pF9Lub4DZhUqNaFVrHCVZ2kdocC4M2BYZ0CYVk-kk=.30c273bb-c04f-4427-a34c-1dde86456b3b@github.com> Message-ID: On Mon, 4 Aug 2025 13:37:33 GMT, Yuri Gaevsky wrote: >>> What's the performance look like with a smaller `lmul` (m1 or m2)? I am asking this because there are hardwares there (like SG2044) with a VLEN of 128 instead of 256 like on K1. >> >> Sure, I'll do it, thanks for the suggestion. > > Please see `m1` numbers [here](https://github.com/openjdk/jdk/pull/17413#issuecomment-3150754134). Please also see [this](https://github.com/openjdk/jdk/pull/17413#issuecomment-3161440566). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2258177037 From snatarajan at openjdk.org Wed Aug 6 21:45:42 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 6 Aug 2025 21:45:42 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled Message-ID: **Issue** An error, `assert(data->is_ReceiverTypeData()) failed: bad profile data type`, is encountered during C2 compilation due to bad profile data. This occurs when the code is compiled with `TypeProfileCasts` option disabled. **Analysis** The assertion failure occurs in `record_profiled_receiver_for_speculation` that analyzes the profiling information in the method data to determine whether a null value has been observed in the `instanceof` operation. 
This information is encoded in the `BitData` during profiling. When the method identifies that a null has been seen, it proceeds to inspect the associated `ReceiverTypeData` to see if the type check is always performed against null. However, in this scenario, the incoming profiling data is of type `BitData` rather than `ReceiverTypeData`, leading to the assertion failure. The profiling information for null seen for the operations `aastore`, `instanceof`, and `checkcast` is recorded by the method `profile_null_seen` (in `src/hotspot/cpu/x86/templateTable_x86.cpp`). On investigating this method, it can be observed that the method data pointer is not updated for `VirtualCallData` (which is a subclass of `ReceiverTypeData`) when the `TypeProfileCasts` option is disabled.

**Solution**
My proposal is to inspect the `ReceiverTypeData` in the function `record_profiled_receiver_for_speculation` only if `TypeProfileCasts` is enabled (this is based on the fact that the relevant method data pointer is not updated when `TypeProfileCasts` is disabled).

**Question to reviewers**
Do you think this is a reasonable fix?

**Testing**
GitHub Actions tier1 to tier3 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64.
------------- Commit messages: - Merge branch 'master' of https://github.com/sarannat/jdk into JDK-8358781 - Initial Fix Changes: https://git.openjdk.org/jdk/pull/26640/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26640&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358781 Stats: 11 lines in 1 file changed: 3 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/26640.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26640/head:pull/26640 PR: https://git.openjdk.org/jdk/pull/26640 From dlong at openjdk.org Thu Aug 7 01:14:25 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 7 Aug 2025 01:14:25 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v3] In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 13:19:33 GMT, Manuel H?ssig wrote: >> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. >> >> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. >> >> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. 
>> >> Testing: >> - [ ] Github Actions >> - [ ] tier1, tier2 on all platforms >> - [ ] tier3, tier4 and Oracle internal testing on Linux fastdebug >> - [ ] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) > > Manuel H?ssig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8308094-timeout > - No acquire release semantics > - Factor Linux specific timeout functionality out of share/ > - Move timeout disarm above if > - Merge branch 'master' into JDK-8308094-timeout > - Fix SIGALRM test > - Add timeout functionality to compiler threads Looks good. However, you might want to simply remove _timeout_armed, or put it inside a #ifdef ASSERT, since it is only used in an assert. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26023#pullrequestreview-3094752418 From qxing at openjdk.org Thu Aug 7 02:13:47 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Thu, 7 Aug 2025 02:13:47 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v8] In-Reply-To: References: Message-ID: > The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. > > This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. 
For example, the following implementation runs ~115% faster on x86-64 with this patch: > > > public static int numberOfNibbles(int i) { > int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); > return Math.max((mag + 3) / 4, 1); > } > > > Testing: tier1, IR test Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: Remove unnecessary code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25928/files - new: https://git.openjdk.org/jdk/pull/25928/files/c9051a0b..ce5f8695 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=06-07 Stats: 68 lines in 2 files changed: 32 ins; 36 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928 PR: https://git.openjdk.org/jdk/pull/25928 From qxing at openjdk.org Thu Aug 7 02:18:18 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Thu, 7 Aug 2025 02:18:18 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v7] In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 12:38:05 GMT, Quan Anh Mai wrote: >> Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: >> >> - Merge branch 'master' into enhance-clz-type >> - Add checks for results of all test methods >> - Replace `isa_*` with `is_*` and add checks for `Type::BOTTOM` >> - Merge branch 'master' into enhance-clz-type >> - Move `TestCountBitsRange` to `compiler.c2.gvn` >> - Fix null checks >> - Narrow type bound >> - Use `BitsPerX` constant instead of `sizeof` >> - Make the type of count leading/trailing zero nodes more precise > > src/hotspot/share/opto/countbitsnode.cpp line 50: > >> 48: //------------------------------Value------------------------------------------ >> 49: const Type* CountLeadingZerosINode::Value(PhaseGVN* phase) const { >> 50: // If the input is TOP, the result is also TOP. > > IMO this comment is unnecessary, it is trivial to infer from the code below. Removed. > src/hotspot/share/opto/countbitsnode.cpp line 62: > >> 60: >> 61: const TypeInt* ti = t->is_int(); >> 62: if (ti->is_con()) { > > This is unnecessary, if `ti` is a constant then `~ti->_bits._zeros == ti->_bits._ones` and the below case will return a constant anyway. Removed, and added tests for constant input CLZ/CTZ nodes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2258707369 PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2258709866 From qxing at openjdk.org Thu Aug 7 02:18:19 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Thu, 7 Aug 2025 02:18:19 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v7] In-Reply-To: <2Ex_m7rBZe1VrTerB0Q7dkL_kPdEtm-u0Y-IfC1CkCE=.65a6e1e3-f922-42ce-b0c4-4be4c36f0b63@github.com> References: <2Ex_m7rBZe1VrTerB0Q7dkL_kPdEtm-u0Y-IfC1CkCE=.65a6e1e3-f922-42ce-b0c4-4be4c36f0b63@github.com> Message-ID: On Wed, 6 Aug 2025 14:11:40 GMT, Chen Liang wrote: >> src/hotspot/share/opto/countbitsnode.cpp line 57: >> >>> 55: >>> 56: // If the input is BOTTOM, the result is the local BOTTOM. >>> 57: if (t == Type::BOTTOM) { >> >> `t` should not be `Type::BOTTOM`, please remove this case. > > Maybe convert to an assert? Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2258707446 From qxing at openjdk.org Thu Aug 7 02:18:20 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Thu, 7 Aug 2025 02:18:20 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v7] In-Reply-To: <_tJxPUHoJrERm4Yxb3ituFLhPv9PKVQ3K6ufXQWCk2Y=.de11d41b-f2d6-4c3b-88a9-e79013815aeb@github.com> References: <_tJxPUHoJrERm4Yxb3ituFLhPv9PKVQ3K6ufXQWCk2Y=.de11d41b-f2d6-4c3b-88a9-e79013815aeb@github.com> Message-ID: On Wed, 6 Aug 2025 10:42:42 GMT, Jatin Bhateja wrote: >> Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: >> >> - Merge branch 'master' into enhance-clz-type >> - Add checks for results of all test methods >> - Replace `isa_*` with `is_*` and add checks for `Type::BOTTOM` >> - Merge branch 'master' into enhance-clz-type >> - Move `TestCountBitsRange` to `compiler.c2.gvn` >> - Fix null checks >> - Narrow type bound >> - Use `BitsPerX` constant instead of `sizeof` >> - Make the type of count leading/trailing zero nodes more precise > > src/hotspot/share/opto/countbitsnode.cpp line 89: > >> 87: return TypeInt::make(count_leading_zeros_long(~tl->_bits._zeros), >> 88: count_leading_zeros_long(tl->_bits._ones), >> 89: tl->_widen); > > Hi @MaxXSoft, Taking the liberty to add some comments related to the proof of the above assumptions, hope you won't mind :-) > > **Z3 proof for [KnownBits.ONES, ~KnownBits.ZEROS] >= [LB, UB] where >= is a superset relation.** > > > from z3 import * > from functools import reduce > > # 64-bit symbolic unsigned integers > UB = BitVec('UB', 64) > LB = BitVec('LB', 64) > > # XOR for detecting differing bits > xor_val = UB ^ LB > > # COUNT_LEADING_ZEROS > def count_leading_zeros(x): > """Returns number of leading zeros in a 64-bit word.""" > clauses = [] > for i in range(64): > bit_is_one = LShR(x, 63 - i) & 1 == 1 > bits_before_zero = And([LShR(x, 63 - j) & 1 == 0 for j in range(i)]) > clauses.append(If(And(bits_before_zero, bit_is_one), BitVecVal(i, 64), BitVecVal(64, 64))) > return reduce(lambda a, b: If(a == 64, b, a), clauses) > > # Step 1: Compute common prefix length > CPL = count_leading_zeros(xor_val) > > # Step 2: COMMON_PREFIX_MASK = ((1 << CPL) - 1) << (64 - CPL) > one_shifted = (BitVecVal(1, 64) << CPL) > mask = one_shifted - 1 > COMMON_PREFIX_MASK = mask << (64 - CPL) > > # Step 3: COMMON_PREFIX > COMMON_PREFIX = UB & COMMON_PREFIX_MASK > > # Step 4: ZEROS and ONES > ZEROS = COMMON_PREFIX_MASK & (~COMMON_PREFIX) > ONES = COMMON_PREFIX > > # Step 5: Prove that 
[ONES, ~ZEROS] ? [LB, UB] > prop = And(ULE(ONES, LB), ULE(UB, ~ZEROS)) > > # Step 6: Try to disprove (i.e., check if any UB, LB violates the above) > s = Solver() > s.add(Not(prop)) # Look for counterexamples > > # Check the result > if s.check() == sat: > m = s.model() > ub_val = m.eval(UB).as_long() > lb_val = m.eval(LB).as_long() > ones_val = m.eval(ONES).as_long() > zeros_val = m.eval(ZEROS).as_long() > not_zeros_val = (~zeros_val) & 0xFFFFFFFFFFFFFFFF > > print("? Property does NOT hold. Counterexample found:") > print(f"LB = {lb_val:#018x}") > print(f"UB = {ub_val:#018x}") > print(f"ONES = {ones_val:#018x}") > print(f"~ZEROS = {not_zeros_val:#018x}") > print(f"ONES <= LB? {ones_val <= lb_val}") > print(f"UB <= ~ZEROS? {ub_val <= not_zeros_val}") > else: > print("? Property holds: [ONES, ~ZEROS] always covers [LB, UB] (UNSAT)") > > > > image References: <_tJxPUHoJrERm4Yxb3ituFLhPv9PKVQ3K6ufXQWCk2Y=.de11d41b-f2d6-4c3b-88a9-e79013815aeb@github.com> Message-ID: On Thu, 7 Aug 2025 02:13:16 GMT, Qizheng Xing wrote: >> src/hotspot/share/opto/countbitsnode.cpp line 89: >> >>> 87: return TypeInt::make(count_leading_zeros_long(~tl->_bits._zeros), >>> 88: count_leading_zeros_long(tl->_bits._ones), >>> 89: tl->_widen); >> >> Hi @MaxXSoft, Taking the liberty to add some comments related to the proof of the above assumptions, hope you won't mind :-) >> >> **Z3 proof for [KnownBits.ONES, ~KnownBits.ZEROS] >= [LB, UB] where >= is a superset relation.** >> >> >> from z3 import * >> from functools import reduce >> >> # 64-bit symbolic unsigned integers >> UB = BitVec('UB', 64) >> LB = BitVec('LB', 64) >> >> # XOR for detecting differing bits >> xor_val = UB ^ LB >> >> # COUNT_LEADING_ZEROS >> def count_leading_zeros(x): >> """Returns number of leading zeros in a 64-bit word.""" >> clauses = [] >> for i in range(64): >> bit_is_one = LShR(x, 63 - i) & 1 == 1 >> bits_before_zero = And([LShR(x, 63 - j) & 1 == 0 for j in range(i)]) >> clauses.append(If(And(bits_before_zero, 
bit_is_one), BitVecVal(i, 64), BitVecVal(64, 64))) >> return reduce(lambda a, b: If(a == 64, b, a), clauses) >> >> # Step 1: Compute common prefix length >> CPL = count_leading_zeros(xor_val) >> >> # Step 2: COMMON_PREFIX_MASK = ((1 << CPL) - 1) << (64 - CPL) >> one_shifted = (BitVecVal(1, 64) << CPL) >> mask = one_shifted - 1 >> COMMON_PREFIX_MASK = mask << (64 - CPL) >> >> # Step 3: COMMON_PREFIX >> COMMON_PREFIX = UB & COMMON_PREFIX_MASK >> >> # Step 4: ZEROS and ONES >> ZEROS = COMMON_PREFIX_MASK & (~COMMON_PREFIX) >> ONES = COMMON_PREFIX >> >> # Step 5: Prove that [ONES, ~ZEROS] ? [LB, UB] >> prop = And(ULE(ONES, LB), ULE(UB, ~ZEROS)) >> >> # Step 6: Try to disprove (i.e., check if any UB, LB violates the above) >> s = Solver() >> s.add(Not(prop)) # Look for counterexamples >> >> # Check the result >> if s.check() == sat: >> m = s.model() >> ub_val = m.eval(UB).as_long() >> lb_val = m.eval(LB).as_long() >> ones_val = m.eval(ONES).as_long() >> zeros_val = m.eval(ZEROS).as_long() >> not_zeros_val = (~zeros_val) & 0xFFFFFFFFFFFFFFFF >> >> print("? Property does NOT hold. Counterexample found:") >> print(f"LB = {lb_val:#018x}") >> print(f"UB = {ub_val:#018x}") >> print(f"ONES = {ones_val:#018x}") >> print(f"~ZEROS = {not_zeros_val:#018x}") >> print(f"ONES <= LB? {ones_val <= lb_val}") >> print(f"UB <= ~ZEROS? {ub_val <= not_zeros_val}") >> else: >> print("? Property holds: [ONES, ~ZEROS] always covers [LB, UB] (UNS... > > Thanks for the addition! Very nice and solid proof. Hi @MaxXSoft, Constant folding optimizations like these qualify for addition of a new benchmark. Please add one; I see a significant gain with following micro kernel on my Intel 13th Gen Intel(R) Core(TM) i3-1315U laptop. public static long micro(long param) { long constrained_param = Math.min(175, Math.max(param, 160)); return Long.numberOfLeadingZeros(constrained_param); } PROMPT>#baseline PROMPT>java -Xbatch -XX:-TieredCompilation -cp . 
test_clz [time] 17ms [res] 11200000 PROMPT>#withopt PROMPT>java -Xbatch -XX:-TieredCompilation -cp . test_clz [time] 5ms [res] 11200000 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2258958363 From qamai at openjdk.org Thu Aug 7 04:41:16 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 7 Aug 2025 04:41:16 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v8] In-Reply-To: References: Message-ID: On Thu, 7 Aug 2025 02:13:47 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary code Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25928#pullrequestreview-3095237759 From jsjolen at openjdk.org Thu Aug 7 07:15:15 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 7 Aug 2025 07:15:15 GMT Subject: RFR: 8352067: Remove the NMT treap and replace its uses with the utilities red-black tree [v2] In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 11:07:42 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The utilities red-black tree and the NMT treap serve similar functions. 
>> Given the red-black tree's versatility and stricter time complexity, the treap can be removed in favour of it.
>>
>> I made some modifications to the red-black tree to make it compatible with previous treap usages:
>> - Updated the `visit_in_order` and `visit_range_in_order` functions to require the supplied callback to return a bool, which allows us to stop traversing early.
>> - Improved const-correctness by ensuring that invoking these functions on a const reference provides const pointers to nodes, while non-const references provide mutable pointers. Previously the two functions behaved differently.
>>
>> Changes to NMT include:
>> - Modified components to align with the updated const-correctness of the red-black tree functions
>> - Renamed structures and variables to remove "treap" from their names to reflect the new tree
>>
>> The treap was also used in one place in C2. I changed this to use the red-black tree and its cursor interface, which I felt was most fitting for the use case.
>>
>> Testing:
>> - Oracle tiers 1-3
>
> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision:
>
>   feedback fixes

Still OK

-------------

Marked as reviewed by jsjolen (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26655#pullrequestreview-3095690422

From mhaessig at openjdk.org  Thu Aug  7 07:40:19 2025
From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=)
Date: Thu, 7 Aug 2025 07:40:19 GMT
Subject: RFR: 8349191: Test compiler/ciReplay/TestIncrementalInlining.java failed
In-Reply-To:
References:
Message-ID:

On Wed, 6 Aug 2025 09:14:36 GMT, Benoît Maillard wrote:

> This PR fixes a bug caused by synchronization issues in the print inlining system. Individual segments of a single line of output are interleaved with output from other compile threads, causing tests that parse replay files to fail.
>
> A snippet of a problematic replay file is shown below:
>
> @ 0 compiler.ciReplay.IncrementalInliningTest::level0 (4 bytes) force inline by annotation
> @ 0 compiler.ciReplay.IncrementalInliningTest::level1 (4 bytes) inline (hot)
> @ 0 compiler.ciReplay.IncrementalInliningTest::level2 (4 bytes)
>
>
> force inline by annotation
> @ 0 compiler.ciReplay.IncrementalInliningTest::late (4 bytes) force inline by annotation late inline succeeded
> @ 0 compiler.ciReplay.IncrementalInliningTest::level4 (6 bytes) failed to inline: inlining too deep
>
> This makes the output impossible to parse for tests like `compiler/ciReplay/TestIncrementalInlining.java`, as they rely on regular expressions to parse individual lines. Because it is a synchronization issue, the bug is quite intermittent and I was only able to reproduce it with mach5 in tier 7.
>
> This bug was caused by [JDK-8319850](https://bugs.openjdk.org/browse/JDK-8319850), as it introduced important changes in the print inlining system. With these changes, individual segments of the output are printed directly to tty, and this risks causing problematic interleavings with multiple compile threads.
>
> My proposed solution is to simply print everything to a `stringStream` first, and then dump it to `tty`. The PR also removes the relevant tests from `ProblemList.txt`.
>
> ### Testing
> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8349191)
> - [x] tier1-3, plus some internal testing
> - [x] tier7 for the relevant tests (`TestIncrementalInlining.java` and `TestInliningProtectionDomain.java`)

Thank you for fixing this pesky issue, @benoitmaillard! Looks good to me.

-------------

Marked as reviewed by mhaessig (Committer).
PR Review: https://git.openjdk.org/jdk/pull/26654#pullrequestreview-3095824387 From qamai at openjdk.org Thu Aug 7 08:00:23 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 7 Aug 2025 08:00:23 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v4] In-Reply-To: <_sSUlLFhpG8Ton-bIB3u6Nf7YSxb8LQNzngDDLqrwcA=.5c456420-a5bd-406b-8cea-e6d2ac8d74c9@github.com> References: <_sSUlLFhpG8Ton-bIB3u6Nf7YSxb8LQNzngDDLqrwcA=.5c456420-a5bd-406b-8cea-e6d2ac8d74c9@github.com> Message-ID: On Thu, 3 Jul 2025 03:27:32 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a small patch that improves the implementation of Value() for `AbsINode` and `AbsLNode` by returning the absolute value of the input range. Most of the logic is trivial except for the special case where `_lo == jint_min/jlong_min` which must return the entire type range when encountered, for which I've added a small proof in the comments. I've also added some unit tests and updated the file to limit IR check platforms with more granularity. >> >> Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Code review and constant folding test > - Merge > - Replace uabs usage with ABS > - Merge branch 'master' into abs-value > - Merge > - Improve AbsNode::Value With unsigned bounds, you can simply do: juint umin = MIN2(uabs(t->_ulo), uabs(t->_uhi)); juint umax = MAX2(uabs(t->_lo), uabs(t->_hi)); return TypeInt::make_unsigned(umin, umax, t->_widen); The proof can be inferred trivially from the property of `TypeInt` (you can find this in the doc of `TypeInt`). 
Since the set of values of a `TypeInt` looks like this:

smin ---------- lo ===== uhi ------ 0 -----ulo ========= hi --------- smax

or (in this case `lo == ulo`, `hi == uhi`)

smin ----------- lo ======= hi --- 0 ------------------------------------- smax

or (in this case `lo == ulo`, `hi == uhi`)

smin --------------------------------- 0 ------- lo ========= hi -------- smax

You can see that in all 3 cases, the minimum of the uabs of a value is either `uabs(ulo)` or `uabs(uhi)` and the maximum is either `uabs(lo)` or `uabs(hi)`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23685#issuecomment-3162975167

From dfenacci at openjdk.org  Thu Aug  7 08:11:17 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Thu, 7 Aug 2025 08:11:17 GMT
Subject: RFR: 8349191: Test compiler/ciReplay/TestIncrementalInlining.java failed
In-Reply-To:
References:
Message-ID:

On Wed, 6 Aug 2025 09:14:36 GMT, Benoît Maillard wrote:

> This PR fixes a bug caused by synchronization issues in the print inlining system. Individual segments of a single line of output are interleaved with output from other compile threads, causing tests that parse replay files to fail.
>
> A snippet of a problematic replay file is shown below:
>
> @ 0 compiler.ciReplay.IncrementalInliningTest::level0 (4 bytes) force inline by annotation
> @ 0 compiler.ciReplay.IncrementalInliningTest::level1 (4 bytes) inline (hot)
> @ 0 compiler.ciReplay.IncrementalInliningTest::level2 (4 bytes)
>
>
> force inline by annotation
> @ 0 compiler.ciReplay.IncrementalInliningTest::late (4 bytes) force inline by annotation late inline succeeded
> @ 0 compiler.ciReplay.IncrementalInliningTest::level4 (6 bytes) failed to inline: inlining too deep
>
> This makes the output impossible to parse for tests like `compiler/ciReplay/TestIncrementalInlining.java`, as they rely on regular expressions to parse individual lines.
> Because it is a synchronization issue, the bug is quite intermittent and I was only able to reproduce it with mach5 in tier 7.
>
> This bug was caused by [JDK-8319850](https://bugs.openjdk.org/browse/JDK-8319850), as it introduced important changes in the print inlining system. With these changes, individual segments of the output are printed directly to tty, and this risks causing problematic interleavings with multiple compile threads.
>
> My proposed solution is to simply print everything to a `stringStream` first, and then dump it to `tty`. The PR also removes the relevant tests from `ProblemList.txt`.
>
> ### Testing
> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8349191)
> - [x] tier1-3, plus some internal testing
> - [x] tier7 for the relevant tests (`TestIncrementalInlining.java` and `TestInliningProtectionDomain.java`)

LGTM too. Thanks @benoitmaillard!

-------------

Marked as reviewed by dfenacci (Committer).

PR Review: https://git.openjdk.org/jdk/pull/26654#pullrequestreview-3095932441

From bkilambi at openjdk.org  Thu Aug  7 08:24:49 2025
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Thu, 7 Aug 2025 08:24:49 GMT
Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v2]
In-Reply-To:
References:
Message-ID:

> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test -
> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as -
>
> public void vectorAddConstInputFloat16() {
>     for (int i = 0; i < LEN; ++i) {
>         output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST));
>     }
> }
>
> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed
immediates. > > This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). > > Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Merge branch 'master' into JDK-8361582 - 8361582: AArch64: Some ConH values cannot be replicated with SVE After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java fails for some of the tests which contain constant values such as - public void vectorAddConstInputFloat16() { for (int i = 0; i < LEN; ++i) { output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); } } The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). 
------------- Changes: https://git.openjdk.org/jdk/pull/26589/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26589&range=01 Stats: 194 lines in 7 files changed: 170 ins; 4 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/26589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26589/head:pull/26589 PR: https://git.openjdk.org/jdk/pull/26589 From bkilambi at openjdk.org Thu Aug 7 08:27:33 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 7 Aug 2025 08:27:33 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v3] In-Reply-To: References: Message-ID: > After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - > `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - > > > public void vectorAddConstInputFloat16() { > for (int i = 0; i < LEN; ++i) { > output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); > } > } > > > > > > The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. > > This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). > > Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. 
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Addressed review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26589/files - new: https://git.openjdk.org/jdk/pull/26589/files/30d82f85..a44eccc0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26589&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26589&range=01-02 Stats: 13 lines in 4 files changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/26589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26589/head:pull/26589 PR: https://git.openjdk.org/jdk/pull/26589 From bkilambi at openjdk.org Thu Aug 7 08:27:33 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 7 Aug 2025 08:27:33 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v3] In-Reply-To: References: Message-ID: On Thu, 7 Aug 2025 08:24:02 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/aarch64.ad line 4377: >> >>> 4375: operand immI8_shift8() >>> 4376: %{ >>> 4377: predicate(Assembler::operand_valid_for_sve_dup_immediate((int64_t)n->get_int())); >> >> `Assembler::operand_valid_for_sve_dup_immediate` sounds odd as the predicate for a generically sounding `immI8_shift8`. These operands are only used in `replicate` rules, though. So we might be taking precedent from immIAddSubV` rule: >> >> >> // 32 bit integer valid for vector add sub immediate >> operand immIAddSubV() >> %{ >> predicate(Assembler::operand_valid_for_sve_add_sub_immediate((int64_t)n->get_int())); >> match(ConI); >> >> op_cost(0); >> format %{ %} >> interface(CONST_INTER); >> %} >> >> >> I.e. rename these operands to `immIDupV`, `immLDupV`, `immHDupV` and adjust the comments to match? 
> > Done

Done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2259529883

From bkilambi at openjdk.org  Thu Aug  7 08:27:34 2025
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Thu, 7 Aug 2025 08:27:34 GMT
Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 1 Aug 2025 09:51:21 GMT, Andrew Haley wrote:

>> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Addressed review comments
>
> src/hotspot/cpu/aarch64/aarch64_vector.ad line 4903:
>
>> 4901:
>> 4902: // Replicate a 16-bit half precision float which is within the limits
>> 4903: // as specified for the operand - immH8_shift8
>
> Suggestion:
>
> // for the operand - immH8_shift8

Done

> src/hotspot/cpu/aarch64/assembler_aarch64.cpp line 439:
>
>> 437: bool Assembler::operand_valid_for_sve_dup_immediate(int64_t imm) {
>> 438:   return ((imm <= 127 && imm >= -128) ||
>> 439:           (imm <= 32767 && imm >= -32768 && (imm & 0xff) == 0));
>
> Suggestion:
>
> return ((imm >= -128 && imm <= 127) ||
>         (imm & 0xff == 0) && (imm >= -32768 && imm <= 32767));
>
> Reason: it's more conventional, and closer to the mathematical _l ≤ x ≤ h_.
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2259530180 PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2259529427 From bkilambi at openjdk.org Thu Aug 7 08:27:33 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 7 Aug 2025 08:27:33 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v3] In-Reply-To: References: Message-ID: On Fri, 1 Aug 2025 11:50:13 GMT, Aleksey Shipilev wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments > > src/hotspot/cpu/aarch64/aarch64.ad line 4377: > >> 4375: operand immI8_shift8() >> 4376: %{ >> 4377: predicate(Assembler::operand_valid_for_sve_dup_immediate((int64_t)n->get_int())); > > `Assembler::operand_valid_for_sve_dup_immediate` sounds odd as the predicate for a generically sounding `immI8_shift8`. These operands are only used in `replicate` rules, though. So we might be taking a precedent from the `immIAddSubV` rule: > > > // 32 bit integer valid for vector add sub immediate > operand immIAddSubV() > %{ > predicate(Assembler::operand_valid_for_sve_add_sub_immediate((int64_t)n->get_int())); > match(ConI); > > op_cost(0); > format %{ %} > interface(CONST_INTER); > %} > > > I.e. rename these operands to `immIDupV`, `immLDupV`, `immHDupV` and adjust the comments to match? Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2259528646 From dlong at openjdk.org Thu Aug 7 08:47:19 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 7 Aug 2025 08:47:19 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v5] In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 02:54:22 GMT, Kim Barrett wrote: >> Thanks for implementing nice code for PPC64! I appreciate it! The shared code and the other platforms look fine, too. 
>> Maybe atomic bitwise operations could be used, but I'm happy with your current solution. > >> Thanks @TheRealMDoerr. I didn't even consider atomic bitwise operations, but that's a good idea. I'm not in a hurry to push this, so if you could provide an atomic bitwise patch for ppc64, I would be happy to include it. In the meantime, I'm still investigating the ZGC regression. If I can figure it out, I might want to include a fix for ZGC in this PR as well. > > Not a review, just a drive-by comment. > We've had Atomic bitops for a while now. > Atomic::fetch_then_{and,or,xor}(ptr, bits [, order]) > Atomic::{and,or,xor}_then_fetch(ptr, bits [, order]) > They haven't been optimized for most (any?) platforms, being based on cmpxchg. > (See all the "Specialize atomic bitset functions for ..." related to > https://bugs.openjdk.org/browse/JDK-8293117.) Thanks @kimbarrett. I see that the Atomic::fetch_then_XXX() implementation is very similar to what I came up with. The operation I'm doing is sometimes setting a single bit, so fetch_then_or() could be used, but sometimes the operation is setting the other 31 bits, so a new fetch_then_set_with_mask() would need to be added. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26399#issuecomment-3163131928 From chagedorn at openjdk.org Thu Aug 7 08:53:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 7 Aug 2025 08:53:15 GMT Subject: RFR: 8349191: Test compiler/ciReplay/TestIncrementalInlining.java failed In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 09:14:36 GMT, Benoît Maillard wrote: > This PR fixes a bug caused by synchronization issues in the print inlining system. Individual segments of a single line of output are interleaved with output from other compile threads, causing tests that parse replay files to fail. 
> > A snippet of a problematic replay file is shown below: > > > @ 0 compiler.ciReplay.IncrementalInliningTest::level0 (4 bytes) force inline by annotation > @ 0 compiler.ciReplay.IncrementalInliningTest::level1 (4 bytes) inline (hot) > @ 0 compiler.ciReplay.IncrementalInliningTest::level2 (4 bytes) > > > > force inline by annotation > @ 0 compiler.ciReplay.IncrementalInliningTest::late (4 bytes) force inline by annotation late inline succeeded > @ 0 compiler.ciReplay.IncrementalInliningTest::level4 (6 bytes) failed to inline: inlining too deep > > > This makes the output impossible to parse for tests like `compiler/ciReplay/TestIncrementalInlining.java`, as they rely on regular expressions to parse individual lines. Because it is a synchronization issue, the bug is quite intermittent and I was only able to reproduce it with mach5 in tier 7. > > This bug was caused by [JDK-8319850](https://bugs.openjdk.org/browse/JDK-8319850), as it introduced important changes in the print inlining system. With these changes, individual segments of the output are printed directly to tty, and this risks causing problematic interleavings with multiple compile threads. > > My proposed solution is to simply print everything to a `stringStream` first, and then dump it to `tty`. The PR also removes the relevant tests from `ProblemList.txt`. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8349191) > - [x] tier1-3, plus some internal testing > - [x] tier7 for the relevant tests (`TestIncrementalInlining.java` and `TestInliningProtectionDomain.java`) > > Thanks for reviewing! Looks good, thanks for deep diving into this in order to revive the test! 
src/hotspot/share/opto/printinlining.cpp line 52: > 50: stringStream ss; > 51: _root.dump(&ss, -1); > 52: tty->print_raw(ss.freeze()); General thought: I see that we use the proposed pattern to print a `stringStream` in existing code but also a different pattern with `as_string()`: https://github.com/openjdk/jdk/blob/c56fb0b6eff7d3f36bc65f300b784e0dd73c563e/src/hotspot/share/opto/compile.cpp#L614 Can anybody comment on which one should be preferred? ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26654#pullrequestreview-3096083746 PR Review Comment: https://git.openjdk.org/jdk/pull/26654#discussion_r2259591574 From bmaillard at openjdk.org Thu Aug 7 09:03:20 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 7 Aug 2025 09:03:20 GMT Subject: RFR: 8349191: Test compiler/ciReplay/TestIncrementalInlining.java failed In-Reply-To: References: Message-ID: On Thu, 7 Aug 2025 08:47:24 GMT, Christian Hagedorn wrote: >> This PR fixes a bug caused by synchronization issues in the print inlining system. Individual segments of a single line of output are interleaved with output from other compile threads, causing tests that parse replay files to fail. >> >> A snippet of a problematic replay file is shown below: >> >> >> @ 0 compiler.ciReplay.IncrementalInliningTest::level0 (4 bytes) force inline by annotation >> @ 0 compiler.ciReplay.IncrementalInliningTest::level1 (4 bytes) inline (hot) >> @ 0 compiler.ciReplay.IncrementalInliningTest::level2 (4 bytes) >> >> >> >> force inline by annotation >> @ 0 compiler.ciReplay.IncrementalInliningTest::late (4 bytes) force inline by annotation late inline succeeded >> @ 0 compiler.ciReplay.IncrementalInliningTest::level4 (6 bytes) failed to inline: inlining too deep >> >> >> This makes the output impossible to parse for tests like `compiler/ciReplay/TestIncrementalInlining.java`, as they rely on regular expressions to parse individual lines. 
Because it is a synchronization issue, the bug is quite intermittent and I was only able to reproduce it with mach5 in tier 7. >> >> This bug was caused by [JDK-8319850](https://bugs.openjdk.org/browse/JDK-8319850), as it introduced important changes in the print inlining system. With these changes, individual segments of the output are printed directly to tty, and this risks causing problematic interleavings with multiple compile threads. >> >> My proposed solution is to simply print everything to a `stringStream` first, and then dump it to `tty`. The PR also removes the relevant tests from `ProblemList.txt`. >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8349191) >> - [x] tier1-3, plus some internal testing >> - [x] tier7 for the relevant tests (`TestIncrementalInlining.java` and `TestInliningProtectionDomain.java`) >> >> Thanks for reviewing! > > src/hotspot/share/opto/printinlining.cpp line 52: > >> 50: stringStream ss; >> 51: _root.dump(&ss, -1); >> 52: tty->print_raw(ss.freeze()); > > General thought: I see that we use the proposed pattern to print a `stringStream` in existing code but also a different pattern with `as_string()`: > https://github.com/openjdk/jdk/blob/c56fb0b6eff7d3f36bc65f300b784e0dd73c563e/src/hotspot/share/opto/compile.cpp#L614 > > Can anybody comment on which one should be preferred? Good point, I am also curious to see the answer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26654#discussion_r2259628413 From fbredberg at openjdk.org Thu Aug 7 09:23:37 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 7 Aug 2025 09:23:37 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 [v4] In-Reply-To: References: Message-ID: > Since the integration of [JDK-8359437](https://bugs.openjdk.org/browse/JDK-8359437) the `LockingMode` flag can no longer be set by the user, instead it's declared as `const int LockingMode = LM_LIGHTWEIGHT;`. 
This means that we can now safely remove all `LockingMode` related code from all platforms. > > This PR removes `LockingMode` related code from the **x86** platform. > > When all the `LockingMode` code has been removed from all platforms, we can go on and remove it from shared (non-platform specific) files as well. And finally remove the `LockingMode` variable itself. > > Passes tier1-tier5 with no added problems. Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: Update three after review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26552/files - new: https://git.openjdk.org/jdk/pull/26552/files/9fa0c947..6c8b78b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26552&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26552&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26552.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26552/head:pull/26552 PR: https://git.openjdk.org/jdk/pull/26552 From fbredberg at openjdk.org Thu Aug 7 09:23:37 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 7 Aug 2025 09:23:37 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 [v3] In-Reply-To: References: <-ncfIHskHiKnUbJ3nRR8rp678hInGalmZW4CnS5QJp0=.baabffb7-5f4f-4f06-9b23-315f8e9372a7@github.com> Message-ID: On Wed, 6 Aug 2025 05:43:13 GMT, David Holmes wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update two after review > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 409: > >> 407: // >> 408: // The only other source of unbalanced locking would be JNI. 
The "Java Native Interface >> 409: // Specification" states that an object locked by JNI's_MonitorEnter should not be > > Suggestion: > > // Specification" states that an object locked by JNI's MonitorEnter should not be > > Sorry missed the misplaced underscore due to the red-wavy-line spelling error Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26552#discussion_r2259672572 From fferrari at openjdk.org Thu Aug 7 10:44:51 2025 From: fferrari at openjdk.org (Francisco Ferrari Bihurriet) Date: Thu, 7 Aug 2025 10:44:51 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type Message-ID: Hi, this pull request is a second take of 1383fec41756322bf2832c55633e46395b937b40, by updating the `CmpUNode` type as either `TypeInt::CC_LE` (case 1a) or `TypeInt::CC_LT` (case 1b) instead of updating the `BoolNode` type as `TypeInt::ONE`. With this approach a56cd371a2c497e4323756f8b8a08a0bba059bf2 becomes unnecessary. Additionally, having the right type in `CmpUNode` could potentially enable further optimizations. 
#### Testing In order to evaluate the changes, the following testing has been performed: * `jdk:tier1` (see [GitHub Actions run](https://github.com/franferrax/jdk/actions/runs/16789994433)) * [`TestBoolNodeGVN.java`](https://github.com/openjdk/jdk/blob/jdk-26+9/test/hotspot/jtreg/compiler/c2/gvn/TestBoolNodeGVN.java), created for [JDK-8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value](https://bugs.openjdk.org/browse/JDK-8327381) (1383fec41756322bf2832c55633e46395b937b40) * I also checked it breaks if I remove the `CmpUNode::Value_cmpu_and_mask` call * Private reproducer for [JDK-8349584: Improve compiler processing](https://bugs.openjdk.org/browse/JDK-8349584) (a56cd371a2c497e4323756f8b8a08a0bba059bf2) * A local slowdebug run of the `test/hotspot/jtreg/compiler/c2` category on _Fedora Linux x86_64_ * Same results as with `master` (f95af744b07a9ec87e2507b3d584cbcddc827bbd) ------------- Commit messages: - 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type - Revert "8349584: Improve compiler processing" Changes: https://git.openjdk.org/jdk/pull/26666/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26666&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8364970 Stats: 227 lines in 4 files changed: 80 ins; 145 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26666/head:pull/26666 PR: https://git.openjdk.org/jdk/pull/26666 From fferrari at openjdk.org Thu Aug 7 10:44:51 2025 From: fferrari at openjdk.org (Francisco Ferrari Bihurriet) Date: Thu, 7 Aug 2025 10:44:51 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 23:33:23 GMT, Francisco Ferrari Bihurriet wrote: > Hi, this pull request is a second take of 1383fec41756322bf2832c55633e46395b937b40, by updating the `CmpUNode` type as either `TypeInt::CC_LE` (case 1a) or `TypeInt::CC_LT` 
(case 1b) instead of updating the `BoolNode` type as `TypeInt::ONE`. > > With this approach a56cd371a2c497e4323756f8b8a08a0bba059bf2 becomes unnecessary. Additionally, having the right type in `CmpUNode` could potentially enable further optimizations. > > #### Testing > > In order to evaluate the changes, the following testing has been performed: > > * `jdk:tier1` (see [GitHub Actions run](https://github.com/franferrax/jdk/actions/runs/16789994433)) > * [`TestBoolNodeGVN.java`](https://github.com/openjdk/jdk/blob/jdk-26+9/test/hotspot/jtreg/compiler/c2/gvn/TestBoolNodeGVN.java), created for [JDK-8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value](https://bugs.openjdk.org/browse/JDK-8327381) (1383fec41756322bf2832c55633e46395b937b40) > * I also checked it breaks if I remove the `CmpUNode::Value_cmpu_and_mask` call > * Private reproducer for [JDK-8349584: Improve compiler processing](https://bugs.openjdk.org/browse/JDK-8349584) (a56cd371a2c497e4323756f8b8a08a0bba059bf2) > * A local slowdebug run of the `test/hotspot/jtreg/compiler/c2` category on _Fedora Linux x86_64_ > * Same results as with `master` (f95af744b07a9ec87e2507b3d584cbcddc827bbd) @rwestrel / @TobiHartmann / @chhagedorn: this is my first contribution in C2 besides [OJVG](https://openjdk.org/groups/vulnerability/) reviews and backports, please let me know if I should be testing something else. @tabjy: as the original 1383fec41756322bf2832c55633e46395b937b40 author, I would greatly appreciate an additional review from you. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26666#issuecomment-3163534821 From mhaessig at openjdk.org Thu Aug 7 11:08:19 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 7 Aug 2025 11:08:19 GMT Subject: RFR: 8364766: Improve Value() of DivI and DivL for non-constant inputs In-Reply-To: References: Message-ID: On Sun, 6 Jul 2025 08:08:25 GMT, Tobias Hotz wrote: > This PR improves the value of integer division nodes. > Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. > We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. > This also cleans up and unifies the code paths for DivINode and DivLNode. > I've added some tests to validate the optimization. Without the changes, some of these tests fail. Thank you for contributing this enhancement, @ichttt! Your code is very well commented and thus easy to follow, and you tested your changes thoroughly. Nice work! I mainly nitpicked on a few details below. But I have one question: You went to great pains to only calculate the necessary "corners". Would it not be much easier to calculate all four possible corners and let the min and max functions deal with the duplicates in case the `i1` or `i2` range is a singleton? The result should be the same (if I did not forget about a corner case) and it would be easier to follow. What kind of testing did you run on your side? I kicked off a tier1 through tier5 run and will keep you posted on the results. You still have a title mismatch between the issue and the PR (the PR is missing "C2:"). 
src/hotspot/share/opto/divnode.cpp line 507: > 505: } > 506: > 507: Suggestion: src/hotspot/share/opto/divnode.cpp line 539: > 537: // We compute all four and take the min and max. > 538: // A special case handles overflow when dividing the most-negative value by -1. > 539: Suggestion: src/hotspot/share/opto/divnode.cpp line 548: > 546: assert(min_val == min_jint || min_val == min_jlong, "min has to be either min_jint or min_jlong"); > 547: > 548: // Special overflow case: min_val / (-1) == min_val Suggestion: // Special overflow case: min_val / (-1) == min_val (cf. JVMS §6.5 idiv/ldiv) Since it is explicitly mentioned in the [spec](https://docs.oracle.com/javase/specs/jvms/se24/html/jvms-6.html#jvms-6.5.idiv), you may want to add a reference to it. src/hotspot/share/opto/divnode.cpp line 551: > 549: // We must include min_val in the output if i1->_lo == min_val and i2->_hi. > 550: if (i1->_lo == min_val && i2_hi == -1) { > 551: // special case: min_jint or min_jlong div -1 == min_val Suggestion: With the comments above the `if`, this is superfluous. src/hotspot/share/opto/divnode.cpp line 554: > 552: new_lo = i1->_lo; > 553: if (!i1->is_con()) { > 554: // Also compute the “next” division result for a non-constant range. Suggestion: > // Also compute the "next" division result for a non-constant range. Nit: let's stick to ASCII for the quotes :) src/hotspot/share/opto/divnode.cpp line 568: > 566: if (i2_lo != i2_hi) { > 567: // special case not possible here, _lo mus > 568: assert(i2_lo != -1, "Special case not possible here"); While functionally correct, here you are only talking about the negative special case, but if `i2_lo in [0,1]` the same might happen on the positive side. Suggestion: // If the divisor range is wider than a singleton, include (i1->_lo, i2->_lo). // We cannot use is_con here, as a range of [-1, 0] for i2_hi and [0,1] for i2_lo // will also result in i2_lo and i2_hi being -1, or i2_lo and i2_hi being 1 // respectively. 
if (i2_lo != i2_hi) { assert(i2_hi - i2_lo >= 1, "i2 must be wider than a singleton"); src/hotspot/share/opto/divnode.cpp line 575: > 573: > 574: // If i1 is not a single constant, include the two corners with i1->_hi: > 575: // (i1->_hi, i2->_lo) and (i1->_hi, i2->_hi) Why do you not have to handle the case of `i2` being a singleton range? test/hotspot/jtreg/compiler/c2/irTests/IntegerDivValueTests.java line 56: > 54: public int testIntConstantFoldingSpecialCase() { > 55: // All constants available during parsing > 56: return Integer.MIN_VALUE / -1; It would be good to check the result of this computation, since it is a special case. test/hotspot/jtreg/compiler/c2/irTests/IntegerDivValueTests.java line 125: > 123: @Run(test = {"testIntRange", "testIntRange2", "testIntRange3", "testIntRange4", "testIntRange5", "testIntRange6", "testIntRange7", "testIntRange8"}) > 124: public void checkIntRanges(RunInfo info) { > 125: Suggestion: test/hotspot/jtreg/compiler/c2/irTests/IntegerDivValueTests.java line 173: > 171: > 172: // Long variants > 173: Suggestion: // Long variants test/hotspot/jtreg/compiler/c2/irTests/IntegerDivValueTests.java line 254: > 252: @Run(test = {"testLongRange", "testLongRange2", "testLongRange3", "testLongRange4", "testLongRange5", "testLongRange6", "testLongRange7", "testLongRange8"}) > 253: public void checkLongRanges(RunInfo info) { > 254: Suggestion: ------------- Changes requested by mhaessig (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/26143#pullrequestreview-3095836322 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2259419349 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2259563833 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2259601794 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2259609385 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2259614381 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2259692276 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2259879642 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2259891388 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2259902412 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2259906531 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2259905197 From mhaessig at openjdk.org Thu Aug 7 11:41:15 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 7 Aug 2025 11:41:15 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled In-Reply-To: References: Message-ID: On Tue, 5 Aug 2025 10:40:19 GMT, Saranya Natarajan wrote: > **Issue** > An error, `assert(data->is_ReceiverTypeData()) failed: bad profile data type`, is encountered during C2 compilation due to bad profile data. This occurs when the code is compiled with `TypeProfileCasts` option disabled. > > **Analysis** > The assertion failure occurs in `record_profiled_receiver_for_speculation` that analyzes the profiling information in the method data to determine whether a null value has been observed in the `instanceof` operation. This information is encoded in the `BitData` during profiling. 
When the method identifies that a null has been seen, it proceeds to inspect the associated `ReceiverTypeData` to see if the type check is always performed against null. However, in this scenario, the incoming profiling data is of type `BitData` rather than `ReceiverTypeData`, leading to the assertion failure. > > The profiling information for null seen for operations `aastore`, `instanceof`, and `checkcast` is recorded by the method `profile_null_seen` (in `src/hotspot/cpu/x86/templateTable_x86.cpp`). On investigating this method, it can be observed that the method data pointer is not updated for `VirtualCallData` (which is a subclass of `ReceiverTypeData`) when the `TypeProfileCasts` option is disabled. > > **Solution** > My proposal is to inspect the `ReceiverTypeData` in function `record_profiled_receiver_for_speculation` only if `TypeProfileCasts` is enabled (this is based on the fact that the relevant method data pointer is not updated when `TypeProfileCasts` is disabled). > > **Question to reviewers** > Do you think this is a reasonable fix? > > **Testing** > GitHub Actions > tier1 to tier3 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. Thank you for working on this, @sarannat! The fix seems reasonable to me since `GraphKit::maybe_cast_profiled_receiver` has a similar exception. However, you are missing a regression test or a `noreg-*` label in JBS; in this case, I think a small regression test is warranted. ------------- Changes requested by mhaessig (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/26640#pullrequestreview-3096696793 From dholmes at openjdk.org Thu Aug 7 12:16:16 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 7 Aug 2025 12:16:16 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 [v4] In-Reply-To: References: Message-ID: On Thu, 7 Aug 2025 09:23:37 GMT, Fredrik Bredberg wrote: >> Since the integration of [JDK-8359437](https://bugs.openjdk.org/browse/JDK-8359437) the `LockingMode` flag can no longer be set by the user, instead it's declared as `const int LockingMode = LM_LIGHTWEIGHT;`. This means that we can now safely remove all `LockingMode` related code from all platforms. >> >> This PR removes `LockingMode` related code from the **x86** platform. >> >> When all the `LockingMode` code has been removed from all platforms, we can go on and remove it from shared (non-platform specific) files as well. And finally remove the `LockingMode` variable itself. >> >> Passes tier1-tier5 with no added problems. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update three after review Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26552#pullrequestreview-3096874113 From jbhateja at openjdk.org Thu Aug 7 12:37:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 7 Aug 2025 12:37:17 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v2] In-Reply-To: References: Message-ID: <98s1CQXUZuyBYN83myqlz01lNsEw3o7-v1DdVb3cNv4=.705802ff-2a68-4258-8f2b-fe5885ce32c5@github.com> On Fri, 25 Jul 2025 20:09:40 GMT, Jatin Bhateja wrote: >> Patch optimizes Vector.slice operation with constant index using x86 ALIGNR instruction. 
>> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. >> >> The idea here is to add infrastructure support to enable intrinsification of the fast path for selected vector APIs, else enable inlining of the fallback implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander from handling slow paths, which can easily be implemented on the library side (Java). >> >> Vector API jtreg tests pass at AVX level 2, remaining validation in progress. 
>> >> Performance numbers: >> >> >> System : 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms >> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms >> VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms >> VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms >> VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms >> VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms >> VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms >> VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms >> VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms >> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms >> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms >> VectorSliceB... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating predicate checks Adding additional notes on implementation: A) Slice:- 1. New inline expander and C2 IR node VectorSlice for leaf level intrinsic corresponding to Vector.slice(int) 2. Other interfaces of slice APIs. - Vector.slice(int, Vector) - The second vector argument is the background vector, which replaces the zero broadcasted vector of the base version of API. - API internally calls the same intrinsic entry point as the base version. - Vector.slice(int, Vector, VectorMask) - This version of the API internally calls the above slice API with index and vector arguments, followed by an explicit blend with a broadcasted zero vector. Thus, current support implicitly covers all three 3 variants of slice APIs. 
B) Similar extensions to optimize Unslice with constant index:- 1. Similar to slice, unslice also has three interfaces. 2. Leaf-level interface only accepts an index argument. 3. Other variants of unslice accept unslice index, background vector, and part number. 4. We can assume the receiver vector to be sliding over two contiguously placed background vectors. 5. It's possible to implement all three variants of unslice using slice operations as follows. jshell> // Input jshell> vec vec ==> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16] jshell> vec2 vec2 ==> [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160] jshell> bzvec bzvec ==> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] jshell> // Case 1: jshell> vec.unslice(4) $79 ==> [0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] jshell> bzvec.slice(vec.length() - 4, vec) $80 ==> [0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] jshell> // Case 2: jshell> vec.unslice(4, vec2, 0) $81 ==> [10, 20, 30, 40, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] jshell> vec2.blend(vec2.slice(vec2.length() - 4, vec), VectorMask.fromLong(IntVector.SPECIES_512, ((1L << (vec.length() - 4)) - 1) << 4)) $82 ==> [10, 20, 30, 40, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] jshell> // Case 3: jshell> vec.unslice(4, vec2, 1) $83 ==> [13, 14, 15, 16, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160] jshell> vec2.blend(vec.slice(vec.length() - 4, vec2), VectorMask.fromLong(IntVector.SPECIES_512, ((1L << 4) - 1))) $84 ==> [13, 14, 15, 16, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160] jshell> // Case 4: jshell> vec.unslice(4, vec2, 0, VectorMask.fromLong(IntVector.SPECIES_512, 0xFF)) $85 ==> [10, 20, 30, 40, 1, 2, 3, 4, 5, 6, 7, 8, 130, 140, 150, 160] jshell> // Current Java fallback implementation for this version is based on slice and unslice operations. To ease the review process, I plan to optimize the unslice API with a constant index by extending the newly added expander in a follow-up patch. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24104#issuecomment-3163994472 From bkilambi at openjdk.org Thu Aug 7 13:22:17 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 7 Aug 2025 13:22:17 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE In-Reply-To: References: <0jcw428unzAfdGcqci79xBRxjw3yHN_MxYc7OOuHDz8=.31bd3357-49ff-442f-8d06-58447df49de7@github.com> Message-ID: On Fri, 1 Aug 2025 14:29:47 GMT, Aleksey Shipilev wrote: >>> I am still a bit confused what matches `Replicate` with `immH` that does _not_ fit `immH8_shift8` when `Matcher::vector_length_in_bytes(n) > 16`? >> >> Hi, thanks for your review. If the immediate value does not fit `immH8_shift8` for `Matcher::vector_length_in_bytes(n) > 16` , the compiler would generate `loadConH` [1] -> `replicateHF` [2] backend nodes instead. The constant would be loaded from the constant pool instead and then broadcasted/replicated to every lane of an SVE register. >> >> [1] https://github.com/openjdk/jdk/blob/8ac4a88f3c5ad57824dd192cb3f0af5e71cbceeb/src/hotspot/cpu/aarch64/aarch64.ad#L6963 >> >> [2] https://github.com/openjdk/jdk/blob/8ac4a88f3c5ad57824dd192cb3f0af5e71cbceeb/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4806 > >> If the immediate value does not fit `immH8_shift8` for `Matcher::vector_length_in_bytes(n) > 16` , the compiler would generate `loadConH` [1] -> `replicateHF` [2] backend nodes instead. > > Ah OK, just checking. I ran this patch on the machine where I have originally found the issue, and it seems to work. Hi @shipilev @theRealAph Can I please ask for another round of review? I have addressed the review comments in the latest patch. Thanks a lot! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3164177758 From shade at openjdk.org Thu Aug 7 14:01:23 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 7 Aug 2025 14:01:23 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v3] In-Reply-To: References: Message-ID: On Thu, 7 Aug 2025 08:27:33 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. >> >> This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). >> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. 
> > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments src/hotspot/cpu/aarch64/assembler_aarch64.cpp line 439: > 437: bool Assembler::operand_valid_for_sve_dup_immediate(int64_t imm) { > 438: return ((imm >= -128 && imm <= 127) || > 439: (((imm & 0xff) == 0) && imm >= -32768 && imm <= 32767)); Hold up! The current predicate was: predicate((n->get_long() <= 127 && n->get_long() >= -128) || (n->get_long() <= 32512 && n->get_long() >= -32768 && (n->get_long() & 0xff) == 0)); So the upper bound is _not_ `32767`, but `32512`. Maybe that actually matches the `0xff` mask, I have not checked. But SVE spec talks about `+32512`, so it looks more straightforward just to match that. test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 52: > 50: > 51: // Choose FP16_CONST1 which is within the range of [-128 << 8, 127 << 8] and a multiple of 256 > 52: private static final Float16 FP16_CONST1 = Float16.shortBitsToFloat16((short)512); Call them `FP16_IN_RANGE` and `FP16_OUT_OF_RANGE`, maybe? Also rename the test cases from `*1`/`*2` to `*InRange`/`*OutOfRange`? test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 68: > 66: > 67: Generator gen = G.float16s(); > 68: IntStream.range(0, LEN).forEach(i -> {input[i] = gen.next();}); Just do a for loop? test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 79: > 77: @IR(counts = {IRNode.REPLICATE_HF_IMM8, ">0"}, > 78: phase = CompilePhase.FINAL_CODE, > 79: applyIf = {"MaxVectorSize", ">=32"}, `> 16` then? This matches the comment better. 
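The immediate-range discussion above boils down to a small predicate. The sketch below is an illustration of the check under review (not the final HotSpot code), using the spec's +32512 upper bound: SVE DUP takes a signed 8-bit immediate, optionally shifted left by 8, so a 16-bit constant is directly encodable either in [-128, 127] or as a multiple of 256 in [-32768, 32512]. Note that with the `0xff` mask in place, bounds of 32767 and 32512 admit exactly the same values, since the largest multiple of 256 not above 32767 is 32512.

```cpp
#include <cassert>
#include <cstdint>

// True iff 'imm' can be encoded directly by sve_dup; anything else falls
// back to the loadConH + replicateHF path mentioned earlier in the thread.
bool sve_dup_immediate_encodable(int64_t imm) {
    return (imm >= -128 && imm <= 127) ||
           ((imm & 0xff) == 0 && imm >= -32768 && imm <= 32512);
}
```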
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2260421042 PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2260414898 PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2260403263 PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2260416873 From mhaessig at openjdk.org Thu Aug 7 14:52:48 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 7 Aug 2025 14:52:48 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v4] In-Reply-To: References: Message-ID: <7SKfaULZBs_ccRipoMMWXKUAASHIhq9um43xaxToBKE=.83db680e-fc44-4be9-8f15-0030e764b4f8@github.com> > This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. > > The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. > > Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. 
> > Testing: > - [ ] Github Actions > - [ ] tier1, tier2 on all platforms > - [ ] tier3, tier4 and Oracle internal testing on Linux fastdebug > - [ ] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: ASSERT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26023/files - new: https://git.openjdk.org/jdk/pull/26023/files/d50231f9..212afb4d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=02-03 Stats: 18 lines in 2 files changed: 12 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/26023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26023/head:pull/26023 PR: https://git.openjdk.org/jdk/pull/26023 From mhaessig at openjdk.org Thu Aug 7 14:52:48 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 7 Aug 2025 14:52:48 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v3] In-Reply-To: References: Message-ID: <3dpJoVc2WphcDziUngOFHUf6Th74DAizG9lIclUwPCc=.655a1ba0-9fae-495b-9431-f8a2a43c0073@github.com> On Thu, 7 Aug 2025 01:11:14 GMT, Dean Long wrote: > However, you might want to simply remove _timeout_armed, or put it inside a #ifdef ASSERT, since it is only used in an assert. Good point, thank you. v3 does just that. 
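The Linux timeout mechanism described earlier in this thread (a POSIX timer delivering SIGALRM to the compiler thread itself via SIGEV_THREAD_ID) can be sketched as follows. Names and structure here are my own illustration, not the PR's code; as noted above, no separate "armed" flag is needed, since receiving the signal is itself evidence that the timer fired.

```cpp
#include <cassert>
#include <csignal>
#include <ctime>
#include <unistd.h>
#include <sys/syscall.h>

// Older glibc exposes the target-tid field only through the union member.
#ifndef sigev_notify_thread_id
#define sigev_notify_thread_id _sigev_un._tid
#endif

static volatile std::sig_atomic_t g_timed_out = 0;

static void timeout_handler(int) {
    g_timed_out = 1;  // the PR asserts here instead of setting a flag
}

// Arm a one-shot timeout that signals the *calling* thread after timeout_ms.
static timer_t arm_task_timeout(long timeout_ms) {
    struct sigaction sa = {};
    sa.sa_handler = timeout_handler;
    sigaction(SIGALRM, &sa, nullptr);

    struct sigevent sev = {};
    sev.sigev_notify = SIGEV_THREAD_ID;   // deliver to one specific thread...
    sev.sigev_signo = SIGALRM;
    sev.sigev_notify_thread_id = (pid_t)syscall(SYS_gettid);  // ...this one

    timer_t t;
    timer_create(CLOCK_MONOTONIC, &sev, &t);

    struct itimerspec its = {};          // it_interval zeroed: one-shot
    its.it_value.tv_sec = timeout_ms / 1000;
    its.it_value.tv_nsec = (timeout_ms % 1000) * 1000000L;
    timer_settime(t, 0, &its, nullptr);
    return t;
}
```

A compiler thread would call `arm_task_timeout` before starting a task and `timer_delete` on completion to disarm it.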
------------- PR Comment: https://git.openjdk.org/jdk/pull/26023#issuecomment-3164517696 From shade at openjdk.org Thu Aug 7 15:01:15 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 7 Aug 2025 15:01:15 GMT Subject: RFR: 8358598: PhaseIterGVN::PhaseIterGVN(PhaseGVN* gvn) doesn't use its parameter [v2] In-Reply-To: References: <2S2UiCOxUCiSAlQrrVCaL4S6MYlqdRcabqniskhg6XI=.c4ec617e-da35-48df-911c-9c0b4dca0126@github.com> Message-ID: On Tue, 5 Aug 2025 14:09:46 GMT, Francesco Andreuzzi wrote: >> As noted in the ticket, I propose a small cleanup of `PhaseIterGVN` since one of the constructors does not use its parameter. >> >> Passes tier1 and tier2. > > Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: > > align with line above > > Co-authored-by: Manuel H?ssig Looks reasonable. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26617#pullrequestreview-3097555097 From shade at openjdk.org Thu Aug 7 15:04:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 7 Aug 2025 15:04:14 GMT Subject: RFR: 8360304: Redundant condition in LibraryCallKit::inline_vector_nary_operation In-Reply-To: References: Message-ID: On Sat, 2 Aug 2025 15:44:22 GMT, Francesco Andreuzzi wrote: > The check for `sopc != 0` is not needed after JDK-8353786, the function would exit at L374 otherwise. > > Passes tier1. Looks fine, but I think you want to run vector tests to be extra sure. 
https://github.com/openjdk/jdk/blob/83953c458eb65b2af184340dd460325f2b56e5b9/test/jdk/TEST.groups#L402-L403 ------------- PR Review: https://git.openjdk.org/jdk/pull/26606#pullrequestreview-3097566971 From duke at openjdk.org Thu Aug 7 15:09:17 2025 From: duke at openjdk.org (duke) Date: Thu, 7 Aug 2025 15:09:17 GMT Subject: RFR: 8358598: PhaseIterGVN::PhaseIterGVN(PhaseGVN* gvn) doesn't use its parameter [v2] In-Reply-To: References: <2S2UiCOxUCiSAlQrrVCaL4S6MYlqdRcabqniskhg6XI=.c4ec617e-da35-48df-911c-9c0b4dca0126@github.com> Message-ID: On Tue, 5 Aug 2025 14:09:46 GMT, Francesco Andreuzzi wrote: >> As noted in the ticket, I propose a small cleanup of `PhaseIterGVN` since one of the constructors does not use its parameter. >> >> Passes tier1 and tier2. > > Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: > > align with line above > > Co-authored-by: Manuel H?ssig @fandreuz Your change (at version a5553aa06502143d3ef527e8152cd651a453d3d3) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26617#issuecomment-3164590167 From bkilambi at openjdk.org Thu Aug 7 15:27:19 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 7 Aug 2025 15:27:19 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v3] In-Reply-To: References: Message-ID: On Thu, 7 Aug 2025 13:56:49 GMT, Aleksey Shipilev wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments > > src/hotspot/cpu/aarch64/assembler_aarch64.cpp line 439: > >> 437: bool Assembler::operand_valid_for_sve_dup_immediate(int64_t imm) { >> 438: return ((imm >= -128 && imm <= 127) || >> 439: (((imm & 0xff) == 0) && imm >= -32768 && imm <= 32767)); > > Hold up! 
The current predicate was: > > > predicate((n->get_long() <= 127 && n->get_long() >= -128) || > (n->get_long() <= 32512 && n->get_long() >= -32768 && (n->get_long() & 0xff) == 0)); > > > So the upper bound is _not_ `32767`, but `32512`. Maybe that actually matches the `0xff` mask, I have not checked. But SVE spec talks about `+32512`, so it looks more straightforward just to match that. Sure I can do that. Yes, the SVE spec talks specifically about `+32512` but I used `32767` as the largest value divisible by 256 would be `32512` anyway (and `-32768` and `32767` looked a bit more logical for a 16-bit immediate). I don't have much of a preference on this though and will go by your suggestion. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2260668682 From qamai at openjdk.org Thu Aug 7 15:38:20 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 7 Aug 2025 15:38:20 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v2] In-Reply-To: References: Message-ID: On Fri, 25 Jul 2025 20:09:40 GMT, Jatin Bhateja wrote: >> Patch optimizes Vector. slice operation with constant index using x86 ALIGNR instruction. >> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. >> >> Idea here is to add infrastructure support to enable intrinsification of fast path for selected vector APIs, else enable inlining of fall-back implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. 
The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander to handle slow paths, which can easily be implemented library side (Java). >> >> Vector API jtreg tests pass at AVX level 2, remaining validation in progress. >> >> Performance numbers: >> >> >> System : 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms >> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms >> VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms >> VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms >> VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms >> VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms >> VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms >> VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms >> VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms >> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms >> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms >> VectorSliceB... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating predicate checks src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7138: > 7136: // res[255:128] = {src2[127:0] , src1[255:128]} >> SHIFT > 7137: vperm2i128(xtmp, src1, src2, 0x21); > 7138: vpalignr(dst, xtmp, src1, origin, Assembler::AVX_256bit); If the slice amount is exactly 16, I think the `vpalignr` is unnecessary. 
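The two-instruction 256-bit lowering being reviewed can be checked against a byte-level model. This is my own reading of the snippet above, for illustration: `vperm2i128(xtmp, src1, src2, 0x21)` builds `{src2[127:0], src1[255:128]}`, and `vpalignr(dst, xtmp, src1, origin)` then shifts each `{xtmp.lane, src1.lane}` pair right by `origin` bytes within its 128-bit lane. For `origin == 16` the `vpalignr` selects exactly `xtmp`, which is the reviewer's point that it can be elided in that case.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t B = 32;             // bytes per 256-bit register
using Reg = std::array<uint8_t, B>;

// vperm2i128 with imm 0x21: low lane = src1[255:128], high lane = src2[127:0].
Reg vperm2i128_0x21(const Reg& src1, const Reg& src2) {
    Reg r{};
    for (std::size_t i = 0; i < 16; i++) {
        r[i] = src1[16 + i];
        r[16 + i] = src2[i];
    }
    return r;
}

// Per-128-bit-lane byte shift of {hi.lane : lo.lane} right by 'origin'.
Reg vpalignr(const Reg& hi, const Reg& lo, std::size_t origin) {
    Reg r{};
    for (std::size_t lane = 0; lane < 2; lane++) {
        for (std::size_t i = 0; i < 16; i++) {
            std::size_t j = origin + i;
            r[lane * 16 + i] = (j < 16) ? lo[lane * 16 + j]
                                        : hi[lane * 16 + j - 16];
        }
    }
    return r;
}

// Reference semantics: 32 bytes starting at 'origin' in concat(src1, src2).
Reg slice_ref(const Reg& src1, const Reg& src2, std::size_t origin) {
    Reg r{};
    for (std::size_t i = 0; i < B; i++) {
        std::size_t j = origin + i;
        r[i] = (j < B) ? src1[j] : src2[j - B];
    }
    return r;
}
```

The model confirms the pair matches the reference slice for all byte origins up to 16, and that at 16 the `vpalignr` degenerates to a copy of `xtmp`.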
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7159: > 7157: void C2_MacroAssembler::vector_slice_64B_op(XMMRegister dst, XMMRegister src1, XMMRegister src2, > 7158: XMMRegister xtmp, int origin, int vlen_enc) { > 7159: if (origin <= 16) { If `origin` is divisible by `4`, then a single `valignd` is enough, am I right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2260699543 PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2260702518 From duke at openjdk.org Thu Aug 7 15:47:24 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Thu, 7 Aug 2025 15:47:24 GMT Subject: Integrated: 8358598: PhaseIterGVN::PhaseIterGVN(PhaseGVN* gvn) doesn't use its parameter In-Reply-To: <2S2UiCOxUCiSAlQrrVCaL4S6MYlqdRcabqniskhg6XI=.c4ec617e-da35-48df-911c-9c0b4dca0126@github.com> References: <2S2UiCOxUCiSAlQrrVCaL4S6MYlqdRcabqniskhg6XI=.c4ec617e-da35-48df-911c-9c0b4dca0126@github.com> Message-ID: On Mon, 4 Aug 2025 09:47:23 GMT, Francesco Andreuzzi wrote: > As noted in the ticket, I propose a small cleanup of `PhaseIterGVN` since one of the constructors does not use its parameter. > > Passes tier1 and tier2. This pull request has now been integrated. Changeset: e606278f Author: Francesco Andreuzzi Committer: Manuel H?ssig URL: https://git.openjdk.org/jdk/commit/e606278fc8929fe563dd50a1c3f332747e210276 Stats: 17 lines in 4 files changed: 0 ins; 5 del; 12 mod 8358598: PhaseIterGVN::PhaseIterGVN(PhaseGVN* gvn) doesn't use its parameter Reviewed-by: galder, mhaessig, shade ------------- PR: https://git.openjdk.org/jdk/pull/26617 From duke at openjdk.org Thu Aug 7 18:57:20 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Thu, 7 Aug 2025 18:57:20 GMT Subject: RFR: 8360304: Redundant condition in LibraryCallKit::inline_vector_nary_operation In-Reply-To: References: Message-ID: On Thu, 7 Aug 2025 15:01:37 GMT, Aleksey Shipilev wrote: > Looks fine, but I think you want to run vector tests to be extra sure. 
> > https://github.com/openjdk/jdk/blob/83953c458eb65b2af184340dd460325f2b56e5b9/test/jdk/TEST.groups#L402-L403

==============================
Test summary
==============================
   TEST                      TOTAL  PASS  FAIL ERROR  SKIP
   jtreg:test/jdk:jdk_vector    83    81     0     0     2
==============================
TEST SUCCESS

------------- PR Comment: https://git.openjdk.org/jdk/pull/26606#issuecomment-3165372292 From dlong at openjdk.org Thu Aug 7 18:59:36 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 7 Aug 2025 18:59:36 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v4] In-Reply-To: <7SKfaULZBs_ccRipoMMWXKUAASHIhq9um43xaxToBKE=.83db680e-fc44-4be9-8f15-0030e764b4f8@github.com> References: <7SKfaULZBs_ccRipoMMWXKUAASHIhq9um43xaxToBKE=.83db680e-fc44-4be9-8f15-0030e764b4f8@github.com> Message-ID: On Thu, 7 Aug 2025 14:52:48 GMT, Manuel H?ssig wrote: >> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. >> >> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. >> >> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. 
>> >> Testing:
>> - [ ] Github Actions
>> - [ ] tier1, tier2 on all platforms
>> - [ ] tier3, tier4 and Oracle internal testing on Linux fastdebug
>> - [ ] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail)

> > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision:
> > ASSERT

Thinking about _timeout_armed a little more, the fact that the signal handler received TIMEOUT_SIGNAL should be enough. The value of _timeout_armed should be redundant, and your assert could be changed to: assert(false, "compile task timed out"); and _timeout_armed could be removed. It's just an inexact mirror of the timer state. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26023#issuecomment-3165377812 From dskantz at openjdk.org Fri Aug 8 06:20:15 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Fri, 8 Aug 2025 06:20:15 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors Message-ID: This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments(). In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there's 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. 
Not only does the compiler hang on this test case, but the string concat optimization seems to emit an arbitrary number of back-to-back stores in the generated code depending on the number of stacked concatenations. The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2]. [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303 [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806 Testing: T1-4. Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 arguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow() pass. ------------- Commit messages: - PL - fix Changes: https://git.openjdk.org/jdk/pull/26685/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26685&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8362394 Stats: 95 lines in 3 files changed: 94 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26685.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26685/head:pull/26685 PR: https://git.openjdk.org/jdk/pull/26685 From kbarrett at openjdk.org Fri Aug 8 06:42:13 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 8 Aug 2025 06:42:13 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v5] In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 02:54:22 GMT, Kim Barrett wrote: >> Thanks for implementing nice code for PPC64! I appreciate it! The shared code and the other platforms look fine, too. 
>> Maybe atomic bitwise operations could be used, but I'm happy with your current solution. > >> Thanks @TheRealMDoerr . I didn't even consider atomic bitwise operations, but that's a good idea. I'm not in a hurry to push this, so if you could provide an atomic bitwise patch for ppc64, I would be happy to include it. In the mean time, I'm still investigating the ZGC regression. If I can figure it out, I might want to include a fix for ZGC in this PR as well. > > Not a review, just a drive-by comment. > We've had Atomic bitops for a while now. > Atomic::fetch_then_{and,or,xor}(ptr, bits [, order]) > Atomic::{and,or,xor}_then_fetch(ptr, bits [, order]) > They haven't been optimized for most (any?) platforms, being based on cmpxchg. > (See all the "Specialize atomic bitset functions for ..." related to > https://bugs.openjdk.org/browse/JDK-8293117.) > Thanks @kimbarrett. I see that the Atomic::fetch_then_XXX() implementation is very similar to what I came up with. The operation I'm doing is sometimes setting a single bit, so fetch_then_or() could be used, but sometimes the operation is setting the other 31 bits, so a new fetch_then_set_with_mask() would need to be added. Oh, yes, I see. This code is setting a bitfield. Yeah, that's not one of the logical atomic primitives, and seems unlikely to be added unless more use-cases can be found. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26399#issuecomment-3166743405 From qxing at openjdk.org Fri Aug 8 08:21:42 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Fri, 8 Aug 2025 08:21:42 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v9] In-Reply-To: References: Message-ID: > The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. 
This can significantly affect the efficiency of code in some cases. > > This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: > > > public static int numberOfNibbles(int i) { > int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); > return Math.max((mag + 3) / 4, 1); > } > > > Testing: tier1, IR test Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: - Add microbench - Add missing test method declarations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25928/files - new: https://git.openjdk.org/jdk/pull/25928/files/ce5f8695..b4b9b643 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=07-08 Stats: 78 lines in 2 files changed: 74 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928 PR: https://git.openjdk.org/jdk/pull/25928 From qxing at openjdk.org Fri Aug 8 08:24:13 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Fri, 8 Aug 2025 08:24:13 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v9] In-Reply-To: References: Message-ID: On Fri, 8 Aug 2025 08:21:42 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. 
For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: > > - Add microbench > - Add missing test method declarations

Hi @jatin-bhateja, I've added a micro benchmark that includes the `numberOfNibbles` implementation from this PR description and your micro kernel. Here are my test results on an Intel(R) Xeon(R) Platinum:

# Baseline:
Benchmark                                  Mode  Cnt     Score   Error  Units
CountLeadingZeros.benchClzLongConstrained  avgt   15  1517.888 ± 5.691  ns/op
CountLeadingZeros.benchNumberOfNibbles     avgt   15  1094.422 ± 1.753  ns/op

# This patch:
Benchmark                                  Mode  Cnt     Score   Error  Units
CountLeadingZeros.benchClzLongConstrained  avgt   15     0.948 ± 0.002  ns/op
CountLeadingZeros.benchNumberOfNibbles     avgt   15   942.438 ± 1.742  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3166981089 From dfenacci at openjdk.org Fri Aug 8 08:52:10 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 8 Aug 2025 08:52:10 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled In-Reply-To: References: Message-ID: On Tue, 5 Aug 2025 10:40:19 GMT, Saranya Natarajan wrote: > **Issue** > An error, `assert(data->is_ReceiverTypeData()) failed: bad profile data type`, is encountered during C2 compilation due to bad profile data. This occurs when the code is compiled with `TypeProfileCasts` option disabled. > > **Analysis** > The assertion failure occurs in `record_profiled_receiver_for_speculation` that analyzes the profiling information in the method data to determine whether a null value has been observed in the `instanceof` operation. This information is encoded in the `BitData` during profiling. 
When the method identifies that a null has been seen, it proceeds to inspect the associated `ReceiverTypeData` to see if the type check is always performed against null. However, in this scenario, the incoming profiling data is of type `BitData` rather than `ReceiverTypeData`, leading to the assertion failure. > > The profiling information for null seen for operations `aastore`, `instanceof`, and `checkcast` is recorded by the method `profile_null_seen `(in` src/hotspot/cpu/x86/templateTable_x86.cpp `). On investigating this method, it can be observed that the method data pointer is not updated for `VirtualCallData` (which is a subclass of `ReceiverTypeData`) when the `TypeProfileCasts` option is disabled. > > **Solution** > My proposal is to inspect the `ReceiverTypeData` in function `record_profiled_receiver_for_speculation` only if `TypeProfileCasts` is enabled (this is based on the fact that the relevant method data pointer is not updated when `TypeProfileCasts` is disabled). > > **Question to reviewers** > Do you think this is a reasonable fix ? > > **Testing** > GitHub Actions > tier1 to tier3 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. Thanks for fixing this @sarannat. Apart from the missing regression test already mentioned by @mhaessig the fix looks good to me. Just a quick question: did you try to run some testing with `-XX:-TypeProfileCasts`? ------------- Marked as reviewed by dfenacci (Committer). 
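The failure mode analyzed above can be modeled in a few lines. This is a deliberately simplified sketch with illustrative names, not HotSpot code: with the flag off, profiling records only a plain BitData, so the consumer may expect the richer ReceiverTypeData only when the flag that would have produced it is on.

```cpp
#include <cassert>

enum class ProfileKind { BitData, ReceiverTypeData };

struct ProfileData {
    ProfileKind kind;
    bool null_seen;    // valid for both kinds
};

bool speculate_maybe_null(const ProfileData& data, bool type_profile_casts) {
    if (type_profile_casts) {
        // Only here is it valid to expect (and downcast to) ReceiverTypeData;
        // doing this unconditionally is what fired the assert.
        assert(data.kind == ProfileKind::ReceiverTypeData);
        // ... inspect receiver rows to see if the check is always against null ...
    }
    return data.null_seen;
}
```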
PR Review: https://git.openjdk.org/jdk/pull/26640#pullrequestreview-3099980060 From mhaessig at openjdk.org Fri Aug 8 10:51:42 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 8 Aug 2025 10:51:42 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v5] In-Reply-To: References: Message-ID: <6gq4iIBw4RIqqPvmAf2MHnKrmYHwOdWdH1fz1bFaCGA=.57906956-460f-4a1d-9e3e-fbf91a7974e2@github.com> > This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. > > The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. > > Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. > > Testing: > - [x] Github Actions > - [x] tier1, tier2 on all platforms > - [ ] tier3, tier4 and Oracle internal testing on Linux fastdebug > - [ ] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) Manuel H?ssig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision:
- Merge branch 'master' into JDK-8308094-timeout
- Rename _timer
- remove _timeout_armed
- ASSERT
- Merge branch 'master' into JDK-8308094-timeout
- No acquire release semantics
- Factor Linux specific timeout functionality out of share/
- Move timeout disarm above if
- Merge branch 'master' into JDK-8308094-timeout
- Fix SIGALRM test
- ... and 1 more: https://git.openjdk.org/jdk/compare/80c8bd84...8bb5eb7a

------------- Changes: - all: https://git.openjdk.org/jdk/pull/26023/files - new: https://git.openjdk.org/jdk/pull/26023/files/212afb4d..8bb5eb7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=03-04 Stats: 4769 lines in 139 files changed: 3372 ins; 855 del; 542 mod Patch: https://git.openjdk.org/jdk/pull/26023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26023/head:pull/26023 PR: https://git.openjdk.org/jdk/pull/26023 From shade at openjdk.org Fri Aug 8 12:45:00 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 8 Aug 2025 12:45:00 GMT Subject: RFR: 8364501: Compiler shutdown crashes on access to deleted CompileTask Message-ID: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> See the bug for more investigation. In short, with recent changes to `delete` `CompileTask`-s, we end up in the rare situation where we can access tasks that have been already deleted. One mistake I made myself with [JDK-8361752](https://bugs.openjdk.org/browse/JDK-8361752) in `CompileQueue::delete_all`: the code first deletes, then asks for `next` (facepalms). Another case is less trivial, and mostly a fix in abundance of caution: in `wait_for_completion`, we can exit while the blocking task is still in the queue. 
The current code skips deletion only when the compiler is shut down for compilation, but I think the condition should be stronger: unless the task is completed, we should assume it might carry the queueing `next`/`prev` pointers that `delete_all` would need, and skip deletion. Realistically, it would "leak" only on compiler shutdown, like before. I have also put in some diagnostic code to catch the lifecycle issues like this more reliably, and cleaned up the `next`, `prev` lifecycle to clearly disconnect the `CompileTasks` that are no longer in queue. Additional testing: - [x] Linux AArch64 server fastdebug, reproducer no longer fails - [x] Linux AArch64 server fastdebug, `compiler` - [ ] Linux AArch64 server fastdebug, `all` ------------- Commit messages: - Comment touchups - Touchups - Fix Changes: https://git.openjdk.org/jdk/pull/26696/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26696&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8364501 Stats: 24 lines in 3 files changed: 16 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/26696.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26696/head:pull/26696 PR: https://git.openjdk.org/jdk/pull/26696 From mhaessig at openjdk.org Fri Aug 8 14:35:11 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 8 Aug 2025 14:35:11 GMT Subject: RFR: 8364766: Improve Value() of DivI and DivL for non-constant inputs In-Reply-To: References: Message-ID: On Sun, 6 Jul 2025 08:08:25 GMT, Tobias Hotz wrote: > This PR improves the Value() of integer division nodes. > Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max.
Some special logic is required for MIN_INT / -1, though, as this is a special case. > We also need some special logic to handle ranges that cross zero, but in this case, we just need to check the negative and the positive range once each. > This also cleans up and unifies the code paths for DivINode and DivLNode. > I've added some tests to validate the optimization. Without the changes, some of these tests fail. Testing is all green. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3168157330 From mablakatov at openjdk.org Fri Aug 8 14:40:01 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Fri, 8 Aug 2025 14:40:01 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v9] In-Reply-To: References: Message-ID: > Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, the existing ASIMD implementation is used. > > Nothing changes for <= 128-bit long vectors, as the existing ASIMD implementation is still used directly for those. > > The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks.
>
> Benchmarks results:
>
> Neoverse-V1 (SVE 256-bit)
>
> Benchmark (size) Mode master PR Units
> ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms
> ShortMaxVector.MULLanes 1024 thrpt 3388.183 7144.301 ops/ms
> IntMaxVector.MULLanes 1024 thrpt 3010.974 4911.485 ops/ms
> LongMaxVector.MULLanes 1024 thrpt 1539.137 2562.835 ops/ms
> FloatMaxVector.MULLanes 1024 thrpt 1355.551 4158.128 ops/ms
> DoubleMaxVector.MULLanes 1024 thrpt 1715.854 3284.189 ops/ms
>
>
> Fujitsu A64FX (SVE 512-bit):
>
> Benchmark (size) Mode master PR Units
> ByteMaxVector.MULLanes 1024 thrpt 1091.692 2887.798 ops/ms
> ShortMaxVector.MULLanes 1024 thrpt 597.008 1863.338 ops/ms
> IntMaxVector.MULLanes 1024 thrpt 510.642 1348.651 ops/ms
> LongMaxVector.MULLanes 1024 thrpt 468.878 878.620 ops/ms
> FloatMaxVector.MULLanes 1024 thrpt 376.284 2237.564 ops/ms
> DoubleMaxVector.MULLanes 1024 thrpt 431.343 1646.792 ops/ms

Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits:

 - Address review comments and simplify the implementation
 - remove the loops from gt128b methods making them 256b only
 - fixup: missed fnoregs in instruct reduce_mulL_256b
 - use an extra vtmp3 reg for the 256b integer method
 - remove a no longer needed change in reduce_mul_integral_le128b
 - cleanup: unify comments
 - Merge commit '8193856af8546332bfa180cb45154a4093b4fd2c'
 - remove the strictly-ordered FP implementation as unused
 - Compare VL against MaxVectorSize instead of FloatRegister::sve_vl_max
 - Use a dedicated ptrue predicate register

   This shifts MulReduction performance on Neoverse V1 a bit. Here Before is before this specific commit (ebad6dd37e332da44222c50cd17c69f3ff3f0635) and After is this commit.
   | Benchmark                | Before (ops/ms) | After (ops/ms) | Diff (%) |
   | ------------------------ | --------------- | -------------- | -------- |
   | ByteMaxVector.MULLanes   | 9883.151        | 9093.557       | -7.99%   |
   | DoubleMaxVector.MULLanes | 2712.674        | 2607.367       | -3.89%   |
   | FloatMaxVector.MULLanes  | 3388.811        | 3291.429       | -2.88%   |
   | IntMaxVector.MULLanes    | 4765.554        | 5031.741       | +5.58%   |
   | LongMaxVector.MULLanes   | 2685.228        | 2896.445       | +7.88%   |
   | ShortMaxVector.MULLanes  | 5128.185        | 5197.656       | +1.35%   |
 - cleanup: update a copyright notice
   Co-authored-by: Hao Sun
 - fixup: remove undefined insts from aarch64-asmtest.py
 - cleanup: address nits, rename several symbols
 - cleanup: remove unreferenced definitions
 - Address review comments.
 - fixup: disable FP mul reduction auto-vectorization for all targets
 - fixup: add a tmp vReg to reduce_mul_integral_gt128b and reduce_non_strict_order_mul_fp_gt128b to keep vsrc unmodified
 - cleanup: replace a complex lambda in the above methods with a loop
 - cleanup: rename symbols to follow the existing naming convention
 - cleanup: add asserts to SVE only instructions
 - split mul FP reduction instructions into strictly-ordered (default) and explicitly non strictly-ordered
 - remove redundant conditions in TestVectorFPReduction.java

   Benchmarks results:
   Neoverse-V1 (SVE 256-bit)

   | Benchmark                | Before  | After    | Units  | Diff  |
   |--------------------------|---------|----------|--------|-------|
   | ByteMaxVector.MULLanes   | 619.156 | 9884.578 | ops/ms | 1496% |
   | DoubleMaxVector.MULLanes | 184.693 | 2712.051 | ops/ms | 1368% |
   | FloatMaxVector.MULLanes  | 277.818 | 3388.038 | ops/ms | 1119% |
   | IntMaxVector.MULLanes    | 371.225 | 4765.434 | ops/ms | 1183% |
   | LongMaxVector.MULLanes   | 205.149 | 2672.975 | ops/ms | 1203% |
   | ShortMaxVector.MULLanes  | 472.804 | 5122.917 | ops/ms | 984%  |
 - ...
and 5 more: https://git.openjdk.org/jdk/compare/8193856a...5b06b638 ------------- Changes: https://git.openjdk.org/jdk/pull/23181/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23181&range=08 Stats: 383 lines in 9 files changed: 236 ins; 2 del; 145 mod Patch: https://git.openjdk.org/jdk/pull/23181.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23181/head:pull/23181 PR: https://git.openjdk.org/jdk/pull/23181 From mablakatov at openjdk.org Fri Aug 8 14:51:20 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Fri, 8 Aug 2025 14:51:20 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v8] In-Reply-To: References: Message-ID: On Fri, 11 Jul 2025 12:23:58 GMT, Andrew Haley wrote: >> I see. Thanks for your explanation. >> Current version is okay to me. Perhaps we may want to add more comments here. >> >> Suggestion: >> >> // Note: vsrc and vtmp2 may match when this function is invoked by `reduce_mul_integral_gt128b()` >> // as a tail call and vsrc holds the intermediate results. > >> I see. Thanks for your explanation. Current version is okay to me. Perhaps we may want to add more comments here. > > The current code is just the sort of trap for the maintainer that leads to hard-to-find bugs. It'd be much better to remove the need for this comment by forcing everyone to provide two distinct scratch registers. @theRealAph , fixed, the implementation doesn't try to do anything smart anymore. It ensures that [all registers](https://github.com/openjdk/jdk/pull/23181/files#diff-75bfb44278df267ce4978393b9b6b6030a7e23065ca15436fb1a5009debc6e81R2002) [are different](https://github.com/openjdk/jdk/pull/23181/files#diff-75bfb44278df267ce4978393b9b6b6030a7e23065ca15436fb1a5009debc6e81R2091) for all supported integer types [but `T_LONG`](https://github.com/openjdk/jdk/pull/23181/files#diff-75bfb44278df267ce4978393b9b6b6030a7e23065ca15436fb1a5009debc6e81R2089) which is a special case. 
We [pass](https://github.com/openjdk/jdk/pull/23181/files#diff-edf6d70f65d81dc12a483088e0610f4e059bd40697f242aedfed5c2da7475f1aR3519) a couple of `fnoreg`s for `T_LONG` as the implementation for this type requires less temporary vregs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2263191527 From mhaessig at openjdk.org Fri Aug 8 14:56:11 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 8 Aug 2025 14:56:11 GMT Subject: RFR: 8364501: Compiler shutdown crashes on access to deleted CompileTask In-Reply-To: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> References: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> Message-ID: On Fri, 8 Aug 2025 12:30:36 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. > > In short, with recent changes to `delete` `CompileTask`-s, we end up in the rare situation where we can access tasks that have been already deleted. The major and obivous mistake I committed myself with [JDK-8361752](https://bugs.openjdk.org/browse/JDK-8361752) in `CompileQueue::delete_all`: the code first `delete`-s, then asks for `next` (facepalms). > > Another case is less trivial, and mostly fix in abundance of caution: in `wait_for_completion`, we can exit while blocking task is still in queue. Current code skip deletions only when compiler is shutdown for compilation, but I think the condition should be stronger: unless the task is completed, we should assume it might carry the queue-ing `next`/`prev` pointers that `delete_all` would need, and skip deletion. Realistically, it would "leak" only on compiler shutdown, like before. > > I have also put in some diagnostic code to catch the lifecycle issues like this more reliably, and cleaned up `next`, `prev` lifecycle to clearly disconnect the `CompileTasks` that are no longer in queue. 
> > Additional testing: > - [x] Linux AArch64 server fastdebug, reproducer no longer fails > - [x] Linux AArch64 server fastdebug, `compiler` > - [ ] Linux AArch64 server fastdebug, `all` Thank you for fixing this and adding the diagnostic code, @shipilev. The fix looks reasonable to me. I kicked off testing on our side and will keep you posted on the results. ------------- PR Review: https://git.openjdk.org/jdk/pull/26696#pullrequestreview-3101141141 From mablakatov at openjdk.org Fri Aug 8 15:16:16 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Fri, 8 Aug 2025 15:16:16 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v8] In-Reply-To: References: Message-ID: On Fri, 11 Jul 2025 08:40:40 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 4073: >> >>> 4071: f(0b101111, 15, 10), rf(Zn, 5), rf(Zd, 0); >>> 4072: } >>> 4073: >> >> This pattern should be in a section _SVE Integer Reduction_, C4.1.37. I'm not sure if any other instructions in that group are defined yet, but if not please start the section. > > Sorry, the unpredicated version should be in the _SVE Integer Misc - Unpredicated_ section. Are you asking to move it to another existing section in the file or create a new one? If it's the former, could you point me to the section in the file - I can see neither `sve_ftssel` nor `sve_fexpa` defined. If the latter, in Arm ARM *C4.1.41 SVE Integer Misc - Unpredicated* is followed by *C4.1.42 SVE Element Count*, so the patch places `sve_movprfx` definition right before `sve_cnt*`; I also don't see an opportunity to define an `INSN` for this section as encodings of the instructions within the section do not follow a single pattern. If it's something else completely, please elaborate. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2263251283 From kvn at openjdk.org Fri Aug 8 15:17:19 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Aug 2025 15:17:19 GMT Subject: RFR: 8364501: Compiler shutdown crashes on access to deleted CompileTask In-Reply-To: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> References: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> Message-ID: On Fri, 8 Aug 2025 12:30:36 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. > > In short, with recent changes to `delete` `CompileTask`-s, we end up in the rare situation where we can access tasks that have been already deleted. The major and obivous mistake I committed myself with [JDK-8361752](https://bugs.openjdk.org/browse/JDK-8361752) in `CompileQueue::delete_all`: the code first `delete`-s, then asks for `next` (facepalms). > > Another case is less trivial, and mostly fix in abundance of caution: in `wait_for_completion`, we can exit while blocking task is still in queue. Current code skip deletions only when compiler is shutdown for compilation, but I think the condition should be stronger: unless the task is completed, we should assume it might carry the queue-ing `next`/`prev` pointers that `delete_all` would need, and skip deletion. Realistically, it would "leak" only on compiler shutdown, like before. > > I have also put in some diagnostic code to catch the lifecycle issues like this more reliably, and cleaned up `next`, `prev` lifecycle to clearly disconnect the `CompileTasks` that are no longer in queue. 
> > Additional testing: > - [x] Linux AArch64 server fastdebug, reproducer no longer fails > - [x] Linux AArch64 server fastdebug, `compiler` > - [ ] Linux AArch64 server fastdebug, `all` src/hotspot/share/compiler/compileBroker.cpp line 509: > 507: } > 508: task->set_next(nullptr); > 509: task->set_prev(nullptr); Should we do `task->prev()->set_next(task->next())` after some checks? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26696#discussion_r2263253911 From shade at openjdk.org Fri Aug 8 15:32:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 8 Aug 2025 15:32:12 GMT Subject: RFR: 8364501: Compiler shutdown crashes on access to deleted CompileTask In-Reply-To: References: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> Message-ID: On Fri, 8 Aug 2025 15:15:01 GMT, Vladimir Kozlov wrote: >> See the bug for more investigation. >> >> In short, with recent changes to `delete` `CompileTask`-s, we end up in the rare situation where we can access tasks that have been already deleted. The major and obivous mistake I committed myself with [JDK-8361752](https://bugs.openjdk.org/browse/JDK-8361752) in `CompileQueue::delete_all`: the code first `delete`-s, then asks for `next` (facepalms). >> >> Another case is less trivial, and mostly fix in abundance of caution: in `wait_for_completion`, we can exit while blocking task is still in queue. Current code skip deletions only when compiler is shutdown for compilation, but I think the condition should be stronger: unless the task is completed, we should assume it might carry the queue-ing `next`/`prev` pointers that `delete_all` would need, and skip deletion. Realistically, it would "leak" only on compiler shutdown, like before. >> >> I have also put in some diagnostic code to catch the lifecycle issues like this more reliably, and cleaned up `next`, `prev` lifecycle to clearly disconnect the `CompileTasks` that are no longer in queue. 
>> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, reproducer no longer fails >> - [x] Linux AArch64 server fastdebug, `compiler` >> - [ ] Linux AArch64 server fastdebug, `all` > > src/hotspot/share/compiler/compileBroker.cpp line 509: > >> 507: } >> 508: task->set_next(nullptr); >> 509: task->set_prev(nullptr); > > Should we do `task->prev()->set_next(task->next())` after some checks? We do it at L494? Maybe I don't understand the question. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26696#discussion_r2263286402 From kvn at openjdk.org Fri Aug 8 15:46:10 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Aug 2025 15:46:10 GMT Subject: RFR: 8364501: Compiler shutdown crashes on access to deleted CompileTask In-Reply-To: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> References: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> Message-ID: On Fri, 8 Aug 2025 12:30:36 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. > > In short, with recent changes to `delete` `CompileTask`-s, we end up in the rare situation where we can access tasks that have been already deleted. The major and obivous mistake I committed myself with [JDK-8361752](https://bugs.openjdk.org/browse/JDK-8361752) in `CompileQueue::delete_all`: the code first `delete`-s, then asks for `next` (facepalms). > > Another case is less trivial, and mostly fix in abundance of caution: in `wait_for_completion`, we can exit while blocking task is still in queue. Current code skip deletions only when compiler is shutdown for compilation, but I think the condition should be stronger: unless the task is completed, we should assume it might carry the queue-ing `next`/`prev` pointers that `delete_all` would need, and skip deletion. Realistically, it would "leak" only on compiler shutdown, like before. 
> > I have also put in some diagnostic code to catch the lifecycle issues like this more reliably, and cleaned up `next`, `prev` lifecycle to clearly disconnect the `CompileTasks` that are no longer in queue. > > Additional testing: > - [x] Linux AArch64 server fastdebug, reproducer no longer fails > - [x] Linux AArch64 server fastdebug, `compiler` > - [ ] Linux AArch64 server fastdebug, `all` Good. I will run testing. ------------- PR Review: https://git.openjdk.org/jdk/pull/26696#pullrequestreview-3101289948 From kvn at openjdk.org Fri Aug 8 15:46:11 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Aug 2025 15:46:11 GMT Subject: RFR: 8364501: Compiler shutdown crashes on access to deleted CompileTask In-Reply-To: References: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> Message-ID: On Fri, 8 Aug 2025 15:29:54 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/compiler/compileBroker.cpp line 509: >> >>> 507: } >>> 508: task->set_next(nullptr); >>> 509: task->set_prev(nullptr); >> >> Should we do `task->prev()->set_next(task->next())` after some checks? > > We do it at L494? Maybe I don't understand the question. Yes, that is what I asked for. I did not look above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26696#discussion_r2263311329 From fgao at openjdk.org Fri Aug 8 16:10:15 2025 From: fgao at openjdk.org (Fei Gao) Date: Fri, 8 Aug 2025 16:10:15 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v5] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 02:31:08 GMT, Xiaohong Gong wrote: >> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform. >> >> ### Background >> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. 
SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register. >> >> ### Implementation >> >> #### Challenges >> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints. >> >> For a 512-bit SVE machine, loading a `byte` vector with different vector species require different approaches: >> - SPECIES_64: Single operation with mask (8 elements, 256-bit) >> - SPECIES_128: Single operation, full register (16 elements, 512-bit) >> - SPECIES_256: Two operations + merge (32 elements, 1024-bit) >> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit) >> >> Use `ByteVector.SPECIES_512` as an example: >> - It contains 64 elements. So the index vector size should be `64 * 32` bits, which is 4 times of the SVE vector register size. >> - It requires 4 times of vector gather-loads to finish the whole operation. >> >> >> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...] >> int[] idx = [0, 1, 2, 3, ..., 63, ...] >> >> 4 gather-load: >> idx_v1 = [15 14 13 ... 1 0] gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa] >> idx_v2 = [31 30 29 ... 17 16] gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb] >> idx_v3 = [47 46 45 ... 33 32] gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc] >> idx_v4 = [63 62 61 ... 49 48] gather_v4 = [... 
0000 0000 0000 0000 dddd dddd dddd dddd] >> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa] >> >> >> #### Solution >> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end. >> >> Here is the main changes: >> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher. >> - Added `VectorSliceNode` for result mer... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge 'jdk:master' into JDK-8351623-sve > - Address review comments > - Refine IR pattern and clean backend rules > - Fix indentation issue and move the helper matcher method to header files > - Merge branch jdk:master into JDK-8351623-sve > - 8351623: VectorAPI: Add SVE implementation of subword gather load operation Thanks for updating it. Looks good on my end. It might be helpful to have Reviewers take a look. ------------- Marked as reviewed by fgao (Committer). PR Review: https://git.openjdk.org/jdk/pull/26236#pullrequestreview-3101387060 From shade at openjdk.org Fri Aug 8 17:07:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 8 Aug 2025 17:07:10 GMT Subject: RFR: 8360304: Redundant condition in LibraryCallKit::inline_vector_nary_operation In-Reply-To: References: Message-ID: On Sat, 2 Aug 2025 15:44:22 GMT, Francesco Andreuzzi wrote: > The check for `sopc != 0` is not needed after JDK-8353786, the function would exit at L374 otherwise. > > Passes tier1. This looks good to me. @iwanowww: from the code history, it looks like you added the check we are now depending on, could you please take a look as well? Thanks! ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26606#pullrequestreview-3101607475 From dlong at openjdk.org Fri Aug 8 18:48:13 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 8 Aug 2025 18:48:13 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v5] In-Reply-To: <6gq4iIBw4RIqqPvmAf2MHnKrmYHwOdWdH1fz1bFaCGA=.57906956-460f-4a1d-9e3e-fbf91a7974e2@github.com> References: <6gq4iIBw4RIqqPvmAf2MHnKrmYHwOdWdH1fz1bFaCGA=.57906956-460f-4a1d-9e3e-fbf91a7974e2@github.com> Message-ID: On Fri, 8 Aug 2025 10:51:42 GMT, Manuel Hässig wrote: >> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. >> >> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. >> >> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. >> >> Testing: >> - [x] GitHub Actions >> - [x] tier1, tier2 on all platforms >> - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug >> - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) > > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8308094-timeout > - Rename _timer > - remove _timeout_armed > - ASSERT > - Merge branch 'master' into JDK-8308094-timeout > - No acquire release semantics > - Factor Linux specific timeout functionality out of share/ > - Move timeout disarm above if > - Merge branch 'master' into JDK-8308094-timeout > - Fix SIGALRM test > - ... and 1 more: https://git.openjdk.org/jdk/compare/2eac5347...8bb5eb7a Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26023#pullrequestreview-3101863176 From kvn at openjdk.org Fri Aug 8 19:43:13 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Aug 2025 19:43:13 GMT Subject: RFR: 8364501: Compiler shutdown crashes on access to deleted CompileTask In-Reply-To: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> References: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> Message-ID: On Fri, 8 Aug 2025 12:30:36 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. > > In short, with recent changes to `delete` `CompileTask`-s, we end up in the rare situation where we can access tasks that have been already deleted. The major and obivous mistake I committed myself with [JDK-8361752](https://bugs.openjdk.org/browse/JDK-8361752) in `CompileQueue::delete_all`: the code first `delete`-s, then asks for `next` (facepalms). > > Another case is less trivial, and mostly fix in abundance of caution: in `wait_for_completion`, we can exit while blocking task is still in queue. Current code skip deletions only when compiler is shutdown for compilation, but I think the condition should be stronger: unless the task is completed, we should assume it might carry the queue-ing `next`/`prev` pointers that `delete_all` would need, and skip deletion. 
Realistically, it would "leak" only on compiler shutdown, like before. > > I have also put in some diagnostic code to catch the lifecycle issues like this more reliably, and cleaned up `next`, `prev` lifecycle to clearly disconnect the `CompileTasks` that are no longer in queue. > > Additional testing: > - [x] Linux AArch64 server fastdebug, reproducer no longer fails > - [x] Linux AArch64 server fastdebug, `compiler` > - [x] Linux AArch64 server fastdebug, `all` My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26696#pullrequestreview-3101981178 From thartmann at openjdk.org Fri Aug 8 22:29:13 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 8 Aug 2025 22:29:13 GMT Subject: RFR: 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" [v4] In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 22:58:50 GMT, Guanqiang Han wrote:
>> I'm able to consistently reproduce the problem using the following command line and test program:
>>
>> java -Xcomp -XX:TieredStopAtLevel=1 -XX:C1MaxInlineSize=200 Test.java
>>
>> import java.util.Arrays;
>> public class Test{
>> public static void main(String[] args) {
>> System.out.println("begin");
>> byte[] arr1 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
>> byte[] arr2 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
>> System.out.println(Arrays.equals(arr1, arr2));
>> System.out.println("end");
>> }
>> }
>>
>> From my analysis, the root cause appears to be a mismatch in operand handling between T_ADDRESS and T_LONG in LIR_Assembler::stack2reg, especially when the source is marked as double stack (e.g., T_LONG) and the destination as single CPU register (e.g., T_ADDRESS), leading to assertion failures like assert(is_single_stack()) (because T_LONG is double_size).
>>
>> In the test program above, the call chain is: Arrays.equals -> ArraysSupport.vectorizedMismatch ->
LIRGenerator::do_vectorizedMismatch >> Within the do_vectorizedMismatch() method, a move instruction constructs an LIR_Op1. During LIR to machine code generation, LIR_Assembler::stack2reg was called. >> >> In this case, the src operand has type T_LONG and the dst operand has type T_ADDRESS. This combination triggers an assert in stack2reg, due to a mismatch between the stack slot type and register type handling. >> >> Importantly, this path (LIR_Assembler::stack2reg being called) is only taken when src is forced onto the stack. To reliably trigger this condition, the test is run with the -Xcomp option to force compilation and increase register pressure. >> >> A reference to the relevant code paths is provided below: >> image1 >> image2 >> >> On 64-bit platforms, although T_ADDRESS is classified as single_size, it is in fact 64 bits wide, representing a single 64-bit general-purpose register, and it can hold a T_LONG value, which is also 64 bits. >> >> However, T_LONG is defined as double_size, requiring two local variable slots or a pair of registers in the JVM's abstract model. This mismatch stems from the fact that T_ADDRESS is platform-dependent: it's 32 bits on 32-bit platforms, and 64 bits on 64-bit platforms, yet its size class... > Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains six additional commits since the last revision: > > - change T_LONG to T_ADDRESS in some intrinsic functions > - Merge remote-tracking branch 'upstream/master' into 8359235 > - Increase sleep time to ensure the method gets compiled > - add regression test > - Merge remote-tracking branch 'upstream/master' into 8359235 > - 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" test/hotspot/jtreg/compiler/intrinsics/TestStack2RegSlotMismatch.java line 29: > 27: * @summary Test C1 stack2reg after fixing incorrect use of T_LONG in intrinsic > 28: * @requires vm.debug == true & vm.compiler1.enabled > 29: * @run main/othervm -XX:TieredStopAtLevel=1 -XX:C1MaxInlineSize=200 -XX:CompileThreshold=10 compiler.intrinsics.TestStack2RegSlotMismatch I'm still wondering if this test can be reduced. Right now `-XX:C1MaxInlineSize=200 -XX:CompileThreshold=10` will lead to a lot of methods being compiled with C1 but I assume there is only one method that actually triggers the issue, right? Can we restrict compilation to that one? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26462#discussion_r2264078800 From aph at openjdk.org Sat Aug 9 07:32:15 2025 From: aph at openjdk.org (Andrew Haley) Date: Sat, 9 Aug 2025 07:32:15 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v8] In-Reply-To: References: Message-ID: On Fri, 8 Aug 2025 15:13:40 GMT, Mikhail Ablakatov wrote: >> Sorry, the unpredicated version should be in the _SVE Integer Misc - Unpredicated_ section. > > Are you asking to move it to another existing section in the file or create a new one? If it's the former, could you point me to the section in the file - I can see neither `sve_ftssel` nor `sve_fexpa` defined. 
If the latter, in Arm ARM *C4.1.41 SVE Integer Misc - Unpredicated* is followed by *C4.1.42 SVE Element Count*, so the patch places the `sve_movprfx` definition right before `sve_cnt*`; I also don't see an opportunity to define an `INSN` for this section as encodings of the instructions within the section do not follow a single pattern. > > If it's something else completely, please elaborate. Please try to organize things the same way as the Decode section of the ARM. Insert a new section called _SVE Integer Misc - Unpredicated_ after _SVE bitwise shift by immediate (predicated)_ and put this pattern there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2264572038 From snatarajan at openjdk.org Sat Aug 9 15:35:00 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Sat, 9 Aug 2025 15:35:00 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v6] In-Reply-To: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: > **Issue** > Extreme values for the BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version` result in an assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. This is observed on an x86 machine. > > **Analysis** > On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above-mentioned assert from failing for large values of BciProfileWidth. > > **Proposal** > Considering the fact that larger BciProfileWidth results in slower profiling, I have proposed a range between 0 and 5000 to restrict the value for BciProfileWidth for x86 machines.
This maximum value is based on experiments with the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp`, using the smallest `InterpreterCodeSize` across all architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bcis will be recorded in the ret profile. > > **Issue in AArch64** > Additionally, running the command `java -XX:BciProfileWidth=10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine. This is an issue of maximum offset for `ldr/str` in AArch64 which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on an AArch64 machine, I had to modify 3 `ldr` and 1 `str` instructions (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line numbers 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for a maximum of 5000, after which it crashes with `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. Without this fix, `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested restricting the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. > > **Question to reviewers** > Do you think this is a reasonable fix? For AArch64 do you suggest fixing using `form_address`? If yes, do I fix it under this PR or create another one?
Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: Addressing review - testcase and max range is set to 1000 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26139/files - new: https://git.openjdk.org/jdk/pull/26139/files/2fc4b0b7..9dae3aef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26139&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26139&range=04-05 Stats: 42 lines in 2 files changed: 41 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26139/head:pull/26139 PR: https://git.openjdk.org/jdk/pull/26139 From epeter at openjdk.org Sun Aug 10 05:26:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 10 Aug 2025 05:26:22 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3] In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: On Tue, 5 Aug 2025 11:39:43 GMT, Galder Zamarreño wrote: >> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. >> >> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes.
On darwin/aarch64 it shows: >> >> >> Benchmark (seed) (size) Mode Cnt Base Patch Units Diff >> VectorBitConversion.doubleToLongBits 0 2048 thrpt 8 1168.782 1157.717 ops/ms -1% >> VectorBitConversion.doubleToRawLongBits 0 2048 thrpt 8 3999.387 7353.936 ops/ms +83% >> VectorBitConversion.floatToIntBits 0 2048 thrpt 8 1200.338 1188.206 ops/ms -1% >> VectorBitConversion.floatToRawIntBits 0 2048 thrpt 8 4058.248 14792.474 ops/ms +264% >> VectorBitConversion.intBitsToFloat 0 2048 thrpt 8 3050.313 14984.246 ops/ms +391% >> VectorBitConversion.longBitsToDouble 0 2048 thrpt 8 3022.691 7379.360 ops/ms +144% >> >> The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control. >> >> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions. > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Check at the very least that auto vectorization is supported src/hotspot/share/opto/superword.cpp line 1635: > 1633: } else if (VectorNode::is_convert_opcode(opc)) { > 1634: retValue = VectorCastNode::implemented(opc, size, velt_basic_type(p0->in(1)), velt_basic_type(p0)); > 1635: } else if (VectorNode::is_reinterpret_opcode(opc)) { How does this affect `Op_ReinterpretHF2S` that is also in `VectorNode::is_reinterpret_opcode`? I'm afraid that we need to test this with hardware or Intel's SDE, to make sure we have it running on a VM that actually supports Float16. Otherwise these instructions may not be used, and hence not tested correctly. @galderz Can you run the relevant tests?
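For readers outside the thread, here is a minimal sketch (plain Java; the class and method names are hypothetical and this is not the actual JMH benchmark from the PR) of the two loop shapes being compared. The raw-bits variant is a straight bit copy with no control flow in the loop body, which is the shape SuperWord can vectorize; `floatToIntBits` canonicalizes NaNs and therefore branches per element:

```java
public class BitsLoops {
    // Straight-line body: floatToRawIntBits is a pure bit copy, so there is
    // no per-element branch and the loop is a candidate for vectorization.
    public static void rawBits(float[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Float.floatToRawIntBits(src[i]);
        }
    }

    // floatToIntBits collapses all NaNs to one canonical bit pattern, which
    // introduces control flow per element and blocks SuperWord, matching the
    // flat doubleToLongBits/floatToIntBits rows in the table above.
    public static void bits(float[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Float.floatToIntBits(src[i]);
        }
    }

    public static void main(String[] args) {
        float[] src = {1.0f, -2.5f, 0.0f, Float.NaN};
        int[] raw = new int[src.length];
        int[] canon = new int[src.length];
        rawBits(src, raw);
        bits(src, canon);
        // For non-NaN inputs the two variants agree bit-for-bit.
        for (int i = 0; i < 3; i++) {
            if (raw[i] != canon[i]) throw new AssertionError("mismatch at " + i);
        }
        System.out.println("ok");
    }
}
```

Whether either loop actually vectorizes depends on the JIT and the hardware; the sketch only illustrates the difference in loop-body shape that the discussion above refers to.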
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2265119804 From duke at openjdk.org Sun Aug 10 12:26:33 2025 From: duke at openjdk.org (Tobias Hotz) Date: Sun, 10 Aug 2025 12:26:33 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v2] In-Reply-To: References: Message-ID: > This PR improves the value of integer division nodes. > Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. > We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. > This also cleans up and unifies the code paths for DivINode and DivLNode. > I've added some tests to validate the optimization. Without the changes, some of these tests fail.
Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: Fixes after review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26143/files - new: https://git.openjdk.org/jdk/pull/26143/files/dacaddac..8dd1ff1b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=00-01 Stats: 28 lines in 2 files changed: 13 ins; 12 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26143.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26143/head:pull/26143 PR: https://git.openjdk.org/jdk/pull/26143 From duke at openjdk.org Sun Aug 10 12:33:13 2025 From: duke at openjdk.org (Tobias Hotz) Date: Sun, 10 Aug 2025 12:33:13 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v2] In-Reply-To: References: Message-ID: On Sun, 10 Aug 2025 12:26:33 GMT, Tobias Hotz wrote: >> This PR improves the value of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Fixes after review Thanks for the fast review! The main reason for all the if cases is that min_int / (-1) is undefined behavior in C++, as it overflows.
All code has to be careful that this special case can't happen in C++ code, and that's the main motivation behind all the ifs. I've added a comment that describes that. Otherwise, you would be right: Redundant calculations are no problem, min and max would take care of that. Regarding testing: I only ran tier1 tests on my machine and GHA. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3172593360 From duke at openjdk.org Sun Aug 10 12:33:15 2025 From: duke at openjdk.org (Tobias Hotz) Date: Sun, 10 Aug 2025 12:33:15 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v2] In-Reply-To: References: Message-ID: On Thu, 7 Aug 2025 09:26:40 GMT, Manuel Hässig wrote: >> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixes after review > > src/hotspot/share/opto/divnode.cpp line 568: > >> 566: if (i2_lo != i2_hi) { >> 567: // special case not possible here, _lo mus >> 568: assert(i2_lo != -1, "Special case not possible here"); > > While functionally correct, here you are only talking about the negative special case, but if `i2_lo in [0,1]` the same might happen on the positive side. > Suggestion: > > // If the divisor range is wider than a singleton, include (i1->_lo, i2->_lo). > // We cannot use is_con here, as a range of [-1, 0] for i2_hi and [0,1] for i2_lo > // will also result in i2_lo and i2_hi being -1, or i2_lo and i2_hi being 1 > // respectively. > if (i2_lo != i2_hi) { > assert(i2_hi - i2_lo >= 1, "i2 must be wider than a singleton"); You are right, because for hi, it doesn't matter and would just be a useless computation, see my comment below. We need to make sure i2_lo is != -1 to avoid running into UB.
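To make the "four corners" argument in this thread concrete, here is a hedged sketch in Java (where, unlike C++, `Integer.MIN_VALUE / -1` wraps instead of being undefined, so no guarding `if`s are needed). `divRange` is a hypothetical helper, not the actual `DivINode::Value()` code, and it assumes the divisor interval excludes 0 and that the `MIN_VALUE / -1` case is handled elsewhere:

```java
public class DivRangeSketch {
    // Bound x / y for x in [xlo, xhi], y in [ylo, yhi], assuming 0 is not in
    // [ylo, yhi] (the real code first splits divisor ranges that cross zero)
    // and that the MIN_VALUE / -1 overflow case is handled separately.
    // Truncated division x / y is monotonic in each argument on such a box,
    // so the extrema are attained at the four corners.
    public static int[] divRange(int xlo, int xhi, int ylo, int yhi) {
        int a = xlo / ylo, b = xlo / yhi, c = xhi / ylo, d = xhi / yhi;
        int lo = Math.min(Math.min(a, b), Math.min(c, d));
        int hi = Math.max(Math.max(a, b), Math.max(c, d));
        return new int[] { lo, hi };
    }

    public static void main(String[] args) {
        // Brute-force check on a small box with an all-negative divisor range.
        int[] r = divRange(-10, 10, -5, -2);
        for (int x = -10; x <= 10; x++) {
            for (int y = -5; y <= -2; y++) {
                int q = x / y;
                if (q < r[0] || q > r[1]) throw new AssertionError(x + "/" + y);
            }
        }
        System.out.println("ok");
    }
}
```

In the C++ implementation the same computation must be fenced with explicit checks so that `min_int / -1` is never evaluated, which is exactly the motivation for the `if` cases discussed above.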
> src/hotspot/share/opto/divnode.cpp line 575: > >> 573: >> 574: // If i1 is not a single constant, include the two corners with i1->_hi: >> 575: // (i1->_hi, i2->_lo) and (i1->_hi, i2->_hi) > > Why do you not have to handle the case of `i2` being a singleton range? Also the same reason: i2 being a singleton is no problem, we just do some useless calculations, but we cannot end up in UB due to i1 being > min_int ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2265263302 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2265263541 From epeter at openjdk.org Mon Aug 11 00:08:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 11 Aug 2025 00:08:11 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: Message-ID: > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks.
> > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. > - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. > - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` > - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)` > - For aliasing analysis (adjacency and overlap), the "regu...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/8f1f9329..238342ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Mon Aug 11 00:10:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 11 Aug 2025 00:10:20 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v5] In-Reply-To: <_jNlOz7RH0YN28AR-LhEwqnaPa_Vy-nUd3B_bMTYum8=.9307cd79-0f69-440d-bf0f-3a0fc54a8335@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <_jNlOz7RH0YN28AR-LhEwqnaPa_Vy-nUd3B_bMTYum8=.9307cd79-0f69-440d-bf0f-3a0fc54a8335@github.com> Message-ID: On Mon, 4 Aug 2025 10:46:11 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix test after merge > > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 823: > >> 821: applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true", "rvv", "true"}) >> 822: // FAILS: invariants are sorted differently, because of differently inserted Cast. >> 823: // See: JDK-8331659 > > With the integration of #26429, this should pass. Yes, sounds good.
Whoever integrates second will have to fix it then ;) > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 866: > >> 864: applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true", "rvv", "true"}) >> 865: // FAILS: invariants are sorted differently, because of differently inserted Cast. >> 866: // See: JDK-8331659 > > With the integration of #26429, this should pass. Yes, sounds good. Whoever integrates second will have to fix it then ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2265499995 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2265500000 From epeter at openjdk.org Mon Aug 11 00:14:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 11 Aug 2025 00:14:15 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v5] In-Reply-To: <_jNlOz7RH0YN28AR-LhEwqnaPa_Vy-nUd3B_bMTYum8=.9307cd79-0f69-440d-bf0f-3a0fc54a8335@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <_jNlOz7RH0YN28AR-LhEwqnaPa_Vy-nUd3B_bMTYum8=.9307cd79-0f69-440d-bf0f-3a0fc54a8335@github.com> Message-ID: On Mon, 4 Aug 2025 10:55:11 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix test after merge > > Thank you for addressing my comments. I only have a few follow-ups. @mhaessig Thanks for the responses. I integrated the one suggestion now, I think it is ready for another round?
------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3172988859 From ghan at openjdk.org Mon Aug 11 00:49:45 2025 From: ghan at openjdk.org (Guanqiang Han) Date: Mon, 11 Aug 2025 00:49:45 GMT Subject: RFR: 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" [v5] In-Reply-To: References: Message-ID: <70bF6nyeg21mKc4SxXn9QulJPjMikmxUUcG08smx7hk=.1815618d-e5a7-4d50-af63-7a93dfd01fe8@github.com> > I'm able to consistently reproduce the problem using the following command line and test program: > > java -Xcomp -XX:TieredStopAtLevel=1 -XX:C1MaxInlineSize=200 Test.java > > import java.util.Arrays; > public class Test{ > public static void main(String[] args) { > System.out.println("begin"); > byte[] arr1 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; > byte[] arr2 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; > System.out.println(Arrays.equals(arr1, arr2)); > System.out.println("end"); > } > } > > From my analysis, the root cause appears to be a mismatch in operand handling between T_ADDRESS and T_LONG in LIR_Assembler::stack2reg, especially when the source is marked as double stack (e.g., T_LONG) and the destination as single CPU register (e.g., T_ADDRESS), leading to assertion failures like assert(is_single_stack()) (because T_LONG is double_size). > > In the test program above, the call chain is: Arrays.equals -> ArraysSupport.vectorizedMismatch -> LIRGenerator::do_vectorizedMismatch > Within the do_vectorizedMismatch() method, a move instruction constructs an LIR_Op1. During LIR to machine code generation, LIR_Assembler::stack2reg was called. > > In this case, the src operand has type T_LONG and the dst operand has type T_ADDRESS. This combination triggers an assert in stack2reg, due to a mismatch between the stack slot type and register type handling. > > Importantly, this path (where LIR_Assembler::stack2reg is called) is only taken when src is forced onto the stack.
To reliably trigger this condition, the test is run with the -Xcomp option to force compilation and increase register pressure. > > A reference to the relevant code paths is provided below: > image1 > image2 > > On 64-bit platforms, although T_ADDRESS is classified as single_size, it is in fact 64 bits wide, represents a single 64-bit general-purpose register, and can hold a T_LONG value, which is also 64 bits. > > However, T_LONG is defined as double_size, requiring two local variable slots or a pair of registers in the JVM's abstract model. This mismatch stems from the fact that T_ADDRESS is platform-dependent: it's 32 bits on 32-bit platforms, and 64 bits on 64-bit platforms, yet its size classification remains single_size regardless. > > This classification... Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - restrict compilation to the single method - Merge remote-tracking branch 'upstream/master' into 8359235 - change T_LONG to T_ADDRESS in some intrinsic functions - Merge remote-tracking branch 'upstream/master' into 8359235 - Increase sleep time to ensure the method gets compiled - add regression test - Merge remote-tracking branch 'upstream/master' into 8359235 - 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26462/files - new: https://git.openjdk.org/jdk/pull/26462/files/c90be2b5..4e084ec4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26462&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26462&range=03-04 Stats: 16989 lines in 462 files changed: 11302 ins; 4142 del; 1545 mod Patch: https://git.openjdk.org/jdk/pull/26462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26462/head:pull/26462 PR:
https://git.openjdk.org/jdk/pull/26462 From ghan at openjdk.org Mon Aug 11 00:49:46 2025 From: ghan at openjdk.org (Guanqiang Han) Date: Mon, 11 Aug 2025 00:49:46 GMT Subject: RFR: 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" [v4] In-Reply-To: References: Message-ID: On Fri, 8 Aug 2025 22:26:54 GMT, Tobias Hartmann wrote: >> Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - change T_LONG to T_ADDRESS in some intrinsic functions >> - Merge remote-tracking branch 'upstream/master' into 8359235 >> - Increase sleep time to ensure the method gets compiled >> - add regression test >> - Merge remote-tracking branch 'upstream/master' into 8359235 >> - 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" > > test/hotspot/jtreg/compiler/intrinsics/TestStack2RegSlotMismatch.java line 29: > >> 27: * @summary Test C1 stack2reg after fixing incorrect use of T_LONG in intrinsic >> 28: * @requires vm.debug == true & vm.compiler1.enabled >> 29: * @run main/othervm -XX:TieredStopAtLevel=1 -XX:C1MaxInlineSize=200 -XX:CompileThreshold=10 compiler.intrinsics.TestStack2RegSlotMismatch > > I'm still wondering if this test can be reduced. Right now `-XX:C1MaxInlineSize=200 -XX:CompileThreshold=10` will lead to a lot of methods being compiled with C1 but I assume there is only one method that actually triggers the issue, right? Can we restrict compilation to that one? Hi @TobiHartmann, thanks for the suggestion! I've updated the test to restrict compilation to the single method by using -XX:CompileCommand=compileonly,.... Please take another look when you have time.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26462#discussion_r2265514745 From epeter at openjdk.org Mon Aug 11 01:14:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 11 Aug 2025 01:14:21 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression In-Reply-To: References: Message-ID: On Tue, 22 Jul 2025 15:05:29 GMT, Manuel Hässig wrote: > A loop of the form > > MemorySegment ms = {}; > for (long i = 0; i < ms.byteSize() / 8L; i++) { > // vectorizable work > } > > does not vectorize, whereas > > MemorySegment ms = {}; > long size = ms.byteSize(); > for (long i = 0; i < size / 8L; i++) { > // vectorizable work > } > > vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: > > https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 > > Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. > > So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. > > @merykitty also provides an alternative explanation.
A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem. > > ## Change Description > > Based on @merykitty's reasoning, this PR tracks whether wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. > >
Explored Alternatives > 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. > 2. Insert a "`PHASEIDEALLOOP0`" with `LoopOptsNone` that only perfor... Thanks for working on this! I think the general approach is good, I just have some questions about details ;) src/hotspot/share/opto/loopnode.hpp line 1635: > 1633: private: > 1634: // Class to keep track of wins in split_through_phi. > 1635: class SplitWins { Isn't it called `split_thru_phi`? Why not call it `SplitThruPhiWins`? src/hotspot/share/opto/loopnode.hpp line 1645: > 1643: _total_wins(0), > 1644: _loop_entry_wins(0), > 1645: _loop_back_wins(0) {}; Can you describe somewhere what the definition of these is? I'm struggling a little with understanding the conditions in `profitable`. src/hotspot/share/opto/loopopts.cpp line 239: > 237: } else { > 238: tty->print("Region "); > 239: } What if it is another kind of loop? Could it be a `LongCountedLoop` or something else we don't have yet? I suggest you just use `region->Name()` and format that string into your output. test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateAddSub.java line 351: > 349: @IR(counts = {IRNode.SUB_I, "1"}) > 350: public int addSubInt(int inv1, int inv2, int size) { > 351: int result = -1; Can you document where the adds are? Do we manage to re-associate `inv1 + (inv2 - i)` to `(inv1 + inv2) - i` so that the addition can float out of the loop? test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentByteSizeLongLoopLimit.java line 38: > 36: * @library /test/lib / > 37: * @run driver compiler.loopopts.superword.TestMemorySegmentByteSizeLongLoopLimit > 38: */ For MemorySegment tests, I've found that it is quite important to test some runs with additional flag combinations: at least `AlignVector` and `ShortRunningLongLoop`. Same might apply for the tests below.
------------- PR Review: https://git.openjdk.org/jdk/pull/26429#pullrequestreview-3103806552 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2265504241 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2265525583 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2265506324 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2265507445 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2265512304 From xgong at openjdk.org Mon Aug 11 01:41:23 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 11 Aug 2025 01:41:23 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v5] In-Reply-To: References: Message-ID: On Fri, 8 Aug 2025 16:07:42 GMT, Fei Gao wrote: > Thanks for updating it. Looks good on my end. It might be helpful to have Reviewers take a look. Thanks a lot for your review and test! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3173065089 From thartmann at openjdk.org Mon Aug 11 01:42:15 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 11 Aug 2025 01:42:15 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v6] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: <1QbX5WHkEdjP-unAFJ1vYaoIc9bV8zz8dA-vKZCkYn8=.8e3704ae-9490-4471-9e5c-dae44004d46f@github.com> On Sat, 9 Aug 2025 15:35:00 GMT, Saranya Natarajan wrote: >> **Issue** >> Extreme values for the BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version` result in an assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. This is observed on an x86 machine.
>> >> **Analysis** >> On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above-mentioned assert from failing for large values of BciProfileWidth. >> >> **Proposal** >> Considering the fact that larger BciProfileWidth results in slower profiling, I have proposed a range between 0 and 5000 to restrict the value for BciProfileWidth for x86 machines. This maximum value is based on experiments with the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp`, using the smallest `InterpreterCodeSize` across all architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bcis will be recorded in the ret profile. >> >> **Issue in AArch64** >> Additionally, running the command `java -XX:BciProfileWidth=10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine. This is an issue of maximum offset for `ldr/str` in AArch64 which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on an AArch64 machine, I had to modify 3 `ldr` and 1 `str` instructions (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line numbers 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for a maximum of 5000, after which it crashes with `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. Without this fix, `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested restricting the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. >> >> **Question to reviewers** >> Do you think this is a reasonable fix?
For AArch64, do you suggest fixing it using `form_address`? If yes, do I fix it under this PR or create another one? >> >> **Request to port maintainers** >> @dafedafe suggested that we keep the upper boun... > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > Addressing review - testcase and max range is set to 1000 test/hotspot/jtreg/compiler/arguments/TestBciProfileWidth.java line 28: > 26: * @summary Test the range defined in globals.hpp for BciProfileWidth > 27: * @bug 8358696 > 28: * @run main/othervm -XX:BciProfileWidth=0 `BciProfileWidth` is debug only, right? test/lib-test/jdk/test/whitebox/vm_flags/IntxTest.java line 41: > 39: private static final long COMPILE_THRESHOLD = VmFlagTest.WHITE_BOX.getIntxVMFlag("CompileThreshold"); > 40: private static final Long[] TESTS = {0L, 100L, (long)(Integer.MAX_VALUE>>3)*100L}; > 41: private static final String FLAG_DEBUG_NAME = "BinarySwitchThreshold"; Why did you move the location of the declaration? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26139#discussion_r2265539177 PR Review Comment: https://git.openjdk.org/jdk/pull/26139#discussion_r2265539538 From xgong at openjdk.org Mon Aug 11 01:48:13 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 11 Aug 2025 01:48:13 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v5] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 02:31:08 GMT, Xiaohong Gong wrote: >> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for the AArch64 SVE platform. >> >> ### Background >> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements).
Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register. >> >> ### Implementation >> >> #### Challenges >> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints. >> >> For a 512-bit SVE machine, loading a `byte` vector with different vector species requires different approaches: >> - SPECIES_64: Single operation with mask (8 elements, 256-bit) >> - SPECIES_128: Single operation, full register (16 elements, 512-bit) >> - SPECIES_256: Two operations + merge (32 elements, 1024-bit) >> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit) >> >> Use `ByteVector.SPECIES_512` as an example: >> - It contains 64 elements. So the index vector size should be `64 * 32` bits, which is 4 times the SVE vector register size. >> - It requires 4 vector gather-loads to finish the whole operation. >> >> >> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...] >> int[] idx = [0, 1, 2, 3, ..., 63, ...] >> >> 4 gather-load: >> idx_v1 = [15 14 13 ... 1 0] gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa] >> idx_v2 = [31 30 29 ... 17 16] gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb] >> idx_v3 = [47 46 45 ... 33 32] gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc] >> idx_v4 = [63 62 61 ... 49 48] gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd] >> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa] >> >> >> #### Solution >> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end.
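A scalar sketch of this split-and-merge scheme, in plain Java with hypothetical names (the real code emits one SVE gather-load instruction per chunk rather than a scalar loop):

```java
// Scalar model of the subword gather: the index range is processed in
// chunks of `step` elements, one chunk per vector gather-load IR node,
// and the partial results are merged back to back into one array.
public class SubwordGatherSketch {
    public static byte[] gather(byte[] arr, int[] idx, int step) {
        byte[] result = new byte[idx.length];
        for (int base = 0; base < idx.length; base += step) {  // one gather-load per chunk
            int limit = Math.min(base + step, idx.length);
            for (int i = base; i < limit; i++) {
                result[i] = arr[idx[i]];  // load through the int index vector
            }
        }
        return result;
    }
}
```

With a 512-bit SVE register, `step` would be 16 for `byte` gathers (16 int indices per register), so a 64-element `SPECIES_512` vector needs four chunks, matching the diagram above.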
>> >> Here are the main changes: >> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher. >> - Added `VectorSliceNode` for result mer... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge 'jdk:master' into JDK-8351623-sve > - Address review comments > - Refine IR pattern and clean backend rules > - Fix indentation issue and move the helper matcher method to header files > - Merge branch jdk:master into JDK-8351623-sve > - 8351623: VectorAPI: Add SVE implementation of subword gather load operation Hi, could anyone please help take a look at this PR? Thanks so much! Hi @RealFYang , not sure whether there is any plan to support the subword gather-load for RVV; it would be much appreciated if we could get feedback from other architectures as well. Would you mind taking a look at this PR? Thanks a lot in advance! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3173071810 From dzhang at openjdk.org Mon Aug 11 02:26:51 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 11 Aug 2025 02:26:51 GMT Subject: RFR: 8365200: RISC-V: compiler/loopopts/superword/TestGeneralizedReductions.java fails with Zvbb and vlen=128 Message-ID: Hi all, Please take a look and review this PR, thanks! [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. This test passes with zvbb when vlen=256, but fails when vlen=128. The reason for the error is the same as [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694). 2-element reductions for INT/LONG are not profitable, so the compiler won't generate the corresponding reductions IR.
This issue was not addressed together with [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694) because the testMapReductionOnGlobalAccumulator use case where the error is reported has a different applyif method from other use cases: zvbb needs to be enabled. ### Test (fastdebug) - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=256 - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=128 ------------- Commit messages: - 8365200: RISC-V: compiler/loopopts/superword/TestGeneralizedReductions.java fails with Zvbb and vlen=128 Changes: https://git.openjdk.org/jdk/pull/26719/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26719&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365200 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26719/head:pull/26719 PR: https://git.openjdk.org/jdk/pull/26719 From fyang at openjdk.org Mon Aug 11 02:33:10 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 11 Aug 2025 02:33:10 GMT Subject: RFR: 8365200: RISC-V: compiler/loopopts/superword/TestGeneralizedReductions.java fails with Zvbb and vlen=128 In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 02:06:02 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. This test passes with zvbb when vlen=256, but fails when vlen=128. > > The reason for the error is the same as [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694). 2-element reductions for INT/LONG are not profitable, so the compiler won't generate the corresponding reductions IR.
> > This issue was not addressed together with [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694) because the testMapReductionOnGlobalAccumulator use case where the error is reported has a different applyif method from other use cases: zvbb needs to be enabled. > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=256 > - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=128 Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26719#pullrequestreview-3103874836 From xgong at openjdk.org Mon Aug 11 03:10:17 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 11 Aug 2025 03:10:17 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v2] In-Reply-To: References: Message-ID: <1Vs8Ud-yh7FtFJN9sddNXDVM6Mc0ue9oi_oa0w5pRzU=.022172f3-1622-4d05-888b-c7afc66a5254@github.com> On Fri, 25 Jul 2025 20:09:40 GMT, Jatin Bhateja wrote: >> Patch optimizes Vector. slice operation with constant index using x86 ALIGNR instruction. >> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. >> >> Idea here is to add infrastructure support to enable intrinsification of fast path for selected vector APIs, else enable inlining of fall-back implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. 
The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander to handle slow paths, which can easily be implemented library side (Java). >> >> Vector API jtreg tests pass at AVX level 2, remaining validation in progress. >> >> Performance numbers: >> >> >> System : 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms >> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms >> VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms >> VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms >> VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms >> VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms >> VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms >> VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms >> VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms >> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms >> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms >> VectorSliceB... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating predicate checks Thanks for your work @jatin-bhateja! This PR also provides help on AArch64 that we also have plan to do the same intrinsifaction in our side. 
src/hotspot/share/opto/vectorIntrinsics.cpp line 1667: > 1665: bool LibraryCallKit::inline_vector_slice() { > 1666: const TypeInt* origin = gvn().type(argument(0))->isa_int(); > 1667: const TypeInstPtr* vector_klass = gvn().type(argument(1))->isa_instptr(); Code style: Suggestion: const TypeInt* origin = gvn().type(argument(0))->isa_int(); const TypeInstPtr* vector_klass = gvn().type(argument(1))->isa_instptr(); src/hotspot/share/opto/vectorIntrinsics.cpp line 1700: > 1698: > 1699: if (!arch_supports_vector(Op_VectorSlice, num_elem, elem_bt, VecMaskNotUsed)) { > 1700: log_if_needed(" ** not supported: arity=2 op=slice vlen=%d etype=%s ismask=useload/none", `ismask=useload/none` is not necessary here? src/hotspot/share/opto/vectorIntrinsics.cpp line 1714: > 1712: } > 1713: > 1714: Node* origin_node = gvn().intcon(origin->get_con() * type2aelembytes(elem_bt)); Q1: Is it possible to just pass `origin->get_con()` to `VectorSliceNode`, in case there are architectures that need it directly? Or maybe we'd better add a comment noting that the origin passed to `VectorSliceNode` is adjusted to bytes. Q2: If `origin` is not a constant, and there is an architecture that supports the index as a variable, will the code crash here? Can we just limit `origin` to a constant for this intrinsification in this PR? We can consider extending it to a variable in case any architecture has such a requirement. WDYT? src/hotspot/share/opto/vectornode.hpp line 1719: > 1717: class VectorSliceNode : public VectorNode { > 1718: public: > 1719: VectorSliceNode(Node* vec1, Node* vec2, Node* origin, const TypeVect* vt) Do we have a specific value for `origin`, like zero or vlen? If so, maybe a simple `Identity` transform should be added as well.
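For reference, `Vector.slice(origin, w)` selects `VLENGTH` lanes starting at `origin` from the logical concatenation of the receiver and `w`. A plain-Java scalar sketch of those semantics (illustrative only, not the C2 intrinsic):

```java
// Scalar model of Vector API slice: result[i] = concat(v1, v2)[origin + i],
// where v1 and v2 have the same length n and 0 <= origin <= n.
public class SliceSketch {
    public static int[] slice(int[] v1, int[] v2, int origin) {
        int n = v1.length;
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            int j = origin + i;  // index into the logical concatenation
            result[i] = (j < n) ? v1[j] : v2[j - n];
        }
        return result;
    }
}
```

In scalar terms this also illustrates the `Identity` question: `origin == 0` returns `v1` unchanged and `origin == n` returns `v2`.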
------------- PR Review: https://git.openjdk.org/jdk/pull/24104#pullrequestreview-3103877319 PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2265568519 PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2265573060 PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2265579342 PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2265580768 From fjiang at openjdk.org Mon Aug 11 03:11:16 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 11 Aug 2025 03:11:16 GMT Subject: RFR: 8365200: RISC-V: compiler/loopopts/superword/TestGeneralizedReductions.java fails with Zvbb and vlen=128 In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 02:06:02 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. This test passes with zvbb when vlen=256, but fails when vlen=128. > > The reason for the error is the same as [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694). 2-element reductions for INT/LONG are not profitable, so the compiler won't generate the corresponding reductions IR. > > This issue was not addressed together with [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694) because the testMapReductionOnGlobalAccumulator case where the error is reported has a different applyif method from other cases: zvbb needs to be enabled. > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=256 > - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=128 Thanks for catching this! ------------- Marked as reviewed by fjiang (Committer).
PR Review: https://git.openjdk.org/jdk/pull/26719#pullrequestreview-3103902755 From xgong at openjdk.org Mon Aug 11 03:14:12 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 11 Aug 2025 03:14:12 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v2] In-Reply-To: References: Message-ID: On Fri, 25 Jul 2025 20:09:40 GMT, Jatin Bhateja wrote: >> Patch optimizes Vector. slice operation with constant index using x86 ALIGNR instruction. >> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. >> >> Idea here is to add infrastructure support to enable intrinsification of fast path for selected vector APIs, else enable inlining of fall-back implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander to handle slow paths, which can easily be implemented library side (Java). >> >> Vector API jtreg tests pass at AVX level 2, remaining validation in progress. 
>> >> Performance numbers: >> >> >> System : 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms >> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms >> VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms >> VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms >> VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms >> VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms >> VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms >> VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms >> VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms >> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms >> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms >> VectorSliceB... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating predicate checks test/micro/org/openjdk/bench/jdk/incubator/vector/VectorSliceBenchmark.java line 36: > 34: @State(Scope.Thread) > 35: @Fork(jvmArgs = {"--add-modules=jdk.incubator.vector"}) > 36: public class VectorSliceBenchmark { I remember that it has the micro benchmarks for slice/unslice under `test/micro/org/openjdk/bench/jdk/incubator/vector/operation` on panama-vector. Can we reuse those JMHs to check the benchmark improvement? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2265592754 From aboldtch at openjdk.org Mon Aug 11 05:08:11 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 11 Aug 2025 05:08:11 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 [v4] In-Reply-To: References: Message-ID: <8TcJ6y_O08pg5m9k3mnOUr0OnQPdPq6LMKOh8oIn1KM=.40cb7a32-8691-4069-bb48-13a767cad50e@github.com> On Thu, 7 Aug 2025 09:23:37 GMT, Fredrik Bredberg wrote: >> Since the integration of [JDK-8359437](https://bugs.openjdk.org/browse/JDK-8359437) the `LockingMode` flag can no longer be set by the user; instead, it's declared as `const int LockingMode = LM_LIGHTWEIGHT;`. This means that we can now safely remove all `LockingMode` related code from all platforms. >> >> This PR removes `LockingMode` related code from the **x86** platform. >> >> When all the `LockingMode` code has been removed from all platforms, we can go on and remove it from shared (non-platform specific) files as well. And finally remove the `LockingMode` variable itself. >> >> Passes tier1-tier5 with no added problems. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update three after review Looks good. As long as we take a final pass after all code changes have been performed and clean up the comments, variable-, parameter- and function-names. Would be nice to end with a consistent nomenclature, and remove all outdated terms, at least w.r.t. legacy locking. ------------- Marked as reviewed by aboldtch (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26552#pullrequestreview-3104058030 From shade at openjdk.org Mon Aug 11 06:03:11 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 11 Aug 2025 06:03:11 GMT Subject: RFR: 8364501: Compiler shutdown crashes on access to deleted CompileTask In-Reply-To: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> References: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> Message-ID: <7oGsbp7UhHMKskMrPFyrMoMNMYNzk4uDoMXMIC_t-0E=.1b9168f9-1945-4670-bd11-bda9e1c5f300@github.com> On Fri, 8 Aug 2025 12:30:36 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. > > In short, with recent changes to `delete` `CompileTask`-s, we end up in the rare situation where we can access tasks that have already been deleted. The major and obvious mistake I committed myself with [JDK-8361752](https://bugs.openjdk.org/browse/JDK-8361752) in `CompileQueue::delete_all`: the code first `delete`-s, then asks for `next` (facepalms). > > Another case is less trivial, and is mostly a fix in an abundance of caution: in `wait_for_completion`, we can exit while the blocking task is still in the queue. The current code skips deletions only when the compiler is shut down for compilation, but I think the condition should be stronger: unless the task is completed, we should assume it might carry the queue-ing `next`/`prev` pointers that `delete_all` would need, and skip deletion. Realistically, it would "leak" only on compiler shutdown, like before. > > I have also put in some diagnostic code to catch lifecycle issues like this more reliably, and cleaned up the `next`/`prev` lifecycle to clearly disconnect the `CompileTasks` that are no longer in the queue. > > Additional testing: > - [x] Linux AArch64 server fastdebug, reproducer no longer fails > - [x] Linux AArch64 server fastdebug, `compiler` > - [x] Linux AArch64 server fastdebug, `all` Thanks!
I think I need a second Review before I can integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26696#issuecomment-3173361051 From dskantz at openjdk.org Mon Aug 11 06:07:14 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Mon, 11 Aug 2025 06:07:14 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors In-Reply-To: References: Message-ID: On Fri, 8 Aug 2025 06:10:56 GMT, Daniel Skantz wrote: > This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code, as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments(). > > In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there are 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary number of back-to-back stores in the generated code depending on the number of stacked concatenations. > > The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2]. > > [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303 > > [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806 > > Testing: T1-4.
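The blow-up described above can be sketched in plain Java (a hypothetical shape; the actual regression test uses straight-line code rather than a loop): each round feeds the previous `toString` result in twice, so the merged candidate's argument count doubles per round.

```java
// Model of repeated stacked concatenation and the resulting growth of the
// merged candidate's argument count (sc1->num_arguments() * sc2->num_arguments()).
public class StackedConcatSketch {
    public static String stack(String seed, int rounds) {
        String s = seed;
        for (int k = 0; k < rounds; k++) {
            s = s + s;  // previous result used twice -> argument count doubles
        }
        return s;
    }

    // Argument count of the merged candidate after `rounds` doublings.
    public static long mergedArguments(int rounds) {
        return 1L << rounds;
    }
}
```

After 7 rounds the candidate already exceeds the proposed 100-argument cap; at the 24 merges in the test it would be on the order of 2^24.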
> > Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 arguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow pass. src/hotspot/share/opto/stringopts.cpp line 688: > 686: // which is a problem in the case of repeated stacked concats. > 687: // Put a limit at 100 arguments to guard against excessive resource use. > 688: bool n_args_is_bounded = merged->num_arguments() < 100; This check can be done in the `merge` method instead. It's also possible to increase the bound but it will need a live node check later in `replace_string_concat`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2265763910 From fyang at openjdk.org Mon Aug 11 06:08:16 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 11 Aug 2025 06:08:16 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v22] In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 17:04:33 GMT, Yuri Gaevsky wrote: > Based on the above experiments it looks reasonable to use `m2` grouping. Thanks for the extra JMH numbers. Yes, I agree that `m2` is more reasonable here. That means we won't need to reserve so many vector registers for `instruct varrays_hashcode` in src/hotspot/cpu/riscv/riscv_v.ad. So can you free the unused vector registers? Will take a closer look after that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3173369066 From jbhateja at openjdk.org Mon Aug 11 06:39:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 11 Aug 2025 06:39:57 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v3] In-Reply-To: References: Message-ID: > Patch optimizes Vector.slice operation with constant index using x86 ALIGNR instruction.
> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. > > Idea here is to add infrastructure support to enable intrinsification of fast path for selected vector APIs, else enable inlining of fall-back implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander to handle slow paths, which can easily be implemented library side (Java). > > Vector API jtreg tests pass at AVX level 2, remaining validation in progress. 
> > Performance numbers: > > > System : 13th Gen Intel(R) Core(TM) i3-1315U > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms > VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms > VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms > VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms > VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms > VectorSliceBenchmark.shortVectorSliceWithVariableIndex 1024 ... Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8303762 - Updating predicate checks - Fixes for failing regressions - Optimizing AVX2 backend and some re-factoring - new benchmark - Merge branch 'master' of https://github.com/openjdk/jdk into JDK-8303762 - 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction ------------- Changes: https://git.openjdk.org/jdk/pull/24104/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=02 Stats: 747 lines in 32 files changed: 664 ins; 0 del; 83 mod Patch: https://git.openjdk.org/jdk/pull/24104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24104/head:pull/24104 PR: https://git.openjdk.org/jdk/pull/24104 From hgreule at openjdk.org Mon Aug 11 07:17:40 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 11 Aug 2025 07:17:40 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v6] In-Reply-To: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: > This change improves the precision of the `Mod(I|L)Node::Value()` functions. > > I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early. > The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions. > > ### Monotonicity > > Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. 
As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. I therefore use `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range). > > ### Testing > > I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something). > > Please review and let me know what you think. > > ### Other > > The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508. > > While experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth looking into further: > - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement? > - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd. Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 18 additional commits since the last revision: - typos - Merge branch 'master' into improve-mod-value - Merge branch 'master' into improve-mod-value - simplify UB/cpu exception check - wording - Address more comments - Merge branch 'master' into improve-mod-value - Add randomized test - Use BasicType for shared implementation - Update ModL comment - ... and 8 more: https://git.openjdk.org/jdk/compare/af868121...11210414 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25254/files - new: https://git.openjdk.org/jdk/pull/25254/files/77134c1a..11210414 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=04-05 Stats: 95425 lines in 2499 files changed: 53492 ins; 27474 del; 14459 mod Patch: https://git.openjdk.org/jdk/pull/25254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25254/head:pull/25254 PR: https://git.openjdk.org/jdk/pull/25254 From hgreule at openjdk.org Mon Aug 11 07:20:11 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 11 Aug 2025 07:20:11 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v2] In-Reply-To: <3BJWLK3FukQCp2FHGcyBDTZtbc5aS8VreNKYKAaQrdU=.43a7e821-8d56-4161-850a-9137d17d44de@github.com> References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> <3BJWLK3FukQCp2FHGcyBDTZtbc5aS8VreNKYKAaQrdU=.43a7e821-8d56-4161-850a-9137d17d44de@github.com> Message-ID: On Mon, 16 Jun 2025 06:57:16 GMT, Emanuel Peter wrote: >> @SirYwell Thanks for looking into this, that looks promising! >> >> I have two bigger comments: >> - Could we unify the L and I code, either using C++ templating or `BasicType`? It would reduce code duplication. >> - Can we have some tests where the input ranges are random as well, and where we check the output ranges with some comparisons? 
>> >> ------------------ >> Copied from the code comment: >> >>> Nice work with the examples you already have, and randomizing some of it! >>> >>> I would like to see one more generalized test. >>> - compute `res = lhs % rhs` >>> - Truncate both `lhs` and `rhs` with randomly produced bounds from Generators, like this: `lhs = Math.max(lo, Math.min(hi, lhs))`. >>> - Below, add all sorts of comparisons with random constants, like this: `if (res < CON) { sum += 1; }`. If the output range is wrong, this could wrongly constant fold, and allow us to catch that. >>> >>> Then fuzz the generated method a few times with random inputs for `lhs` and `rhs`, and check that the `sum` and `res` value are the same for compiled and interpreted code. >>> >>> I hope that makes sense :) >>> This is currently my best method to check if ranges are correct, and I think it is quite important because often tests are only written with constants in mind, but less so with ranges, and then we mess up the ranges because it is just too tricky. >>> >>> This is an example, where I asked someone to try this out as well: >>> https://github.com/openjdk/jdk/pull/23089/files#diff-12bebea175a260a6ab62c22a3681ccae0c3d9027900d2fdbd8c5e856ae7d1123R404-R422 > >> @eme64 I merged master and hopefully addressed your latest comments. Now that we have #17508 integrated, I could also directly update the unsigned variant, but I'm also fine with doing that separately. WDYT? >> >> I also checked the constant folding part again (or generally whenever the RHS is a constant), these code paths are indeed not used by PhaseGVN directly (but by PhaseCCP and PhaseIdealLoop). That makes it a bit difficult to test that part properly. > > Let's keep the patch as it is. With #17508 we will have to also probably refactor and add more tests, if we want to do any unsigned and known-bit optimizations. 
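The randomized range-testing recipe quoted above might be sketched like this (all names, bounds, and constants are made up; the real test would generate them randomly and compare compiled against interpreted results):

```java
// Clamp both inputs into fixed bounds, then compare the result against constants:
// if the compiler computed an over-narrow range for res, one of the comparisons
// could constant-fold incorrectly and 'sum' would differ between compiled and
// interpreted runs.
public class ModFuzzSketch {
    static final int LO = -13, HI = 42, CON = 5; // stand-ins for generated values

    static int[] test(int lhs, int rhs) {
        lhs = Math.max(LO, Math.min(HI, lhs)); // truncate lhs into [LO, HI]
        rhs = Math.max(LO, Math.min(HI, rhs)); // truncate rhs into [LO, HI]
        int res = (rhs == 0) ? 0 : lhs % rhs;
        int sum = 0;
        if (res < CON)  { sum += 1; }
        if (res > -CON) { sum += 2; }
        return new int[] {res, sum};
    }

    public static void main(String[] args) {
        int[] r = test(100, 7); // lhs clamps to 42; 42 % 7 == 0
        System.out.println(r[0] + " " + r[1]);
    }
}
```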
> > ---------------- > > @SirYwell Thanks for the updates, I had a few more comments, but we are getting there :) @eme64 I addressed your latest comments now, please re-review :) Regarding my previous observation > * If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more, but less expensive, nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, meaning we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement? should I open a new RFE for that? Or generally, what's your opinion on this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-3173531132 From duke at openjdk.org Mon Aug 11 07:48:15 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 11 Aug 2025 07:48:15 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v22] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 06:05:15 GMT, Fei Yang wrote: > > Based on above experiments it looks reasonable to use `m2` grouping. > > Thanks for the extra JMH numbers. Yes, I agree that `m2` is more reasonable here. That means we won't need to reserve so many vector registers for `instruct varrays_hashcode` in src/hotspot/cpu/riscv/riscv_v.ad. So can you free the unused vector registers? Will take a closer look after that. Heh, I completely missed that, thanks a lot for catching this!
------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3173611865 From bkilambi at openjdk.org Mon Aug 11 07:54:53 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 11 Aug 2025 07:54:53 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4] In-Reply-To: References: Message-ID: > After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - > `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - > > > public void vectorAddConstInputFloat16() { > for (int i = 0; i < LEN; ++i) { > output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); > } > } > > > > > > The current code in the JDK results in the generation of the sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-128 << 8, 127 << 8] with a multiple of 256 for 16-bit signed immediates. > > This patch allows the generation of the sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcast to an SVE register ("replicateHF" mach node). > > Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine.
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Addressed review comments and modified some comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26589/files - new: https://git.openjdk.org/jdk/pull/26589/files/a44eccc0..bcecc6e1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26589&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26589&range=02-03 Stats: 39 lines in 2 files changed: 3 ins; 4 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/26589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26589/head:pull/26589 PR: https://git.openjdk.org/jdk/pull/26589 From bkilambi at openjdk.org Mon Aug 11 07:54:54 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 11 Aug 2025 07:54:54 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v3] In-Reply-To: References: Message-ID: On Thu, 7 Aug 2025 15:24:23 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.cpp line 439: >> >>> 437: bool Assembler::operand_valid_for_sve_dup_immediate(int64_t imm) { >>> 438: return ((imm >= -128 && imm <= 127) || >>> 439: (((imm & 0xff) == 0) && imm >= -32768 && imm <= 32767)); >> >> Hold up! The current predicate was: >> >> >> predicate((n->get_long() <= 127 && n->get_long() >= -128) || >> (n->get_long() <= 32512 && n->get_long() >= -32768 && (n->get_long() & 0xff) == 0)); >> >> >> So the upper bound is _not_ `32767`, but `32512`. Maybe that actually matches the `0xff` mask, I have not checked. But SVE spec talks about `+32512`, so it looks more straightforward just to match that. > > Sure I can do that. Yes, the SVE spec talks specifically about `+32512` but I used `32767` as the largest value divisible by 256 would be `32512` anyway (and `-32768` and `32767` looked a bit more logical for a 16-bit immediate). I don't have much of a preference on this though and will go by your suggestion. Thanks! Done. 
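For reference, the encodable range settled on above can be restated as a standalone predicate (a sketch mirroring the logic of `operand_valid_for_sve_dup_immediate`, not the HotSpot code itself): a signed byte, or a signed byte shifted left by 8, i.e. a multiple of 256 in `[-32768, 32512]`.

```java
// Standalone restatement of the SVE DUP immediate check (illustration only).
public class SveDupImmSketch {
    static boolean validForSveDup(long imm) {
        return (imm >= -128 && imm <= 127) ||               // imm8
               ((imm & 0xff) == 0 &&
                imm >= (-128 << 8) && imm <= (127 << 8));   // imm8, LSL #8
    }

    public static void main(String[] args) {
        System.out.println(validForSveDup(512));   // in range, multiple of 256
        System.out.println(validForSveDup(32512)); // 127 << 8, the upper bound
        System.out.println(validForSveDup(32513)); // not a multiple of 256
    }
}
```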
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2265940025 From bkilambi at openjdk.org Mon Aug 11 07:54:55 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 11 Aug 2025 07:54:55 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v3] In-Reply-To: References: Message-ID: On Thu, 7 Aug 2025 13:54:39 GMT, Aleksey Shipilev wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments > > test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 52: > >> 50: >> 51: // Choose FP16_CONST1 which is within the range of [-128 << 8, 127 << 8] and a multiple of 256 >> 52: private static final Float16 FP16_CONST1 = Float16.shortBitsToFloat16((short)512); > > Call them `FP16_IN_RANGE` and `FP16_OUT_OF_RANGE`, maybe? Also rename the test cases from `*1`/`*2` to `*InRange`/`*OutOfRange`? Done > test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 68: > >> 66: >> 67: Generator gen = G.float16s(); >> 68: IntStream.range(0, LEN).forEach(i -> {input[i] = gen.next();}); > > Just do a for loop? Done > test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 79: > >> 77: @IR(counts = {IRNode.REPLICATE_HF_IMM8, ">0"}, >> 78: phase = CompilePhase.FINAL_CODE, >> 79: applyIf = {"MaxVectorSize", ">=32"}, > > `> 16` then? This matches the comment better. 
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2265940398 PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2265940614 PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2265940903 From mhaessig at openjdk.org Mon Aug 11 08:00:16 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 11 Aug 2025 08:00:16 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v2] In-Reply-To: References: Message-ID: On Sun, 10 Aug 2025 12:27:55 GMT, Tobias Hotz wrote: > The main reason for all the if cases is that min_int / (-1) is undefined behavior in C++, as it overflows. All code has to be careful that this special case can't happen in C++ code, and that's the main motivation behind all the ifs. I've added a comment that describes that. Otherwise, you would be right: Redundant calculations are no problem, min and max would take care of that. Then I would suggest restructuring the code to express that intent. Reading it currently gives me the impression that you care about handling the four corners carefully when you really only care about the UB case. The following pseudocode uses ifs to only avoid the corner case. It's only slightly different from your version, but I find its intent a bit easier to guess. What do you think? if i1.lo == MIN_INT && (i2.lo == -1 || i2.hi == -1) { new_lo = MIN_INT if i1.hi == MIN_INT { // is_con() new_hi = MAX_INT // (MIN_INT + 1) / -1 return (new_lo, new_hi) // This is already the entire domain, so we can return early } if i2.lo != i2.hi { corner i1.lo, (i2.lo == -1 ?
i2.hi : i2.lo) // corner is just shorthand for setting new_lo and new_hi } } else { // i1.lo > MIN_INT corner i1.lo, i2.lo corner i1.lo, i2.hi } // i1.hi > MIN_INT because of early return corner i1.hi, i2.lo corner i1.hi, i2.hi ------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3173647591 From rcastanedalo at openjdk.org Mon Aug 11 08:27:13 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 11 Aug 2025 08:27:13 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors In-Reply-To: References: Message-ID: <2lRXDXqlKCIg4lALbH4tSCp5NqwV2E9ZA6vSjqR4ATw=.2e7027a5-8d73-46aa-9fe3-217497e8e3b4@github.com> On Fri, 8 Aug 2025 06:10:56 GMT, Daniel Skantz wrote: > This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments(). > > In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there's 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary amount of back-to-back stores in the generated code depending on the number of stacked concatenations. > > The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2]. 
> > [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303 > > [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806 > > Testing: T1-4. > > Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 arguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow() pass. Good catch, Daniel! I have a few comments, mostly about the test. src/hotspot/share/opto/stringopts.cpp line 687: > 685: // sc->num_arguments() * other->num_arguments(), > 686: // which is a problem in the case of repeated stacked concats. > 687: // Put a limit at 100 arguments to guard against excessive resource use. Could you extract this limit into a constant and give it a meaningful name? test/hotspot/jtreg/ProblemList.txt line 1: > 1: # Did you consider running the test with `-XX:-OptoScheduling`, as an alternative to problem listing? This should simply avoid the problematic assertion and would be more robust w.r.t. other platforms. test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsMany.java line 42: > 40: for (int i = 0; i < 10; i++) { > 41: String s = f(" "); > 42: } This could be simplified into: Suggestion: new StringBuilder(); // Trigger loading of the StringBuilder class. String s = f(" "); Then you can narrow down the CompileOnly command to `-XX:CompileOnly=compiler.stringopts.TestStackedConcatsMany::f`. ------------- Changes requested by rcastanedalo (Reviewer).
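The blow-up this limit guards against can be illustrated with a tiny counting sketch (hypothetical numbers, not C2 code): when each stacked concatenation consumes the previous `toString` result twice, every merge doubles the merged argument count.

```java
// Count merged StringBuilder arguments after n stacked self-concatenations.
public class ConcatGrowthSketch {
    static long mergedArguments(int stackedConcats) {
        long args = 2;                 // first concat: s + s has two arguments
        for (int i = 1; i < stackedConcats; i++) {
            args *= 2;                 // each merge doubles when both halves are shared
        }
        return args;
    }

    public static void main(String[] args) {
        System.out.println(mergedArguments(24)); // 16777216 -- far past any sane limit
    }
}
```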
PR Review: https://git.openjdk.org/jdk/pull/26685#pullrequestreview-3104581918 PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2265964055 PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2266001174 PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2265968794 From rcastanedalo at openjdk.org Mon Aug 11 08:27:14 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 11 Aug 2025 08:27:14 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors In-Reply-To: <2lRXDXqlKCIg4lALbH4tSCp5NqwV2E9ZA6vSjqR4ATw=.2e7027a5-8d73-46aa-9fe3-217497e8e3b4@github.com> References: <2lRXDXqlKCIg4lALbH4tSCp5NqwV2E9ZA6vSjqR4ATw=.2e7027a5-8d73-46aa-9fe3-217497e8e3b4@github.com> Message-ID: <0cTEMkUut46Bj5V_5N4c6RZTTL8Eq55FwUAwq76puro=.93a7bf09-92f6-4a73-b91d-0b821e5428e7@github.com> On Mon, 11 Aug 2025 08:05:58 GMT, Roberto Castañeda Lozano wrote: >> This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments(). >> >> In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there's 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary amount of back-to-back stores in the generated code depending on the number of stacked concatenations.
>> >> The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2]. >> >> [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303 >> >> [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806 >> >> Testing: T1-4. >> >> Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 arguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow() pass. > test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsMany.java line 42: > >> 40: for (int i = 0; i < 10; i++) { >> 41: String s = f(" "); >> 42: } > > This could be simplified into: > Suggestion: > > new StringBuilder(); // Trigger loading of the StringBuilder class. > String s = f(" "); > > Then you can narrow down the CompileOnly command to `-XX:CompileOnly=compiler.stringopts.TestStackedConcatsMany::f`. It would be good to assert that `f(" ")` returns the expected result.
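A hypothetical shape for such a method (not the actual jtreg test) makes the suggested assertion easy to state: each line reuses the previous result twice, so the merged concatenation doubles per step while the expected string stays trivial to compute.

```java
// Stacked concatenations that reuse the previous result twice (illustration only).
public class StackedConcatSketch {
    static String f(String s) {
        String a = s + s; // 2 arguments
        String b = a + a; // merged: 4 arguments
        String c = b + b; // merged: 8 arguments, and so on for deeper stacks
        return c;
    }

    public static void main(String[] args) {
        // The expected result is simply the input repeated 2^3 times.
        System.out.println(f(" ").equals("        ")); // eight spaces -> true
    }
}
```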
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2265974097 From chagedorn at openjdk.org Mon Aug 11 08:51:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Aug 2025 08:51:13 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 23:33:23 GMT, Francisco Ferrari Bihurriet wrote: > Hi, this pull request is a second take of 1383fec41756322bf2832c55633e46395b937b40, by updating the `CmpUNode` type as either `TypeInt::CC_LE` (case 1a) or `TypeInt::CC_LT` (case 1b) instead of updating the `BoolNode` type as `TypeInt::ONE`. > > With this approach a56cd371a2c497e4323756f8b8a08a0bba059bf2 becomes unnecessary. Additionally, having the right type in `CmpUNode` could potentially enable further optimizations. > > #### Testing > > In order to evaluate the changes, the following testing has been performed: > > * `jdk:tier1` (see [GitHub Actions run](https://github.com/franferrax/jdk/actions/runs/16789994433)) > * [`TestBoolNodeGVN.java`](https://github.com/openjdk/jdk/blob/jdk-26+9/test/hotspot/jtreg/compiler/c2/gvn/TestBoolNodeGVN.java), created for [JDK-8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value](https://bugs.openjdk.org/browse/JDK-8327381) (1383fec41756322bf2832c55633e46395b937b40) > * I also checked it breaks if I remove the `CmpUNode::Value_cmpu_and_mask` call > * Private reproducer for [JDK-8349584: Improve compiler processing](https://bugs.openjdk.org/browse/JDK-8349584) (a56cd371a2c497e4323756f8b8a08a0bba059bf2) > * A local slowdebug run of the `test/hotspot/jtreg/compiler/c2` category on _Fedora Linux x86_64_ > * Same results as with `master` (f95af744b07a9ec87e2507b3d584cbcddc827bbd) Thanks for improving this! I have some small suggestions, otherwise, it looks good to me! > Additionally, having the right type in CmpUNode could potentially enable further optimizations. 
Could you already find some examples where this change gives us an improved IR? If so, you could also add it as an IR test. I'll also give this a spin in our testing. src/hotspot/share/opto/phaseX.cpp line 2941: > 2939: // Bool > 2940: // > 2941: void PhaseCCP::push_bool_with_cmpu_and_mask(Unique_Node_List& worklist, const Node* use) const { Needed to double-check but I think it's fine to remove the notification code since we already have `push_cmpu()` in place which looks through the `AddI`: https://github.com/openjdk/jdk/blob/10762d408bba9ce0945100847a8674e7eb7fa75e/src/hotspot/share/opto/phaseX.cpp#L2911-L2926 So, whenever `m` or `1` changes, we will re-add the `CmpU` to the CCP worklist with `push_cmpu()`. The `x` does not matter for the application of `Value_cmpu_and_mask()`. src/hotspot/share/opto/subnode.cpp line 855: > 853: // (1a) and (1b) is covered by this method since we can directly return the corresponding TypeInt::CC_* > 854: // while (2) is covered in BoolNode::Ideal since we create a new non-constant node (see [CMPU_MASK]). > 855: const Type* CmpUNode::Value_cmpu_and_mask(PhaseValues* phase, const Node* in1, const Node* in2) { I suggest directly naming these: `in1` -> `andI` `in2` -> `rhs` Then it's easier to follow the comments. src/hotspot/share/opto/subnode.cpp line 1899: > 1897: // based on local information. If the input is constant, do it. > 1898: const Type* BoolNode::Value(PhaseGVN* phase) const { > 1899: return _test.cc2logical( phase->type( in(1) ) ); Suggestion: return _test.cc2logical(phase->type(in(1))); src/hotspot/share/opto/subnode.hpp line 174: > 172: // Compare 2 unsigned values (integer or pointer), returning condition codes (-1, 0 or 1).
> 173: class CmpUNode : public CmpNode { > 174: static const Type* Value_cmpu_and_mask(PhaseValues*, const Node*, const Node*); We usually add matching parameter names as found in the source file: static const Type* Value_cmpu_and_mask(PhaseValues* phase, const Node* in1, const Node* in2); or with the renaming above: static const Type* Value_cmpu_and_mask(PhaseValues* phase, const Node* andI, const Node* rhs); ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26666#pullrequestreview-3104636268 PR Review Comment: https://git.openjdk.org/jdk/pull/26666#discussion_r2266033683 PR Review Comment: https://git.openjdk.org/jdk/pull/26666#discussion_r2265997538 PR Review Comment: https://git.openjdk.org/jdk/pull/26666#discussion_r2266009017 PR Review Comment: https://git.openjdk.org/jdk/pull/26666#discussion_r2266038306 From fbredberg at openjdk.org Mon Aug 11 09:01:16 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Mon, 11 Aug 2025 09:01:16 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 [v4] In-Reply-To: <8TcJ6y_O08pg5m9k3mnOUr0OnQPdPq6LMKOh8oIn1KM=.40cb7a32-8691-4069-bb48-13a767cad50e@github.com> References: <8TcJ6y_O08pg5m9k3mnOUr0OnQPdPq6LMKOh8oIn1KM=.40cb7a32-8691-4069-bb48-13a767cad50e@github.com> Message-ID: On Mon, 11 Aug 2025 05:05:28 GMT, Axel Boldt-Christmas wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update three after review > > Looks good. > > As long as we take a final pass after all code changes have been performed and clean up the comments, variable-, parameter- and function-names. Would be nice to end with a consistent nomenclature, and remove all outdated terms, at least w.r.t. legacy locking. @xmas92 > As long as we take a final pass after all code changes have been performed and clean up the comments, variable-, parameter- and function-names.
Would be nice to end with a consistent nomenclature, and remove all outdated terms, at least w.r.t. legacy locking. That is my intention with this: [Cleanup after removing LockingMode related code](https://bugs.openjdk.org/browse/JDK-8365191) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26552#issuecomment-3173828593 From shade at openjdk.org Mon Aug 11 09:14:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 11 Aug 2025 09:14:25 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 07:54:53 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of the sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-128 << 8, 127 << 8] with a multiple of 256 for 16-bit signed immediates. >> >> This patch allows the generation of the sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcast to an SVE register ("replicateHF" mach node). >> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine.
JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments and modified some comments This looks reasonable to me, thanks. Some nits in the test remain, but they are non-blocking. test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 45: > 43: public class TestFloat16Replicate { > 44: private static short[] input; > 45: private static short[] output; This might give things even more chance to vectorize? Not sure, feel free to ignore. Suggestion: private static final short[] INPUT; private static final short[] OUTPUT; test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 47: > 45: private static short[] output; > 46: > 47: // Choose FP16_IN_RANGE which is within the range of [-128 << 8, 127 << 8] and a multiple of 256 Suggestion: // Choose FP16_IN_RANGE which is within the range of [-128 << 8, 127 << 8] and a multiple of 256 ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26589#pullrequestreview-3104792951 PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2266101795 PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2266100000 From bkilambi at openjdk.org Mon Aug 11 10:16:12 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 11 Aug 2025 10:16:12 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE In-Reply-To: References: <0jcw428unzAfdGcqci79xBRxjw3yHN_MxYc7OOuHDz8=.31bd3357-49ff-442f-8d06-58447df49de7@github.com> Message-ID: On Fri, 1 Aug 2025 14:29:47 GMT, Aleksey Shipilev wrote: >>> I am still a bit confused what matches `Replicate` with `immH` that does _not_ fit `immH8_shift8` when `Matcher::vector_length_in_bytes(n) > 16`? >> >> Hi, thanks for your review.
If the immediate value does not fit `immH8_shift8` for `Matcher::vector_length_in_bytes(n) > 16`, the compiler would generate `loadConH` [1] -> `replicateHF` [2] backend nodes instead. The constant would be loaded from the constant pool and then broadcast/replicated to every lane of an SVE register. >> >> [1] https://github.com/openjdk/jdk/blob/8ac4a88f3c5ad57824dd192cb3f0af5e71cbceeb/src/hotspot/cpu/aarch64/aarch64.ad#L6963 >> >> [2] https://github.com/openjdk/jdk/blob/8ac4a88f3c5ad57824dd192cb3f0af5e71cbceeb/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4806 > >> If the immediate value does not fit `immH8_shift8` for `Matcher::vector_length_in_bytes(n) > 16`, the compiler would generate `loadConH` [1] -> `replicateHF` [2] backend nodes instead. > > Ah OK, just checking. I ran this patch on the machine where I originally found the issue, and it seems to work. Thanks for your review comments and approval @shipilev. I will address your review comments in the next patchset along with any other comments. @theRealAph Would you be able to take another look at the updated patch please? Thank you in advance!
> > #### Testing > > In order to evaluate the changes, the following testing has been performed: > > * `jdk:tier1` (see [GitHub Actions run](https://github.com/franferrax/jdk/actions/runs/16789994433)) > * [`TestBoolNodeGVN.java`](https://github.com/openjdk/jdk/blob/jdk-26+9/test/hotspot/jtreg/compiler/c2/gvn/TestBoolNodeGVN.java), created for [JDK-8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value](https://bugs.openjdk.org/browse/JDK-8327381) (1383fec41756322bf2832c55633e46395b937b40) > * I also checked it breaks if I remove the `CmpUNode::Value_cmpu_and_mask` call > * Private reproducer for [JDK-8349584: Improve compiler processing](https://bugs.openjdk.org/browse/JDK-8349584) (a56cd371a2c497e4323756f8b8a08a0bba059bf2) > * A local slowdebug run of the `test/hotspot/jtreg/compiler/c2` category on _Fedora Linux x86_64_ > * Same results as with `master` (f95af744b07a9ec87e2507b3d584cbcddc827bbd) Thanks for the PR @franferrax. Did you consider adding an IR test or similar that would expose the inconsistent state? Would it be feasible? ------------- PR Review: https://git.openjdk.org/jdk/pull/26666#pullrequestreview-3105113654 From duke at openjdk.org Mon Aug 11 10:24:29 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 11 Aug 2025 10:24:29 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v22] In-Reply-To: References: Message-ID: <_kUVpJniVUr2ga2ixJCem29yv0r8D7nfLIwqOI2P_ko=.a355d62d-0ca3-4c25-927d-c2aada9d0c6c@github.com> On Mon, 11 Aug 2025 07:45:47 GMT, Yuri Gaevsky wrote: > ... So can you free the unused vector registers? ... Fixed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3174101010 From duke at openjdk.org Mon Aug 11 10:24:29 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 11 Aug 2025 10:24:29 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v24] In-Reply-To: References: Message-ID: > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: removed reservations for unused vector registers per reviewer's comment; added sanity assertion. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/e14cc8e2..44491863 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=22-23 Stats: 271 lines in 4 files changed: 1 ins; 262 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From galder at openjdk.org Mon Aug 11 10:33:11 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 11 Aug 2025 10:33:11 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors In-Reply-To: References: Message-ID: On Fri, 8 Aug 2025 06:10:56 GMT, Daniel Skantz wrote: > This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments(). 
> > In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there's 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary amount of back-to-back stores in the generated code depending on the number of stacked concatenations. > > The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2]. > > [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303 > > [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806 > > Testing: T1-4. > > Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 aguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow pass. test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsMany.java line 28: > 26: * @bug 8357105 > 27: * @summary Test that repeated stacked string concatenations do not > 28: * consume too many compilation resources. Is there a reasonable way to enhance the test to validate excessive resources? I'm not sure if the following example would work, but I'm wondering if there is something that can be measured deterministically. E.g. before with the given test there would be ~N IR nodes produced but now it would be a max of ~M, assuming that M is deterministically smaller than N. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2266284078 From ayang at openjdk.org Mon Aug 11 10:57:10 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 11 Aug 2025 10:57:10 GMT Subject: RFR: 8352067: Remove the NMT treap and replace its uses with the utilities red-black tree [v2] In-Reply-To: References: Message-ID: <_64fHkIUSnCgZRdwphFKm6LqfUaSv_NKzd0-ivH3nEw=.4fc83148-7758-4ef7-969c-816aa0750a92@github.com> On Wed, 6 Aug 2025 11:07:42 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The utilities red-black tree and the NMT treap serve similar functions. Given the red-black tree's versatility and stricter time complexity, the treap can be removed in favour of it. >> >> I made some modifications to the red-black tree to make it compatible with previous treap usages: >> - Updated the `visit_in_order` and `visit_range_in_order` functions to require the supplied callback to return a bool, which allows us to stop traversing early. >> - Improved const-correctness by ensuring that invoking these functions on a const reference provides const pointers to nodes, while non-const references provide mutable pointers. Previously the two functions behaved differently. >> >> Changes to NMT include: >> - Modified components to align with the updated const-correctness of the red-black tree functions >> - Renamed structures and variables to remove "treap" from their names to reflect the new tree >> >> The treap was also used in one place in C2. I changed this to use the red-black tree and its cursor interface, which I felt was most fitting for the use case. >> >> Testing: >> - Oracle tiers 1-3 > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > feedback fixes Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/26655#pullrequestreview-3105279452 From duke at openjdk.org Mon Aug 11 10:59:11 2025 From: duke at openjdk.org (duke) Date: Mon, 11 Aug 2025 10:59:11 GMT Subject: RFR: 8349191: Test compiler/ciReplay/TestIncrementalInlining.java failed In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 09:14:36 GMT, Benoît Maillard wrote: > This PR fixes a bug caused by synchronization issues in the print inlining system. Individual segments of a single line of output are interleaved with output from other compile threads, causing tests that parse replay files to fail. > > A snippet of a problematic replay file is shown below: > > > @ 0 compiler.ciReplay.IncrementalInliningTest::level0 (4 bytes) force inline by annotation > @ 0 compiler.ciReplay.IncrementalInliningTest::level1 (4 bytes) inline (hot) > @ 0 compiler.ciReplay.IncrementalInliningTest::level2 (4 bytes) > > > > force inline by annotation > @ 0 compiler.ciReplay.IncrementalInliningTest::late (4 bytes) force inline by annotation late inline succeeded > @ 0 compiler.ciReplay.IncrementalInliningTest::level4 (6 bytes) failed to inline: inlining too deep > > > This makes the output impossible to parse for tests like `compiler/ciReplay/TestIncrementalInlining.java`, as they rely on regular expressions to parse individual lines. Because it is a synchronization issue, the bug is quite intermittent and I was only able to reproduce it with mach5 in tier 7. > > This bug was caused by [JDK-8319850](https://bugs.openjdk.org/browse/JDK-8319850), as it introduced important changes in the print inlining system. With these changes, individual segments of the output are printed directly to tty, and this risks causing problematic interleavings with multiple compile threads. > > My proposed solution is to simply print everything to a `stringStream` first, and then dump it to `tty`. The PR also removes the relevant tests from `ProblemList.txt`.
> > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8349191) > - [x] tier1-3, plus some internal testing > - [x] tier7 for the relevant tests (`TestIncrementalInlining.java` and `TestInliningProtectionDomain.java`) > > Thanks for reviewing! @benoitmaillard Your change (at version c2bb7cb716d690567d1fefc9440c95fe5f4e62ac) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26654#issuecomment-3174226231 From bmaillard at openjdk.org Mon Aug 11 10:59:12 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 11 Aug 2025 10:59:12 GMT Subject: RFR: 8349191: Test compiler/ciReplay/TestIncrementalInlining.java failed In-Reply-To: References: Message-ID: On Thu, 7 Aug 2025 09:00:37 GMT, Benoît Maillard wrote: >> src/hotspot/share/opto/printinlining.cpp line 52: >> >>> 50: stringStream ss; >>> 51: _root.dump(&ss, -1); >>> 52: tty->print_raw(ss.freeze()); >> >> General thought: I see that we use the proposed pattern to print a `stringStream` in existing code but also a different pattern with `as_string()`: >> https://github.com/openjdk/jdk/blob/c56fb0b6eff7d3f36bc65f300b784e0dd73c563e/src/hotspot/share/opto/compile.cpp#L614 >> >> Can anybody comment on which one is preferred? > > Good point, I am also curious about the answer. It seems that `as_string()` makes a copy of the internal buffer and returns a pointer to it, while `freeze()` simply returns a pointer to the internal buffer. Then it looks like `freeze()` is the right thing to use here.
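The buffering idea behind the fix can be illustrated outside HotSpot (a plain Java sketch; `stringStream` and `tty` are HotSpot classes, here emulated with `StringBuilder` and `System.out`):

```java
public class BufferedPrint {
    // Build the complete line in a local buffer first (analogue of the
    // stringStream in the fix), then emit it with a single print call.
    // PrintStream methods are internally synchronized, so one call cannot
    // interleave mid-line with another thread's call on the same stream.
    static String formatLine(String method, String msg) {
        StringBuilder sb = new StringBuilder();
        sb.append("@ 0 ").append(method).append(' ').append(msg);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Emitting the whole line at once keeps replay-file output parseable.
        System.out.println(formatLine("Foo::bar (4 bytes)", "inline (hot)"));
    }
}
```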
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26654#discussion_r2266341686 From snatarajan at openjdk.org Mon Aug 11 11:17:31 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 11 Aug 2025 11:17:31 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v7] In-Reply-To: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: > **Issue** > Extreme values for the BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version` result in the assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. This is observed on an x86 machine. > > **Analysis** > On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above-mentioned assert from failing for large values of BciProfileWidth. > > **Proposal** > Considering the fact that a larger BciProfileWidth results in slower profiling, I have proposed a range of 0 to 5000 to restrict the value of BciProfileWidth for x86 machines. This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in ret profile.
> > **Issue in AArch64** > Additionally, running the command `java -XX:BciProfileWidth=10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine. This is an issue of the maximum offset for `ldr/str` on AArch64, which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on an AArch64 machine, I had to modify 3 `ldr` and 1 `str` instruction (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line numbers 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for a maximum of 5000, after which it crashes with `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. Without this fix, `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested restricting the upper bound on AArch64 to 1000 instead of fixing it with `form_address`.
Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: addressing review : adding vm.debug and moving a defn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26139/files - new: https://git.openjdk.org/jdk/pull/26139/files/9dae3aef..60da70c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26139&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26139&range=05-06 Stats: 4 lines in 2 files changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26139/head:pull/26139 PR: https://git.openjdk.org/jdk/pull/26139 From snatarajan at openjdk.org Mon Aug 11 11:17:31 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 11 Aug 2025 11:17:31 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v6] In-Reply-To: <1QbX5WHkEdjP-unAFJ1vYaoIc9bV8zz8dA-vKZCkYn8=.8e3704ae-9490-4471-9e5c-dae44004d46f@github.com> References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> <1QbX5WHkEdjP-unAFJ1vYaoIc9bV8zz8dA-vKZCkYn8=.8e3704ae-9490-4471-9e5c-dae44004d46f@github.com> Message-ID: On Mon, 11 Aug 2025 01:38:55 GMT, Tobias Hartmann wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressing review - testcase and max range is set to 1000 > > test/hotspot/jtreg/compiler/arguments/TestBciProfileWidth.java line 28: > >> 26: * @summary Test the range defined in globals.hpp for BciProfileWidth >> 27: * @bug 8358696 >> 28: * @run main/othervm -XX:BciProfileWidth=0 > > `BciProfileWidth` is debug only, right? Thank you for the review. 
I have now included `@requires vm.debug` > test/lib-test/jdk/test/whitebox/vm_flags/IntxTest.java line 41: > >> 39: private static final long COMPILE_THRESHOLD = VmFlagTest.WHITE_BOX.getIntxVMFlag("CompileThreshold"); >> 40: private static final Long[] TESTS = {0L, 100L, (long)(Integer.MAX_VALUE>>3)*100L}; >> 41: private static final String FLAG_DEBUG_NAME = "BinarySwitchThreshold"; > > Why did you move the location of the declaration? This was an oversight. I have moved it to where it was declared previously ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26139#discussion_r2266378826 PR Review Comment: https://git.openjdk.org/jdk/pull/26139#discussion_r2266384171 From bmaillard at openjdk.org Mon Aug 11 11:18:17 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 11 Aug 2025 11:18:17 GMT Subject: Integrated: 8349191: Test compiler/ciReplay/TestIncrementalInlining.java failed In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 09:14:36 GMT, Benoît Maillard wrote: > This PR fixes a bug caused by synchronization issues in the print inlining system. Individual segments of a single line of output are interleaved with output from other compile threads, causing tests that parse replay files to fail. > > A snippet of a problematic replay file is shown below: > > > @ 0 compiler.ciReplay.IncrementalInliningTest::level0 (4 bytes) force inline by annotation > @ 0 compiler.ciReplay.IncrementalInliningTest::level1 (4 bytes) inline (hot) > @ 0 compiler.ciReplay.IncrementalInliningTest::level2 (4 bytes) > > > > force inline by annotation > @ 0 compiler.ciReplay.IncrementalInliningTest::late (4 bytes) force inline by annotation late inline succeeded > @ 0 compiler.ciReplay.IncrementalInliningTest::level4 (6 bytes) failed to inline: inlining too deep > > > This makes the output impossible to parse for tests like `compiler/ciReplay/TestIncrementalInlining.java`, as they rely on regular expressions to parse individual lines.
Because it is a synchronization issue, the bug is quite intermittent and I was only able to reproduce it with mach5 in tier 7. > > This bug was caused by [JDK-8319850](https://bugs.openjdk.org/browse/JDK-8319850), as it introduced important changes in the print inlining system. With these changes, individual segments of the output are printed directly to tty, and this risks causing problematic interleavings with multiple compile threads. > > My proposed solution is to simply print everything to a `stringStream` first, and then dump it to `tty`. The PR also removes the relevant tests from `ProblemList.txt`. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8349191) > - [x] tier1-3, plus some internal testing > - [x] tier7 for the relevant tests (`TestIncrementalInlining.java` and `TestInliningProtectionDomain.java`) > > Thanks for reviewing! This pull request has now been integrated. Changeset: a60e523f Author: Benoît Maillard Committer: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/a60e523f88e7022abe80725b82a8b16a87a377e2 Stats: 7 lines in 2 files changed: 3 ins; 3 del; 1 mod 8349191: Test compiler/ciReplay/TestIncrementalInlining.java failed Reviewed-by: mhaessig, dfenacci, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/26654 From snatarajan at openjdk.org Mon Aug 11 11:20:15 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 11 Aug 2025 11:20:15 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v6] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: On Sat, 9 Aug 2025 15:35:00 GMT, Saranya Natarajan wrote: >> **Issue** >> Extreme values for BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version `results in assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <=
0x0000772b63b75159 <= 0x0000772b63b75158 `. This is observed in a x86 machine. >> >> **Analysis** >> On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above mentioned assert from failing for large values of BciProfileWidth. >> >> **Proposal** >> Considering the fact that larger BciProfileWidth results in slower profiling, I have proposed a range between 0 to 5000 to restrict the value for BciProfileWidth for x86 machines. This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in ret profile. >> >> **Issue in AArch64** >> Additionally running the command `java -XX:BciProfileWidth= 10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine.This is an issue of maximum offset for `ldr/str` in AArch64 which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on AArch64 machine. I had to modify 3 `ldr` and 1 `str` instruction (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line number 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for maximum of 5000 after which it crashes with`assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. Without this fix `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested to restrict the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. 
>> >> **Question to reviewers** >> Do you think this is a reasonable fix? For AArch64, do you suggest fixing it using `form_address`? If yes, do I fix it under this PR or create another one? >> >> **Request to port maintainers** >> @dafedafe suggested that we keep the upper boun... > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > Addressing review - testcase and max range is set to 1000 In the process of adding port maintainers to this PR, I mistakenly added (and removed) some of them as contributors. I will update the contributor list before closing the PR. Sorry for the inconvenience. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3174300135 From jbhateja at openjdk.org Mon Aug 11 11:46:16 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 11 Aug 2025 11:46:16 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v9] In-Reply-To: References: Message-ID: On Fri, 8 Aug 2025 08:21:42 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is the integer type's size in bits. In previous versions, when C2 cannot know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code.
For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: > > - Add microbench > - Add missing test method declarations Hi @eme64, can you kindly run this through the Oracle test framework and approve? ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25928#pullrequestreview-3105449239 From jbhateja at openjdk.org Mon Aug 11 11:46:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 11 Aug 2025 11:46:17 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v9] In-Reply-To: References: Message-ID: On Fri, 8 Aug 2025 08:21:56 GMT, Qizheng Xing wrote: >> Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add microbench >> - Add missing test method declarations > > Hi @jatin-bhateja, I've added a micro benchmark that includes the `numberOfNibbles` implementation from this PR description and your micro kernel. > > Here are my test results on an Intel(R) Xeon(R) Platinum: > > > # Baseline: > Benchmark Mode Cnt Score Error Units > CountLeadingZeros.benchClzLongConstrained avgt 15 1517.888 ± 5.691 ns/op > CountLeadingZeros.benchNumberOfNibbles avgt 15 1094.422 ± 1.753 ns/op > > # This patch: > Benchmark Mode Cnt Score Error Units > CountLeadingZeros.benchClzLongConstrained avgt 15 0.948 ± 0.002 ns/op > CountLeadingZeros.benchNumberOfNibbles avgt 15 942.438 ± 1.742 ns/op Thanks @MaxXSoft, I have created another JBS to optimize popcount using knownbits: https://bugs.openjdk.org/browse/JDK-8365205 Changes look good to me.
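For reference, the `numberOfNibbles` kernel quoted above can be exercised standalone (a sketch without the JMH harness; the class name is made up for illustration). The expected values follow directly from the semantics of `Integer.numberOfLeadingZeros`, which returns a value in [0, 32]:

```java
public class Nibbles {
    // Kernel from the PR description: number of hex digits needed for i,
    // clamped to at least 1.
    public static int numberOfNibbles(int i) {
        int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
        return Math.max((mag + 3) / 4, 1);
    }

    public static void main(String[] args) {
        System.out.println(numberOfNibbles(0));    // 1 (clamped by Math.max)
        System.out.println(numberOfNibbles(0xF));  // 1
        System.out.println(numberOfNibbles(0x10)); // 2
        System.out.println(numberOfNibbles(-1));   // 8 (all 32 bits set)
    }
}
```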
------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3174415748 From aph at openjdk.org Mon Aug 11 12:12:12 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 11 Aug 2025 12:12:12 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 07:54:53 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. >> >> This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). >> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. 
> > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments and modified some comments For `loadConH`, LLVM and GCC use

    mov wscratch, #const
    dup v0.4h, wscratch

We should investigate that. As far as I can see, LLVM and GCC do this for all vector immediates that don't need more than 2 movz/movk instructions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3174494648 From cnorrbin at openjdk.org Mon Aug 11 12:26:18 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Mon, 11 Aug 2025 12:26:18 GMT Subject: RFR: 8352067: Remove the NMT treap and replace its uses with the utilities red-black tree [v2] In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 11:07:42 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The utilities red-black tree and the NMT treap serve similar functions. Given the red-black tree's versatility and stricter time complexity, the treap can be removed in favour of it. >> >> I made some modifications to the red-black tree to make it compatible with previous treap usages: >> - Updated the `visit_in_order` and `visit_range_in_order` functions to require the supplied callback to return a bool, which allows us to stop traversing early. >> - Improved const-correctness by ensuring that invoking these functions on a const reference provides const pointers to nodes, while non-const references provide mutable pointers. Previously the two functions behaved differently. >> >> Changes to NMT include: >> - Modified components to align with the updated const-correctness of the red-black tree functions >> - Renamed structures and variables to remove "treap" from their names to reflect the new tree >> >> The treap was also used in one place in C2. I changed this to use the red-black tree and its cursor interface, which I felt was most fitting for the use case.
>> >> Testing: >> - Oracle tiers 1-3 > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > feedback fixes Thank you for the reviews! Let's ship it :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26655#issuecomment-3174530098 From cnorrbin at openjdk.org Mon Aug 11 12:26:19 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Mon, 11 Aug 2025 12:26:19 GMT Subject: Integrated: 8352067: Remove the NMT treap and replace its uses with the utilities red-black tree In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 09:29:05 GMT, Casper Norrbin wrote: > Hi everyone, > > The utilities red-black tree and the NMT treap serve similar functions. Given the red-black tree's versatility and stricter time complexity, the treap can be removed in favour of it. > > I made some modifications to the red-black tree to make it compatible with previous treap usages: > - Updated the `visit_in_order` and `visit_range_in_order` functions to require the supplied callback to return a bool, which allows us to stop traversing early. > - Improved const-correctness by ensuring that invoking these functions on a const reference provides const pointers to nodes, while non-const references provide mutable pointers. Previously the two functions behaved differently. > > Changes to NMT include: > - Modified components to align with the updated const-correctness of the red-black tree functions > - Renamed structures and variables to remove "treap" from their names to reflect the new tree > > The treap was also used in one place in C2. I changed this to use the red-black tree and its cursor interface, which I felt was most fitting for the use case. > > Testing: > - Oracle tiers 1-3 This pull request has now been integrated. 
Changeset: 0ad919c1 Author: Casper Norrbin URL: https://git.openjdk.org/jdk/commit/0ad919c1e54895b000b58f6a1b54d79f76970845 Stats: 1016 lines in 14 files changed: 124 ins; 816 del; 76 mod 8352067: Remove the NMT treap and replace its uses with the utilities red-black tree Reviewed-by: jsjolen, ayang ------------- PR: https://git.openjdk.org/jdk/pull/26655 From mhaessig at openjdk.org Mon Aug 11 13:43:27 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 11 Aug 2025 13:43:27 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Mon, 11 Aug 2025 00:08:11 GMT, Emanuel Peter wrote: >> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. >> >> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. >> >> -------------------------- >> >> **Where to start reviewing** >> >> - `src/hotspot/share/opto/mempointer.hpp`: >> - Read the class comment for `MemPointerRawSummand`. >> - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. >> >> - `src/hotspot/share/opto/vectorization.cpp`: >> - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. 
>> >> - `src/hotspot/share/opto/vtransform.hpp`: >> - Understand the difference between weak and strong edges. >> >> If you need to see some examples, then look at the tests: >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. >> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases; MemorySegment cases have some issues (see comments). >> -------------------------- >> >> **Details** >> >> Most fundamentally: >> - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. >> - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. >> - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` >> - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + Conv...
src/hotspot/share/opto/vectorization.cpp line 494: > 492: // > 493: // This would allow situations where for some iv p1 is lower than p2, and for > 494: // other iv p1 is higher than p2. This is not very useful inpractice. We can Suggestion: // other iv p1 is higher than p2. This is not very useful in practice. We can src/hotspot/share/opto/vectorization.cpp line 598: > 596: // If iv_stride <= 0, i.e. last <= iv <= init: > 597: // (iv - init) * scale_1 >= (iv - init) * iv_scale > 598: // (iv - last) * scale_1 <= (iv - last) * iv_scale (NEG-STRIDE) Suggestion: // If iv_stride >= 0, i.e. init <= iv <= last: // (iv - init) * iv_scale_1 <= (iv - init) * iv_scale2 // (iv - last) * iv_scale_1 >= (iv - last) * iv_scale2 (POS-STRIDE) // If iv_stride <= 0, i.e. last <= iv <= init: // (iv - init) * iv_scale_1 >= (iv - init) * iv_scale2 // (iv - last) * iv_scale_1 <= (iv - last) * iv_scale2 (NEG-STRIDE) If I am not massively confused, the `iv_scale`s should be like this. src/hotspot/share/opto/vectorization.cpp line 604: > 602: // p1(init) + size1 <= p2(init) (if iv_stride >= 0) | p2(last) + size2 <= p1(last) (if iv_stride >= 0) | > 603: // p1(last) + size1 <= p2(last) (if iv_stride <= 0) | p2(init) + size2 <= p1(init) (if iv_stride <= 0) | > 604: // ----- is equivalent to ----- | ----- is equivalent to ----- | Suggestion: // ---- are equivalent to ----- | ---- are equivalent to ----- | This confused me a bit ? 
src/hotspot/share/opto/vectorization.cpp line 625: > 623: // <= size1 + p1(init) - init * iv_scale2 + iv * iv_scale2 | <= size2 + p2(last) - init * iv_scale1 + iv * iv_scale1 | > 624: // -- assumption -- | -- assumption -- | > 625: // <= p2(init) - init * iv_scale2 + iv * iv_scale2 | <= p1(last) - init * iv_scale1 + iv * iv_scale1 | Suggestion: // = size1 + p1(init) - init * iv_scale1 + iv * iv_scale1 | = size2 + p2(last) - last * iv_scale2 + iv * iv_scale2 | // ------ apply (POS-STRIDE) --------- | ------ apply (POS-STRIDE) --------- | // <= size1 + p1(init) - init * iv_scale2 + iv * iv_scale2 | <= size2 + p2(last) - last * iv_scale1 + iv * iv_scale1 | // -- assumption -- | -- assumption -- | // <= p2(init) - init * iv_scale2 + iv * iv_scale2 | <= p1(last) - last * iv_scale1 + iv * iv_scale1 | `LINEAR-FORM-LAST: p1(iv) = p1(last) - last * iv_scale1 + iv * iv_scale1` src/hotspot/share/opto/vectorization.cpp line 639: > 637: // <= size1 + p1(last) - init * iv_scale2 + iv * iv_scale2 | <= size2 + p2(init) - init * iv_scale1 + iv * iv_scale1 | > 638: // -- assumption -- | -- assumption -- | > 639: // <= p2(last) - init * iv_scale2 + iv * iv_scale2 | <= p1(init) - init * iv_scale1 + iv * iv_scale1 | Suggestion: // = size1 + p1(last) - last * iv_scale1 + iv * iv_scale1 | = size2 + p2(init) - init * iv_scale2 + iv * iv_scale2 | // ------ apply (NEG-STRIDE) --------- | ------ apply (NEG-STRIDE) --------- | // <= size1 + p1(last) - last * iv_scale2 + iv * iv_scale2 | <= size2 + p2(init) - init * iv_scale1 + iv * iv_scale1 | // -- assumption -- | -- assumption -- | // <= p2(last) - last * iv_scale2 + iv * iv_scale2 | <= p1(init) - init * iv_scale1 + iv * iv_scale1 | src/hotspot/share/opto/vectorization.cpp line 742: > 740: // a solution that also works when the loop is not entered: > 741: // > 742: // k = (init - stride - 1) / abs(stride) Suggestion: // k = (init - limit - 1) / abs(stride) Where does `stride` come from? If I did not miss anything, this should be `limit`. 
src/hotspot/share/opto/vectorization.cpp line 743: > 741: // > 742: // k = (init - stride - 1) / abs(stride) > 743: // last = MAX(init, init + k * stride) Suggestion: // last = MIN(init, init + k * stride) This should be `MIN` otherwise this does not clamp to zero. src/hotspot/share/opto/vectorization.cpp line 752: > 750: // If stride < 0: > 751: // k = (init - stride - 1) / abs(stride) > 752: // last = MAX(init, init + k * stride) Suggestion: // LAST(init, stride, limit) // If stride > 0: // k = (limit - init - 1) / abs(stride) // last = MAX(init, init + k * stride) // If stride < 0: // k = (init - limit - 1) / abs(stride) // last = MIN(init, init + k * stride) src/hotspot/share/opto/vectorization.cpp line 853: > 851: // For the computation of main_init, we also need the pre_limit, and so we need > 852: // to check that this value is pre-loop invariant. In the case of non-equal iv_scales, > 853: // we also need toe main_limit in the aliasing check, and so this value must then Suggestion: // we also need the main_limit in the aliasing check, and so this value must then src/hotspot/share/opto/vectorization.cpp line 895: > 893: Node* diffL = (stride > 0) ? new SubLNode(limitL, initL) > 894: : new SubLNode(initL, limitL); > 895: Node* diffL_m1 = new AddLNode(diffL, igvn.longcon(-1)); Out of curiosity, why did you choose `AddL(diff, -1)` over `SubL(diff, 1)`? src/hotspot/share/opto/vectorization.cpp line 1026: > 1024: if (vp1.iv_scale() > vp2.iv_scale()) { > 1025: swap(p1_init, p2_init); > 1026: swap(size1, size2); Shouldn't we perform this swap before calling `make_last()`, since `make_last()` assumes `iv_scale1 < iv_scale2`? 
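To convince myself the corrected `LAST(init, stride, limit)` formula from the suggestions above is right, here is a tiny standalone sketch (the helper name `last_iv` and the `long` types are mine for illustration, not from the patch):

```cpp
#include <algorithm>
#include <cstdlib>

// Standalone sketch of LAST(init, stride, limit): the last value the iv takes
// in "for (iv = init; (stride > 0) ? iv < limit : iv > limit; iv += stride)".
long last_iv(long init, long stride, long limit) {
  // k = number of full strides that stay strictly inside the limit.
  long k = (stride > 0 ? (limit - init - 1) : (init - limit - 1)) / std::labs(stride);
  long candidate = init + k * stride;
  // Clamping against init also gives a sane answer when the loop is never entered.
  return (stride > 0) ? std::max(init, candidate) : std::min(init, candidate);
}
```

The clamp against `init` is what keeps the result well-defined when the loop body is never entered, since `k` goes negative in that case.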
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java line 62: > 60: for (String sac : List.of("-XX:-UseAutoVectorizationSpeculativeAliasingChecks", "-XX:+UseAutoVectorizationSpeculativeAliasingChecks")) { > 61: TestFramework.runWithFlags("--add-modules", "java.base", "--add-exports", "java.base/jdk.internal.misc=ALL-UNNAMED", > 62: "-XX:+UnlockExperimentalVMOptions", av, coh, sac); This might be a good fit for `Scenarios`. I find it easier to determine which cases failed. ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/24278#pullrequestreview-3104618375 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2265986953 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266065094 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266095980 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266124115 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266266510 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266282080 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266464492 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266514460 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266519259 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266449265 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266475956 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266556670 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266809686 From mhaessig at openjdk.org Mon Aug 11 13:43:28 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 11 Aug 2025 13:43:28 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6] In-Reply-To: References: 
<2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Mon, 11 Aug 2025 12:14:08 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java >> >> Co-authored-by: Manuel Hässig > > src/hotspot/share/opto/vectorization.cpp line 752: > >> 750: // If stride < 0: >> 751: // k = (init - stride - 1) / abs(stride) >> 752: // last = MAX(init, init + k * stride) > > Suggestion: > > // LAST(init, stride, limit) > // If stride > 0: > // k = (limit - init - 1) / abs(stride) > // last = MAX(init, init + k * stride) > // If stride < 0: > // k = (init - limit - 1) / abs(stride) > // last = MIN(init, init + k * stride) Or to be a bit closer to the implementation: Suggestion: // LAST(init, stride, limit) // c = stride > 0 ? 1 : -1; // k = (c * (limit - init) - 1) / abs(stride) // If stride > 0: // last = MAX(init, init + k * stride) // If stride < 0: // last = MIN(init, init + k * stride) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266522715 From mhaessig at openjdk.org Mon Aug 11 13:44:14 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 11 Aug 2025 13:44:14 GMT Subject: RFR: 8364501: Compiler shutdown crashes on access to deleted CompileTask In-Reply-To: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> References: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> Message-ID: On Fri, 8 Aug 2025 12:30:36 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. > > In short, with recent changes to `delete` `CompileTask`-s, we end up in the rare situation where we can access tasks that have been already deleted.
The major and obvious mistake I committed myself with [JDK-8361752](https://bugs.openjdk.org/browse/JDK-8361752) in `CompileQueue::delete_all`: the code first `delete`-s, then asks for `next` (facepalms). > > Another case is less trivial, and mostly a fix in an abundance of caution: in `wait_for_completion`, we can exit while blocking task is still in queue. Current code skips deletions only when compiler is shut down for compilation, but I think the condition should be stronger: unless the task is completed, we should assume it might carry the queue-ing `next`/`prev` pointers that `delete_all` would need, and skip deletion. Realistically, it would "leak" only on compiler shutdown, like before. > > I have also put in some diagnostic code to catch the lifecycle issues like this more reliably, and cleaned up `next`, `prev` lifecycle to clearly disconnect the `CompileTasks` that are no longer in queue. > > Additional testing: > - [x] Linux AArch64 server fastdebug, reproducer no longer fails > - [x] Linux AArch64 server fastdebug, `compiler` > - [x] Linux AArch64 server fastdebug, `all` Marked as reviewed by mhaessig (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26696#pullrequestreview-3105960777 From yzheng at openjdk.org Mon Aug 11 14:06:56 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 11 Aug 2025 14:06:56 GMT Subject: RFR: 8365218: [JVMCI] AArch64 CPU features are not computed correctly after 8364128 Message-ID: https://github.com/openjdk/jdk/pull/26515 changes the `VM_Version::CPU_` constant values on AArch64 and Graal now sees unsupported CPU features. This may result in SIGILL due to Graal emitting unsupported instructions, such as `CPU_SHA3`-based eor3 instructions in AArch64 SHA3 stubs.
------------- Commit messages: - [JVMCI] AArch64 CPU features are not computed correctly after 8364128 Changes: https://git.openjdk.org/jdk/pull/26727/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26727&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365218 Stats: 44 lines in 2 files changed: 42 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26727.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26727/head:pull/26727 PR: https://git.openjdk.org/jdk/pull/26727 From galder at openjdk.org Mon Aug 11 14:09:19 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 11 Aug 2025 14:09:19 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3] In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: On Sun, 10 Aug 2025 05:23:23 GMT, Emanuel Peter wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Check at the very least that auto vectorization is supported > > src/hotspot/share/opto/superword.cpp line 1635: > >> 1633: } else if (VectorNode::is_convert_opcode(opc)) { >> 1634: retValue = VectorCastNode::implemented(opc, size, velt_basic_type(p0->in(1)), velt_basic_type(p0)); >> 1635: } else if (VectorNode::is_reinterpret_opcode(opc)) { > > How does this affect `Op_ReinterpretHF2S` that is also in `VectorNode::is_reinterpret_opcode`? > I'm afraid that we need to test this with hardware or Intel's SDE, to make sure we have it running on a VM that actually supports Float16. Otherwise these instructions may not be used, and hence not tested right. > > @galderz Can you run the relevant tests? Would you run specific tiers in those platforms? Just hotspot compiler? Or individual tests such as `ConvF2HFIdealizationTests` and `TestFloat16ScalarOperations`?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2266889792 From epeter at openjdk.org Mon Aug 11 14:14:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 11 Aug 2025 14:14:14 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3] In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: On Mon, 11 Aug 2025 14:06:41 GMT, Galder Zamarreño wrote: >> src/hotspot/share/opto/superword.cpp line 1635: >> >>> 1633: } else if (VectorNode::is_convert_opcode(opc)) { >>> 1634: retValue = VectorCastNode::implemented(opc, size, velt_basic_type(p0->in(1)), velt_basic_type(p0)); >>> 1635: } else if (VectorNode::is_reinterpret_opcode(opc)) { >> >> How does this affect `Op_ReinterpretHF2S` that is also in `VectorNode::is_reinterpret_opcode`? >> I'm afraid that we need to test this with hardware or Intel's SDE, to make sure we have it running on a VM that actually supports Float16. Otherwise these instructions may not be used, and hence not tested right. >> >> @galderz Can you run the relevant tests? > > Would you run specific tiers in those platforms? Just hotspot compiler? Or individual tests such as `ConvF2HFIdealizationTests` and `TestFloat16ScalarOperations`? Honestly, I don't know, I'd have to do the research myself. Probably focusing on the Float16 tests would be good enough. No other test would really use Float16, so running anything else would probably not be that useful.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2266903230 From dnsimon at openjdk.org Mon Aug 11 14:47:10 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 11 Aug 2025 14:47:10 GMT Subject: RFR: 8365218: [JVMCI] AArch64 CPU features are not computed correctly after 8364128 In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 13:59:55 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/pull/26515 changes the `VM_Version::CPU_` constant values on AArch64 and Graal now sees unsupported CPU features. This may result in SIGILL due to Graal emitting unsupported instructions, such as `CPU_SHA3`-based eor3 instructions in AArch64 SHA3 stubs. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotJVMCIBackendFactory.java line 45: > 43: > 44: /** > 45: * Converts a bit mask of CPU features to enum constants. What's the difference between this new method and the existing `convertFeatures` methods? Is there some way we can consolidate all these versions as they look quite similar at a glance. If not, then please add javadoc to each explaining what's unique about it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26727#discussion_r2267009657 From jsjolen at openjdk.org Mon Aug 11 14:50:31 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 11 Aug 2025 14:50:31 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers Message-ID: Hi, This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions. 
Testing: GHA ------------- Commit messages: - Fix the bug - Keep invariant - Delete unnecessary assert - Change comment - Explicitly assign _mutable_data to nullptr - Use a base pointer and a -1 index start instead Changes: https://git.openjdk.org/jdk/pull/26569/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26569&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365256 Stats: 83 lines in 4 files changed: 18 ins; 24 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/26569.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26569/head:pull/26569 PR: https://git.openjdk.org/jdk/pull/26569 From jsjolen at openjdk.org Mon Aug 11 14:50:34 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 11 Aug 2025 14:50:34 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers In-Reply-To: References: Message-ID: On Thu, 31 Jul 2025 06:17:24 GMT, Johan Sjölen wrote: > Hi, > > This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. > > Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions.
If so, please link it in the PR description and describe how the new logic prevents that corruption. It's not intended to resolve it, but it does remove one potential source of the issue. > However, since relocation iteration is on a performance-critical path, benchmarks should be run to ensure that the added integer field and array indexing introduce no measurable regression. Yeah, we can check that. Note that we have the same size, as we replaced 1 8-byte field with 2 4-byte fields. I also suspect that the pointer addition (probably a `lea r0, [ r0 + r1 ]` on x64) won't introduce a performance regression, but nothing wrong with checking. src/hotspot/share/code/codeBlob.cpp line 213: > 211: delete _oop_maps; > 212: _oop_maps = nullptr; > 213: `free` and `delete` on null pointers are OK, no need to check. src/hotspot/share/code/nmethod.cpp line 2156: > 2154: os::free(_immutable_data); > 2155: _immutable_data = nullptr; > 2156: `free` and `delete` on null pointers are OK, no need to check. src/hotspot/share/code/relocInfo.cpp line 157: > 155: _current = 0; > 156: set_has_current(true); > 157: } This 'singleton' constructor allows us to remove `set_current()`. src/hotspot/share/code/relocInfo.cpp line 157: > 155: _current = -1; > 156: } > 157: So that we can remove `set_current()`. src/hotspot/share/code/relocInfo.hpp line 589: > 587: assert(has_current(), "must have current"); > 588: return current_no_check(); > 589: } I had to add the `current_no_check()` as the `print_on` method used to read `_current` directly, without the assert check. 
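For readers following along, the scheme described in this PR can be modeled with a heavily simplified, purely illustrative sketch (the struct and names below are mine, not the actual `RelocIterator` code): a base pointer plus a current index that starts at `-1`, so no separate `set_current()` step is needed.

```cpp
#include <cassert>

// Simplified model of index-based iteration: _base + _current replaces the
// old current/end pointer pair; _current == -1 means "no current element yet".
struct IndexIter {
  const int* _base;    // base pointer into the array
  int        _current; // current index, -1 before the first next()
  int        _len;     // number of elements

  IndexIter(const int* base, int len) : _base(base), _current(-1), _len(len) {}

  bool next() { return ++_current < _len; }

  int current() const {
    assert(_current >= 0 && _current < _len); // mirrors "must have current"
    return _base[_current];
  }
};
```

With this shape, the `-1` start value makes the usual `while (iter.next()) { use(iter.current()); }` loop work without any special-case initialization.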
------------- PR Comment: https://git.openjdk.org/jdk/pull/26569#issuecomment-3138983480 PR Comment: https://git.openjdk.org/jdk/pull/26569#issuecomment-3141038241 PR Review Comment: https://git.openjdk.org/jdk/pull/26569#discussion_r2244666717 PR Review Comment: https://git.openjdk.org/jdk/pull/26569#discussion_r2244667082 PR Review Comment: https://git.openjdk.org/jdk/pull/26569#discussion_r2244672059 PR Review Comment: https://git.openjdk.org/jdk/pull/26569#discussion_r2244674949 PR Review Comment: https://git.openjdk.org/jdk/pull/26569#discussion_r2244676713 From bulasevich at openjdk.org Mon Aug 11 14:50:34 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 11 Aug 2025 14:50:34 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers In-Reply-To: References: Message-ID: <0k9Mhl7FcMsAx1vjvX_9LRuy6siRBleve5vDTVMEmY0=.ac123d15-af74-4bec-9d0d-b2f6d168166c@github.com> On Thu, 31 Jul 2025 06:17:24 GMT, Johan Sjölen wrote: > Hi, > > This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. > > Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions. > > Testing: GHA That is interesting. Thanks! Is this change intended to resolve JDK-8361382 (NMT header corruption)? If so, please link it in the PR description and describe how the new logic prevents that corruption. I remember that vnkozlov supports the blob_end() approach, so it's best to wait for his input before changing it back to nullptr. As for me, I agree that nullptr is a better sentinel than blob_end().
We originally switched to blob_end() for _mutable_data to work around JDK-8352112 (UBSan error on null-pointer offset), but that soon led to the intermittent JDK-8361304 crash in CodeCache::aggregate. Restoring nullptr avoids those pitfalls. Replacing the two-pointer scheme (_current/_end) with an index-based design (_base/_current/_len) simplifies the logic and removes the dummy-pointer workaround. However, since relocation iteration is on a performance-critical path, benchmarks should be run to ensure that the added integer field and array indexing introduce no measurable regression. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26569#issuecomment-3140527868 From mhaessig at openjdk.org Mon Aug 11 15:32:41 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 11 Aug 2025 15:32:41 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v2] In-Reply-To: References: Message-ID: > A loop of the form > > MemorySegment ms = {}; > for (long i = 0; i < ms.byteSize() / 8L; i++) { > // vectorizable work > } > > does not vectorize, whereas > > MemorySegment ms = {}; > long size = ms.byteSize(); > for (long i = 0; i < size / 8L; i++) { > // vectorizable work > } > > vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized.
Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: > > https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 > > Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. > > So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. > > @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem > > ## Change Description > > Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. > >
Explored Alternatives > 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. > 2. Insert a "`PHASEIDEALLOOP0`" with `LoopOptsNone` that only perfor... Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: - Fix debug print - Test more flags - Renaming and comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26429/files - new: https://git.openjdk.org/jdk/pull/26429/files/685557ca..0c200787 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26429&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26429&range=00-01 Stats: 36 lines in 4 files changed: 17 ins; 8 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/26429.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26429/head:pull/26429 PR: https://git.openjdk.org/jdk/pull/26429 From mhaessig at openjdk.org Mon Aug 11 15:32:41 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 11 Aug 2025 15:32:41 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v2] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 01:11:09 GMT, Emanuel Peter wrote: >> Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: >> >> - Fix debug print >> - Test more flags >> - Renaming and comments > > src/hotspot/share/opto/loopnode.hpp line 1645: > >> 1643: _total_wins(0), >> 1644: _loop_entry_wins(0), >> 1645: _loop_back_wins(0) {}; > > Can you describe somewhere what the definition of these is? > I'm struggling a little with understanding the conditions in `profitable`. I hope it is better now. > src/hotspot/share/opto/loopopts.cpp line 239: > >> 237: } else { >> 238: tty->print("Region "); >> 239: } > > What if it is another kind of loop?
Could it be a `LongCountedLoop` or something else we don't have yet? > I suggest you just use `region->Name()` and format that string into your output. I did not know about that. > test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateAddSub.java line 351: > >> 349: @IR(counts = {IRNode.SUB_I, "1"}) >> 350: public int addSubInt(int inv1, int inv2, int size) { >> 351: int result = -1; > > Can you document where the adds are? > Do we manage to re-associate `inv1 + (inv2 - i)` to `(inv1 + inv2) - i` so that the addition can float out of the loop? Yes, the addition floats, and the loop disappears. > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentByteSizeLongLoopLimit.java line 38: > >> 36: * @library /test/lib / >> 37: * @run driver compiler.loopopts.superword.TestMemorySegmentByteSizeLongLoopLimit >> 38: */ > > For MemorySegment tests, I've made the experience that it is quite important to test out some runs with additional flag combinations: at least `AlignVector` and `ShortRunningLongLoop`. Same might apply for the tests below. I added scenarios.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2267130654 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2267133830 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2267129512 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2267130083 From qamai at openjdk.org Mon Aug 11 15:46:17 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 11 Aug 2025 15:46:17 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v2] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 15:32:41 GMT, Manuel Hässig wrote: >> A loop of the form >> >> MemorySegment ms = {}; >> for (long i = 0; i < ms.byteSize() / 8L; i++) { >> // vectorizable work >> } >> >> does not vectorize, whereas >> >> MemorySegment ms = {}; >> long size = ms.byteSize(); >> for (long i = 0; i < size / 8L; i++) { >> // vectorizable work >> } >> >> vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: >> >> https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 >> >> Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. >> >> So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body.
The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. >> >> @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem >> >> ## Change Description >> >> Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. >> >>
Explored Alternatives >> 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. >> 2. I... > > Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: > > - Fix debug print > - Test more flags > - Renaming and comments src/hotspot/share/opto/loopnode.hpp line 1639: > 1637: // Sum of all wins regardless of where they happen. > 1638: int _total_wins; > 1639: // Number of wins on a loop entry edge, which only pays dividens once per loop execution. You should specify that "If the split is through a loop head", otherwise `0`. Also, typo `dividends` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2267177301 From kvn at openjdk.org Mon Aug 11 16:07:13 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 11 Aug 2025 16:07:13 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers In-Reply-To: References: Message-ID: On Thu, 31 Jul 2025 06:17:24 GMT, Johan Sjölen wrote: > Hi, > > This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. > > Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions.
> > Testing: GHA src/hotspot/share/code/codeBlob.cpp line 211: > 209: _mutable_data_size = 0; > 210: delete _oop_maps; > 211: _oop_maps = nullptr; You missed `_relocation_size = 0;` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26569#discussion_r2267249556 From kvn at openjdk.org Mon Aug 11 16:07:14 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 11 Aug 2025 16:07:14 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers In-Reply-To: References: Message-ID: <2VOoZpHlcOaGUQYAfPF3HfzGoFzchn8PUiLfvvuKw-0=.98555a5d-5826-4644-a376-3ab12dc2fbd4@github.com> On Thu, 31 Jul 2025 08:07:28 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. >> >> Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions. >> >> Testing: GHA > src/hotspot/share/code/nmethod.cpp line 2156: > >> 2154: os::free(_immutable_data); >> 2155: _immutable_data = nullptr; >> 2156: > > `free` and `delete` on null pointers are OK, no need to check. May be add `_immutable_data_size = 0` for completeness.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26569#discussion_r2267257898 From kvn at openjdk.org Mon Aug 11 16:07:15 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 11 Aug 2025 16:07:15 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers In-Reply-To: <2VOoZpHlcOaGUQYAfPF3HfzGoFzchn8PUiLfvvuKw-0=.98555a5d-5826-4644-a376-3ab12dc2fbd4@github.com> References: <2VOoZpHlcOaGUQYAfPF3HfzGoFzchn8PUiLfvvuKw-0=.98555a5d-5826-4644-a376-3ab12dc2fbd4@github.com> Message-ID: On Mon, 11 Aug 2025 16:03:24 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/code/nmethod.cpp line 2156: >> >>> 2154: os::free(_immutable_data); >>> 2155: _immutable_data = nullptr; >>> 2156: >> >> `free` and `delete` on null pointers are OK, no need to check. > > May be add `_immutable_data_size = 0` for completeness. Since `delete` allows nullptr, maybe remove the check when `delete _pc_desc_container` too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26569#discussion_r2267263607 From kvn at openjdk.org Mon Aug 11 16:19:14 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 11 Aug 2025 16:19:14 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers In-Reply-To: References: Message-ID: <-H56WnJbNgLfc-8RqBS8vhifvTWn2ZqzPMM2h9sTBxw=.119edc16-48e8-4e23-9299-4838b9f6d025@github.com> On Thu, 31 Jul 2025 06:17:24 GMT, Johan Sjölen wrote: > Hi, > > This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. > > Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions.
> > Testing: GHA src/hotspot/share/code/relocInfo.hpp line 567: > 565: relocInfo* _base; // base pointer into relocInfo array > 566: int _current; // current index > 567: int _len; // length Yes, this keeps the size the same. But you have an opportunity to reduce it by moving `_data` before the `_databuf` field. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26569#discussion_r2267306514 From shade at openjdk.org Mon Aug 11 16:22:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 11 Aug 2025 16:22:56 GMT Subject: RFR: 8364501: Compiler shutdown crashes on access to deleted CompileTask [v2] In-Reply-To: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> References: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> Message-ID: > See the bug for more investigation. > > In short, with recent changes to `delete` `CompileTask`-s, we end up in the rare situation where we can access tasks that have been already deleted. The major and obvious mistake I committed myself with [JDK-8361752](https://bugs.openjdk.org/browse/JDK-8361752) in `CompileQueue::delete_all`: the code first `delete`-s, then asks for `next` (facepalms). > > Another case is less trivial, and mostly a fix in an abundance of caution: in `wait_for_completion`, we can exit while blocking task is still in queue. Current code skips deletions only when compiler is shut down for compilation, but I think the condition should be stronger: unless the task is completed, we should assume it might carry the queue-ing `next`/`prev` pointers that `delete_all` would need, and skip deletion. Realistically, it would "leak" only on compiler shutdown, like before. > > I have also put in some diagnostic code to catch the lifecycle issues like this more reliably, and cleaned up `next`, `prev` lifecycle to clearly disconnect the `CompileTasks` that are no longer in queue.
> > Additional testing: > - [x] Linux AArch64 server fastdebug, reproducer no longer fails > - [x] Linux AArch64 server fastdebug, `compiler` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Chicken out of memset-ing the possibly vtable-bearing object ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26696/files - new: https://git.openjdk.org/jdk/pull/26696/files/e5f0a180..64d0b3c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26696&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26696&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26696.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26696/head:pull/26696 PR: https://git.openjdk.org/jdk/pull/26696 From shade at openjdk.org Mon Aug 11 16:22:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 11 Aug 2025 16:22:56 GMT Subject: RFR: 8364501: Compiler shutdown crashes on access to deleted CompileTask In-Reply-To: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> References: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> Message-ID: On Fri, 8 Aug 2025 12:30:36 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. > > In short, with recent changes to `delete` `CompileTask`-s, we end up in the rare situation where we can access tasks that have been already deleted. The major and obivous mistake I committed myself with [JDK-8361752](https://bugs.openjdk.org/browse/JDK-8361752) in `CompileQueue::delete_all`: the code first `delete`-s, then asks for `next` (facepalms). > > Another case is less trivial, and mostly fix in abundance of caution: in `wait_for_completion`, we can exit while blocking task is still in queue. 
Current code skips deletions only when the compiler is shut down for compilation, but I think the condition should be stronger: unless the task is completed, we should assume it might carry the queue-ing `next`/`prev` pointers that `delete_all` would need, and skip deletion. Realistically, it would "leak" only on compiler shutdown, like before. > > I have also put in some diagnostic code to catch the lifecycle issues like this more reliably, and cleaned up `next`, `prev` lifecycle to clearly disconnect the `CompileTasks` that are no longer in queue. > > Additional testing: > - [x] Linux AArch64 server fastdebug, reproducer no longer fails > - [x] Linux AArch64 server fastdebug, `compiler` > - [x] Linux AArch64 server fastdebug, `all` Looking at the patch before integration led me to a more paranoid path: chickening out of `memset`-ing the C++ object in the destructor. For several reasons: 1. There is no vtable in `CompileTask` _now_, but nothing prevents us from adding it later. Calling `memset` from a subclass/virtual destructor might be a cause of fun bugs then. We do this `memset` in some places in Hotspot, but only for `struct`-looking classes. 2. The newly added clearing of `next`/`prev` on dequeue-ing prevents most of the accidents of walking into random memory, and also makes explicit zapping not that useful for reproducing bugs. 3. I think we should look into a more generic zapping for all `CHeapObj`-s, when we know it is safe (e.g. right before calling the native allocator's `free`). This is tracked by [JDK-8365165](https://bugs.openjdk.org/browse/JDK-8365165). This does not invalidate testing, since I just removed the `ASSERT` block. 
So I can integrate after someone re-approves :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26696#issuecomment-3175697792 From kvn at openjdk.org Mon Aug 11 16:29:13 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 11 Aug 2025 16:29:13 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers In-Reply-To: References: Message-ID: On Thu, 31 Jul 2025 06:17:24 GMT, Johan Sjölen wrote: > Hi, > > This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. > > Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions. > > Testing: GHA Thank you @jdksjolen for doing these changes. The only reason I kept a non-null default (and purged) value is to avoid asserts we are hitting in various parts of this code, which still assume that mutable and immutable data are collocated with the nmethod code and we can use pointers without issue. I am not much worried about the performance of relocation info. `metadata_do()` is used only with RedefineClasses, which triggers deoptimization anyway. `oops_do()` uses relocation info for embedded oops only on x86. This is a rare case, and we can further reduce the impact by adding an nmethod flag indicating the presence of embedded oops. 
------------- PR Review: https://git.openjdk.org/jdk/pull/26569#pullrequestreview-3106711001 From kvn at openjdk.org Mon Aug 11 16:29:14 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 11 Aug 2025 16:29:14 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers In-Reply-To: References: Message-ID: On Thu, 31 Jul 2025 19:00:10 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. >> >> Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions. >> >> Testing: GHA > > The build failures are all from after [Explicitly assign _mutable_data to nullptr](https://github.com/openjdk/jdk/pull/26569/commits/75a3853b65f264666c470a3ba6b1791dce6c775d); fixing the issues should be trivial. > >> Is this change intended to resolve JDK-8361382 (NMT header corruption)? If so, please link it in the PR description and describe how the new logic prevents that corruption. > > It's not intended to resolve it, but it does remove one potential source of the issue. > >> However, since relocation iteration is on a performance-critical path, benchmarks should be run to ensure that the added integer field and array indexing introduce no measurable regression. > > Yeah, we can check that. Note that we have the same size, as we replaced one 8-byte field with two 4-byte fields. I also suspect that the pointer addition (probably a `lea r0, [ r0 + r1 ]` on x64) won't introduce a performance regression, but nothing wrong with checking. @jdksjolen please run tier1-4 testing in mach5; GHA is not enough for such changes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26569#issuecomment-3175776356 From kvn at openjdk.org Mon Aug 11 16:37:11 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 11 Aug 2025 16:37:11 GMT Subject: RFR: 8364501: Compiler shutdown crashes on access to deleted CompileTask [v2] In-Reply-To: References: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> Message-ID: On Mon, 11 Aug 2025 16:22:56 GMT, Aleksey Shipilev wrote: >> See the bug for more investigation. >> >> In short, with recent changes to `delete` `CompileTask`-s, we end up in the rare situation where we can access tasks that have been already deleted. The major and obivous mistake I committed myself with [JDK-8361752](https://bugs.openjdk.org/browse/JDK-8361752) in `CompileQueue::delete_all`: the code first `delete`-s, then asks for `next` (facepalms). >> >> Another case is less trivial, and mostly fix in abundance of caution: in `wait_for_completion`, we can exit while blocking task is still in queue. Current code skip deletions only when compiler is shutdown for compilation, but I think the condition should be stronger: unless the task is completed, we should assume it might carry the queue-ing `next`/`prev` pointers that `delete_all` would need, and skip deletion. Realistically, it would "leak" only on compiler shutdown, like before. >> >> I have also put in some diagnostic code to catch the lifecycle issues like this more reliably, and cleaned up `next`, `prev` lifecycle to clearly disconnect the `CompileTasks` that are no longer in queue. >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, reproducer no longer fails >> - [x] Linux AArch64 server fastdebug, `compiler` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Chicken out of memset-ing the possibly vtable-bearing object Re-approved. 
Please wait for GHA testing to finish before integration. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26696#pullrequestreview-3106760349 From sparasa at openjdk.org Mon Aug 11 17:45:24 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 11 Aug 2025 17:45:24 GMT Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX Message-ID: The goal of this PR is to address the failure caused by an x86 forward jump offset exceeding the imm8 displacement when running the HotSpot jtreg test `test/hotspot/jtreg/compiler/c2/TestLWLockingCodeGen.java` using Intel APX (on the SDE emulator). This bug triggers an assertion failure in methodHandles_x86.cpp because the assembler emits a short forward jump (imm8 displacement) whose target is more than 127 bytes away, exceeding the allowed range. This appears to be caused by larger stub code size when APX instruction encoding is enabled. The fix for this issue is to replace the `jccb` instruction with `jcc` in methodHandles_x86.cpp. 
------------- Commit messages: - 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX Changes: https://git.openjdk.org/jdk/pull/26731/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26731&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365265 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26731.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26731/head:pull/26731 PR: https://git.openjdk.org/jdk/pull/26731 From jbhateja at openjdk.org Mon Aug 11 18:02:34 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 11 Aug 2025 18:02:34 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v4] In-Reply-To: References: Message-ID: <4omqHrPtNFE0UWmulPymwsUHXRpd9EBhgJvOpRyXxJQ=.dacad6cd-5a3e-4671-9543-98f04e1b7e73@github.com> > The patch optimizes the Vector.slice operation with a constant index using the x86 ALIGNR instruction. > It also adds a new hybrid call generator to facilitate lazy intrinsification, or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing Vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. > > The idea here is to add infrastructure support to enable intrinsification of the fast path for selected Vector APIs, or else enable inlining of the fall-back implementation if it is based on Vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and is called during incremental inlining optimization. It also relieves the inline expander from handling slow paths, which can easily be implemented library-side (Java). 
> Vector API jtreg tests pass at AVX level 2, remaining validation in progress.
>
> Performance numbers:
>
> System : 13th Gen Intel(R) Core(TM) i3-1315U
>
> Baseline:
> Benchmark                                                (size)  Mode  Cnt      Score  Error   Units
> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1     1024  thrpt    2   9444.444         ops/ms
> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2     1024  thrpt    2  10009.319         ops/ms
> VectorSliceBenchmark.byteVectorSliceWithVariableIndex      1024  thrpt    2   9081.926         ops/ms
> VectorSliceBenchmark.intVectorSliceWithConstantIndex1      1024  thrpt    2   6085.825         ops/ms
> VectorSliceBenchmark.intVectorSliceWithConstantIndex2      1024  thrpt    2   6505.378         ops/ms
> VectorSliceBenchmark.intVectorSliceWithVariableIndex       1024  thrpt    2   6204.489         ops/ms
> VectorSliceBenchmark.longVectorSliceWithConstantIndex1     1024  thrpt    2   1651.334         ops/ms
> VectorSliceBenchmark.longVectorSliceWithConstantIndex2     1024  thrpt    2   1642.784         ops/ms
> VectorSliceBenchmark.longVectorSliceWithVariableIndex      1024  thrpt    2   1474.808         ops/ms
> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1    1024  thrpt    2  10399.394         ops/ms
> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2    1024  thrpt    2  10502.894         ops/ms
> VectorSliceBenchmark.shortVectorSliceWithVariableIndex     1024  ... 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24104/files - new: https://git.openjdk.org/jdk/pull/24104/files/e7c7374b..405de56f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=02-03 Stats: 389 lines in 9 files changed: 373 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/24104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24104/head:pull/24104 PR: https://git.openjdk.org/jdk/pull/24104 From shade at openjdk.org Mon Aug 11 18:50:23 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 11 Aug 2025 18:50:23 GMT Subject: RFR: 8364501: Compiler shutdown crashes on access to deleted CompileTask [v2] In-Reply-To: References: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> Message-ID: On Mon, 11 Aug 2025 16:22:56 GMT, Aleksey Shipilev wrote: >> See the bug for more investigation. >> >> In short, with recent changes to `delete` `CompileTask`-s, we end up in the rare situation where we can access tasks that have been already deleted. The major and obivous mistake I committed myself with [JDK-8361752](https://bugs.openjdk.org/browse/JDK-8361752) in `CompileQueue::delete_all`: the code first `delete`-s, then asks for `next` (facepalms). >> >> Another case is less trivial, and mostly fix in abundance of caution: in `wait_for_completion`, we can exit while blocking task is still in queue. Current code skip deletions only when compiler is shutdown for compilation, but I think the condition should be stronger: unless the task is completed, we should assume it might carry the queue-ing `next`/`prev` pointers that `delete_all` would need, and skip deletion. Realistically, it would "leak" only on compiler shutdown, like before. 
>> >> I have also put in some diagnostic code to catch the lifecycle issues like this more reliably, and cleaned up `next`, `prev` lifecycle to clearly disconnect the `CompileTasks` that are no longer in queue. >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, reproducer no longer fails >> - [x] Linux AArch64 server fastdebug, `compiler` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Chicken out of memset-ing the possibly vtable-bearing object Thanks! GHA is clean. I am integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26696#issuecomment-3176353168 From shade at openjdk.org Mon Aug 11 18:53:20 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 11 Aug 2025 18:53:20 GMT Subject: Integrated: 8364501: Compiler shutdown crashes on access to deleted CompileTask In-Reply-To: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> References: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> Message-ID: On Fri, 8 Aug 2025 12:30:36 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. > > In short, with recent changes to `delete` `CompileTask`-s, we end up in the rare situation where we can access tasks that have been already deleted. The major and obivous mistake I committed myself with [JDK-8361752](https://bugs.openjdk.org/browse/JDK-8361752) in `CompileQueue::delete_all`: the code first `delete`-s, then asks for `next` (facepalms). > > Another case is less trivial, and mostly fix in abundance of caution: in `wait_for_completion`, we can exit while blocking task is still in queue. 
Current code skip deletions only when compiler is shutdown for compilation, but I think the condition should be stronger: unless the task is completed, we should assume it might carry the queue-ing `next`/`prev` pointers that `delete_all` would need, and skip deletion. Realistically, it would "leak" only on compiler shutdown, like before. > > I have also put in some diagnostic code to catch the lifecycle issues like this more reliably, and cleaned up `next`, `prev` lifecycle to clearly disconnect the `CompileTasks` that are no longer in queue. > > Additional testing: > - [x] Linux AArch64 server fastdebug, reproducer no longer fails > - [x] Linux AArch64 server fastdebug, `compiler` > - [x] Linux AArch64 server fastdebug, `all` This pull request has now been integrated. Changeset: 958383d6 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/958383d69c8742fdb78c28ad856559367c3513d7 Stats: 18 lines in 3 files changed: 10 ins; 0 del; 8 mod 8364501: Compiler shutdown crashes on access to deleted CompileTask Reviewed-by: kvn, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/26696 From aturbanov at openjdk.org Mon Aug 11 20:27:10 2025 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Mon, 11 Aug 2025 20:27:10 GMT Subject: RFR: 8365200: RISC-V: compiler/loopopts/superword/TestGeneralizedReductions.java fails with Zvbb and vlen=128 In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 02:06:02 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. This test pass with zvbb when vlen=256, but fail when vlen=128. > > The reason for the error is the same as [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694). 2-element reductions for INT/LONG are not profitable, so the compiler won't generate the corresponding reductions IR. 
> > This issue was not addressed together with [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694) because the testMapReductionOnGlobalAccumulator case where the error is reported has a different applyif method from other cases: zvbb needs to be enabled. > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=256 > - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=128 test/hotspot/jtreg/compiler/loopopts/superword/TestGeneralizedReductions.java line 169: > 167: @IR(applyIfPlatform = {"riscv64", "true"}, > 168: applyIfCPUFeatureOr = {"zvbb", "true"}, > 169: applyIfAnd = {"SuperWordReductions", "true","UsePopCountInstruction", "true", "MaxVectorSize", ">=32"}, Suggestion: applyIfAnd = {"SuperWordReductions", "true", "UsePopCountInstruction", "true", "MaxVectorSize", ">=32"}, ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26719#discussion_r2267945882 From duke at openjdk.org Mon Aug 11 21:20:17 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 11 Aug 2025 21:20:17 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v41] In-Reply-To: References: Message-ID: <2boUUPGtT7e26_-3WdDG_NHIBzToo7exiRXvyhU27fg=.ae538a12-93ac-48c1-9979-0bc4cadf3a8c@github.com> > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. 
> > This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality > > Additional Testing: > - [x] Linux x64 fastdebug tier 1/2/3/4 > - [x] Linux aarch64 fastdebug tier 1/2/3/4 Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Lock nmethod::relocate behind experimental flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/d4e3dd31..cc8d2862 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=40 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=39-40 Stats: 48 lines in 7 files changed: 27 ins; 0 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Mon Aug 11 22:05:27 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 11 Aug 2025 22:05:27 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v24] In-Reply-To: References: Message-ID: On Mon, 21 Jul 2025 15:42:32 GMT, Vladimir Kozlov wrote: >>> We?re hoping to get this into JDK 25, as it would simplify both development and backporting of features related to hot code grouping. That said, if the consensus is that JVMTI/JFR support is essential upfront, this can be delayed until JDK 26. >> >> I don't think this can be put into JDK 25. Too late and changes are not simple. And yes, JVMTI/JFR support is essential - you have to support all public functionalities of VM. > >> @vnkozlov When you get a chance, would you mind taking another look at this PR? > > @chadrako I promise to look soon but currently I am busy with Leyden before JVMLS. 
@vnkozlov I added the experimental flag to allow usage of nmethod::relocate like we discussed at JVMLS ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3177025017 From kvn at openjdk.org Mon Aug 11 23:02:24 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 11 Aug 2025 23:02:24 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v41] In-Reply-To: <2boUUPGtT7e26_-3WdDG_NHIBzToo7exiRXvyhU27fg=.ae538a12-93ac-48c1-9979-0bc4cadf3a8c@github.com> References: <2boUUPGtT7e26_-3WdDG_NHIBzToo7exiRXvyhU27fg=.ae538a12-93ac-48c1-9979-0bc4cadf3a8c@github.com> Message-ID: On Mon, 11 Aug 2025 21:20:17 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Lock nmethod::relocate behind experimental flag Good. 
Please, get approval from @fisk ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3177131936 From dzhang at openjdk.org Tue Aug 12 01:12:49 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 12 Aug 2025 01:12:49 GMT Subject: RFR: 8365200: RISC-V: compiler/loopopts/superword/TestGeneralizedReductions.java fails with Zvbb and vlen=128 In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 02:06:02 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. This test pass with zvbb when vlen=256, but fail when vlen=128. > > The reason for the error is the same as [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694). 2-element reductions for INT/LONG are not profitable, so the compiler won't generate the corresponding reductions IR. > > This issue was not addressed together with [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694) because the testMapReductionOnGlobalAccumulator case where the error is reported has a different applyif method from other cases: zvbb needs to be enabled. > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=256 > - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=128 Thanks all for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26719#issuecomment-3177363916 From dzhang at openjdk.org Tue Aug 12 01:12:49 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 12 Aug 2025 01:12:49 GMT Subject: RFR: 8365200: RISC-V: compiler/loopopts/superword/TestGeneralizedReductions.java fails with Zvbb and vlen=128 [v2] In-Reply-To: References: Message-ID: > Hi all, > Please take a look and review this PR, thanks! 
> > [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. This test pass with zvbb when vlen=256, but fail when vlen=128. > > The reason for the error is the same as [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694). 2-element reductions for INT/LONG are not profitable, so the compiler won't generate the corresponding reductions IR. > > This issue was not addressed together with [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694) because the testMapReductionOnGlobalAccumulator case where the error is reported has a different applyif method from other cases: zvbb needs to be enabled. > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=256 > - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=128 Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Add missing whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26719/files - new: https://git.openjdk.org/jdk/pull/26719/files/cdae1d05..703205d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26719&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26719&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26719/head:pull/26719 PR: https://git.openjdk.org/jdk/pull/26719 From dzhang at openjdk.org Tue Aug 12 01:12:49 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 12 Aug 2025 01:12:49 GMT Subject: RFR: 8365200: RISC-V: compiler/loopopts/superword/TestGeneralizedReductions.java fails with Zvbb and vlen=128 [v2] In-Reply-To: References: Message-ID: <5-xbugEfSFiklggiImdpfeP9h8OwnrHHRocl7gq__Ts=.5f9210d6-ed26-462f-aa1c-5bbdbfeab6fb@github.com> On Mon, 11 Aug 2025 20:24:44 GMT, Andrey Turbanov wrote: 
>> Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing whitespace > > test/hotspot/jtreg/compiler/loopopts/superword/TestGeneralizedReductions.java line 169: > >> 167: @IR(applyIfPlatform = {"riscv64", "true"}, >> 168: applyIfCPUFeatureOr = {"zvbb", "true"}, >> 169: applyIfAnd = {"SuperWordReductions", "true","UsePopCountInstruction", "true", "MaxVectorSize", ">=32"}, > > Suggestion: > > applyIfAnd = {"SuperWordReductions", "true", "UsePopCountInstruction", "true", "MaxVectorSize", ">=32"}, Thanks for the review! Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26719#discussion_r2268332782 From fyang at openjdk.org Tue Aug 12 01:12:49 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Aug 2025 01:12:49 GMT Subject: RFR: 8365200: RISC-V: compiler/loopopts/superword/TestGeneralizedReductions.java fails with Zvbb and vlen=128 [v2] In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 01:09:53 GMT, Dingli Zhang wrote: >> Hi all, >> Please take a look and review this PR, thanks! >> >> [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. This test pass with zvbb when vlen=256, but fail when vlen=128. >> >> The reason for the error is the same as [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694). 2-element reductions for INT/LONG are not profitable, so the compiler won't generate the corresponding reductions IR. >> >> This issue was not addressed together with [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694) because the testMapReductionOnGlobalAccumulator case where the error is reported has a different applyif method from other cases: zvbb needs to be enabled. 
>> >> ### Test (fastdebug) >> - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=256 >> - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=128 > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Add missing whitespace Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26719#pullrequestreview-3108200157 From duke at openjdk.org Tue Aug 12 01:15:11 2025 From: duke at openjdk.org (duke) Date: Tue, 12 Aug 2025 01:15:11 GMT Subject: RFR: 8365200: RISC-V: compiler/loopopts/superword/TestGeneralizedReductions.java fails with Zvbb and vlen=128 In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 02:06:02 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. This test pass with zvbb when vlen=256, but fail when vlen=128. > > The reason for the error is the same as [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694). 2-element reductions for INT/LONG are not profitable, so the compiler won't generate the corresponding reductions IR. > > This issue was not addressed together with [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694) because the testMapReductionOnGlobalAccumulator case where the error is reported has a different applyif method from other cases: zvbb needs to be enabled. > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=256 > - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=128 @DingliZhang Your change (at version 703205d3e68406a26093a3fc38da32cdb47e2936) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26719#issuecomment-3177368613 From duke at openjdk.org Tue Aug 12 01:17:27 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 12 Aug 2025 01:17:27 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v42] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality > > Additional Testing: > - [x] Linux x64 fastdebug tier 1/2/3/4 > - [x] Linux aarch64 fastdebug tier 1/2/3/4 Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Lock nmethod::relocate behind experimental flag - Use CompiledICLocker instead of CompiledIC_lock - Fix spacing - Update NMethod.java with immutable data changes - Rename method to nm - Add assert before freeing immutable data - Reorder is_relocatable checks - Require caller to hold locks - Revert is_always_within_branch_range changes - ... 
and 97 more: https://git.openjdk.org/jdk/compare/9593730a...24c35689 ------------- Changes: https://git.openjdk.org/jdk/pull/23573/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=41 Stats: 1664 lines in 28 files changed: 1597 ins; 2 del; 65 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From dzhang at openjdk.org Tue Aug 12 01:28:15 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 12 Aug 2025 01:28:15 GMT Subject: Integrated: 8365200: RISC-V: compiler/loopopts/superword/TestGeneralizedReductions.java fails with Zvbb and vlen=128 In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 02:06:02 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. This test pass with zvbb when vlen=256, but fail when vlen=128. > > The reason for the error is the same as [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694). 2-element reductions for INT/LONG are not profitable, so the compiler won't generate the corresponding reductions IR. > > This issue was not addressed together with [JDK-8357694](https://bugs.openjdk.org/browse/JDK-8357694) because the testMapReductionOnGlobalAccumulator case where the error is reported has a different applyif method from other cases: zvbb needs to be enabled. > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=256 > - [x] Run compiler/loopopts/superword/TestGeneralizedReductions.java on qemu-system w/ and w/o zvbb when vlen=128 This pull request has now been integrated. 
Changeset: 6927fc39 Author: Dingli Zhang Committer: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/6927fc3904eb239bd43ab7c581d479c00a6a4af2 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8365200: RISC-V: compiler/loopopts/superword/TestGeneralizedReductions.java fails with Zvbb and vlen=128 Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/26719 From fferrari at openjdk.org Tue Aug 12 03:19:22 2025 From: fferrari at openjdk.org (Francisco Ferrari Bihurriet) Date: Tue, 12 Aug 2025 03:19:22 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v2] In-Reply-To: References: Message-ID: > Hi, this pull request is a second take of 1383fec41756322bf2832c55633e46395b937b40, by updating the `CmpUNode` type as either `TypeInt::CC_LE` (case 1a) or `TypeInt::CC_LT` (case 1b) instead of updating the `BoolNode` type as `TypeInt::ONE`. > > With this approach a56cd371a2c497e4323756f8b8a08a0bba059bf2 becomes unnecessary. Additionally, having the right type in `CmpUNode` could potentially enable further optimizations. 
> > #### Testing > > In order to evaluate the changes, the following testing has been performed: > > * `jdk:tier1` (see [GitHub Actions run](https://github.com/franferrax/jdk/actions/runs/16789994433)) > * [`TestBoolNodeGVN.java`](https://github.com/openjdk/jdk/blob/jdk-26+9/test/hotspot/jtreg/compiler/c2/gvn/TestBoolNodeGVN.java), created for [JDK-8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value](https://bugs.openjdk.org/browse/JDK-8327381) (1383fec41756322bf2832c55633e46395b937b40) > * I also checked it breaks if I remove the `CmpUNode::Value_cmpu_and_mask` call > * Private reproducer for [JDK-8349584: Improve compiler processing](https://bugs.openjdk.org/browse/JDK-8349584) (a56cd371a2c497e4323756f8b8a08a0bba059bf2) > * A local slowdebug run of the `test/hotspot/jtreg/compiler/c2` category on _Fedora Linux x86_64_ > * Same results as with `master` (f95af744b07a9ec87e2507b3d584cbcddc827bbd) Francisco Ferrari Bihurriet has updated the pull request incrementally with one additional commit since the last revision: Apply code review suggestions and add JBS to test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26666/files - new: https://git.openjdk.org/jdk/pull/26666/files/d073e80a..27ed1a31 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26666&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26666&range=00-01 Stats: 10 lines in 3 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/26666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26666/head:pull/26666 PR: https://git.openjdk.org/jdk/pull/26666 From fferrari at openjdk.org Tue Aug 12 03:19:23 2025 From: fferrari at openjdk.org (Francisco Ferrari Bihurriet) Date: Tue, 12 Aug 2025 03:19:23 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v2] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 10:15:11 GMT, Galder Zamarre?o wrote: >> Francisco Ferrari 
Bihurriet has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply code review suggestions and add JBS to test > > Thanks for the PR @franferrax. Did you consider adding an IR test or similar that would expose the inconsistent state? Would it be feasible? Hi @galderz, I'm afraid I had to be a bit opaque because this was partially discussed in the VG. I was just referring to the fact that the `CmpUNode` type was being kept as `TypeInt::CC`, while we know more than that:
* **1a.** `(x & m) <=u m` and `(m & x) <=u m` are always true, so `CmpU(x & m, m)` and `CmpU(m & x, m)` are known to be `TypeInt::CC_LE`
* **1b.** `(x & m) <u m + 1` and `(m & x) <u m + 1` are always true, so `CmpU(x & m, m + 1)` and `CmpU(m & x, m + 1)` are known to be `TypeInt::CC_LT`
References: Message-ID: On Mon, 11 Aug 2025 08:40:53 GMT, Christian Hagedorn wrote: >> Francisco Ferrari Bihurriet has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply code review suggestions and add JBS to test > > src/hotspot/share/opto/phaseX.cpp line 2941: > >> 2939: // Bool >> 2940: // >> 2941: void PhaseCCP::push_bool_with_cmpu_and_mask(Unique_Node_List& worklist, const Node* use) const { > > Needed to double-check but I think it's fine to remove the notification code since we already have `push_cmpu()` in place which looks through the `AddI`: > https://github.com/openjdk/jdk/blob/10762d408bba9ce0945100847a8674e7eb7fa75e/src/hotspot/share/opto/phaseX.cpp#L2911-L2926 > > So, whenever `m` or `1` changes, we will re-add the `CmpU` to the CCP worklist with `push_cmpu()`. The `x` does not matter for the application of `Value_cmpu_and_mask()`. Hmm, I was oversimplifying the problem, my way of thinking it was the following one:

        m   x     m   1
         \ /       \ /
         AndI     AddI      grandparents
            \     /
             CmpU           parent
              |
             Bool           grandchild

_"As we were updating a grandchild based on its grandparents, we needed an ad-hoc worklist push for the grandchild. 
Since we now update the type of `CmpU` based on its parents, the canonical parent-to-child propagations should work, and we don't need any ad-hoc grandparents-to-grandchild worklist push anymore."_ But as you noted, non-immediate `CmpU` inputs such as `m` or `1` can change and should affect the `CmpU` type. Luckily, this already was the case for previous `CmpU` optimizations. --- For case **1a**, we don't need `PhaseCCP::push_cmpu` because `m` is also an immediate input of `CmpU`.

        m   x
         \ /
         AndI   m
            \  /
            CmpU
             |
            Bool

--- I'm now realizing this was a very lucky situation. The `AndI` input isn't problematic even when `PhaseCCP::push_cmpu` doesn't handle the `use_op == Op_AndI` case, because:
* `x` does not affect the application of `Value_cmpu_and_mask()`
* In case **1a**, `m` is a direct input of `CmpU`
* In case **1b**, the `AddI` input is handled in `PhaseCCP::push_cmpu` (`use_op == Op_AddI`)

Please let me know if you think we should add a comment in the code. > src/hotspot/share/opto/subnode.cpp line 855: > >> 853: // (1a) and (1b) is covered by this method since we can directly return the corresponding TypeInt::CC_* >> 854: // while (2) is covered in BoolNode::Ideal since we create a new non-constant node (see [CMPU_MASK]). >> 855: const Type* CmpUNode::Value_cmpu_and_mask(PhaseValues* phase, const Node* in1, const Node* in2) { > > I suggest to directly name these: > `in1` -> `andI` > `in2` -> `rhs` > > Then it's easier to follow the comments. Great, I was going to use similar names and later regretted not doing so. Suggestion accepted in 27ed1a311ec34e24afae6cc43d2c71e2620eb0ef. 
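[The 1a/1b invariants discussed above are easy to sanity-check outside of C2. The following is an editorial Java sketch, not part of the thread: `Integer.compareUnsigned` plays the role of C2's unsigned `CmpU`, and the `m + 1 != 0` guard for case 1b is this sketch's assumption about the wrap-around corner case, not a claim about the C2 code.]

```java
public class CmpuMaskIdentity {
    public static void main(String[] args) {
        java.util.Random r = new java.util.Random(42);
        for (int i = 0; i < 100_000; i++) {
            int x = r.nextInt(), m = r.nextInt();
            // Case 1a: masking can only clear bits, so as an unsigned
            // value (x & m) can never exceed the mask m itself -> CC_LE.
            if (Integer.compareUnsigned(x & m, m) > 0)
                throw new AssertionError("1a fails for x=" + x + ", m=" + m);
            // Case 1b: (x & m) <u m + 1, provided m + 1 does not wrap to 0 -> CC_LT.
            if (m + 1 != 0 && Integer.compareUnsigned(x & m, m + 1) >= 0)
                throw new AssertionError("1b fails for x=" + x + ", m=" + m);
        }
        System.out.println("ok"); // no counterexamples found
    }
}
```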
> src/hotspot/share/opto/subnode.hpp line 174: > >> 172: // Compare 2 unsigned values (integer or pointer), returning condition codes (-1, 0 or 1). >> 173: class CmpUNode : public CmpNode { >> 174: static const Type* Value_cmpu_and_mask(PhaseValues*, const Node*, const Node*); > > We usually add matching parameter names as found in the source file: > > static const Type* Value_cmpu_and_mask(PhaseValues* phase, const Node* in1, const Node* in2); > > or with the renaming above: > > static const Type* Value_cmpu_and_mask(PhaseValues* phase, const Node* andI, const Node* rhs); Suggestion accepted in 27ed1a311ec34e24afae6cc43d2c71e2620eb0ef. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26666#discussion_r2268462137 PR Review Comment: https://git.openjdk.org/jdk/pull/26666#discussion_r2268467836 PR Review Comment: https://git.openjdk.org/jdk/pull/26666#discussion_r2268468092 PR Review Comment: https://git.openjdk.org/jdk/pull/26666#discussion_r2268468327 From galder at openjdk.org Tue Aug 12 04:31:14 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 12 Aug 2025 04:31:14 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3] In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: <4mfHZiUcDJ3W0p2WCzgtUwp-FWSBX6eXOg1zLfcs_H0=.f0909507-d6fe-4d2a-a543-db6445e7f605@github.com> On Mon, 11 Aug 2025 14:11:37 GMT, Emanuel Peter wrote: >> Would you run specific tiers in those platforms? Just hotspot compiler? Or individual tests such as `ConvF2HFIdealizationTests` and `TestFloat16ScalarOperations`? > > Honestly, I don't know, I'd have to do the research myself. Probably focusing on the Float16 tests would be good enough. No other test would really use Float16, so running anything else would not be that useful probably. I've done some testing on x86_64 and aarch64 and the tests pass. 
I also made sure that the test output demonstrated execution of the expected IR rule as per the requirements of each platform. ## `c7gn.2xlarge` Graviton3 ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR SKIP jtreg:test/hotspot/jtreg/compiler/c2/irTests/ConvF2HFIdealizationTests.java 1 1 0 0 0 jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java 1 1 0 0 0 jtreg:test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java 1 1 0 0 0 ============================== TEST SUCCESS $ tail ConvF2HFIdealizationTests.jtr Messages from Test VM --------------------- [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in test1: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true ----------System.err:(3/35)---------- JavaTest Message: Test complete. result: Passed. Execution successful test result: Passed. Execution successful $ tail TestFloat16ScalarOperations.jtr Messages from Test VM --------------------- [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testDivByPOT: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMulByTWO: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testInexactFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testSNaNFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testQNaNFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in 
testExactFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testRandomFP16ConstantPatternSet1: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testRandomFP16ConstantPatternSet2: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testRounding1: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testRounding2: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMax: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testAddConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testDivConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMin: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMinConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testEliminateIntermediateHF2S: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testDivByOne: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true 
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testFMAConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMaxConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMul: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testconvF2HFAndS2HF: Feature constraint not met (applyIfCPUFeature): avx512_fp16, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testDiv: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testSqrtConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testSqrt: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMulConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testFma: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testAdd1: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testAdd2: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testSubConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true [IREncodingPrinter] Disabling IR 
matching for rule 1 of 2 in testSub: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true ----------System.err:(3/35)---------- JavaTest Message: Test complete. result: Passed. Execution successful test result: Passed. Execution successful ## `c7i.xlarge` Intel(R) Xeon(R) Platinum 8488C (saphire rapids): ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR SKIP jtreg:test/hotspot/jtreg/compiler/c2/irTests/ConvF2HFIdealizationTests.java 1 1 0 0 0 jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java 1 1 0 0 0 jtreg:test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java 1 1 0 0 0 ============================== TEST SUCCESS $ tail ConvF2HFIdealizationTests.jtr Messages from Test VM --------------------- [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in test1: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true ----------System.err:(3/35)---------- JavaTest Message: Test complete. result: Passed. Execution successful test result: Passed. 
Execution successful $ tail TestFloat16ScalarOperations.jtr Messages from Test VM --------------------- [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testDivByPOT: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testMulByTWO: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testInexactFP16ConstantPatterns: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testSNaNFP16ConstantPatterns: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testQNaNFP16ConstantPatterns: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testExactFP16ConstantPatterns: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testRandomFP16ConstantPatternSet1: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testRandomFP16ConstantPatternSet2: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testRounding1: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testRounding2: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testMax: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true 
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testAddConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testDivConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testMin: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testMinConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testEliminateIntermediateHF2S: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testDivByOne: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testFMAConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testMaxConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testMul: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testconvF2HFAndS2HF: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testDiv: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testSqrtConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, 
true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testSqrt: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testMulConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testFma: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testAdd1: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testAdd2: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testSubConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true [IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testSub: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true ----------System.err:(3/35)---------- JavaTest Message: Test complete. result: Passed. Execution successful test result: Passed. 
Execution successful ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2268546321 From galder at openjdk.org Tue Aug 12 04:36:13 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 12 Aug 2025 04:36:13 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3] In-Reply-To: <4mfHZiUcDJ3W0p2WCzgtUwp-FWSBX6eXOg1zLfcs_H0=.f0909507-d6fe-4d2a-a543-db6445e7f605@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> <4mfHZiUcDJ3W0p2WCzgtUwp-FWSBX6eXOg1zLfcs_H0=.f0909507-d6fe-4d2a-a543-db6445e7f605@github.com> Message-ID: On Tue, 12 Aug 2025 04:28:45 GMT, Galder Zamarre?o wrote: >> Honestly, I don't know, I'd have to do the research myself. Probably focusing on the Float16 tests would be good enough. No other test would really use Float16, so running anything else would not be that useful probably. > > I've done some testing on x86_64 and aarch64 and the tests pass. > > I also made sure that the test output demonstrated execution of the expected IR rule as per the requirements of each platform. > > ## `c7gn.2xlarge` Graviton3 > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/hotspot/jtreg/compiler/c2/irTests/ConvF2HFIdealizationTests.java > 1 1 0 0 0 > jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java > 1 1 0 0 0 > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java > 1 1 0 0 0 > ============================== > TEST SUCCESS > > $ tail ConvF2HFIdealizationTests.jtr > Messages from Test VM > --------------------- > [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in test1: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true > > > ----------System.err:(3/35)---------- > > JavaTest Message: Test complete. > > result: Passed. 
Execution successful > > > test result: Passed. Execution successful > > $ tail TestFloat16ScalarOperations.jtr > Messages from Test VM > --------------------- > [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testDivByPOT: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true > [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMulByTWO: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true > [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testInexactFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true > [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testSNaNFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true > [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testQNaNFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true > [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testExactFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true > [IREncodingPrinter] Disabling IR matching for rule ... Btw, I've noticed that `TestFloat16ScalarOperations` does not have `package` definition. Is that an oversight? It runs fine in spite of not having it ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2268550794 From jbhateja at openjdk.org Tue Aug 12 06:01:29 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 12 Aug 2025 06:01:29 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v5] In-Reply-To: References: Message-ID: > Patch optimizes Vector. slice operation with constant index using x86 ALIGNR instruction. 
> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. > > The idea here is to add infrastructure support to enable intrinsification of the fast path for selected vector APIs, or else enable inlining of the fallback implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander from handling slow paths, which can easily be implemented on the library side (Java). > > Vector API jtreg tests pass at AVX level 2, remaining validation in progress. 
> > Performance numbers: > > > System : 13th Gen Intel(R) Core(TM) i3-1315U > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms > VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms > VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms > VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms > VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms > VectorSliceBenchmark.shortVectorSliceWithVariableIndex 1024 ... 
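[For readers outside the thread: `v1.slice(origin, v2)` in the Vector API produces a vector whose lane `i` is lane `origin + i` of the logical concatenation of `v1` followed by `v2`, which for byte lanes is the concatenate-and-extract operation that `VPALIGNR` performs. An editorial scalar model of those semantics over int arrays — not the actual Vector API or HotSpot code:]

```java
public class SliceModel {
    // Scalar model of v1.slice(origin, v2): lane i of the result is
    // lane (origin + i) of the logical concatenation v1 ++ v2.
    static int[] slice(int[] v1, int[] v2, int origin) {
        int len = v1.length;
        int[] out = new int[len];
        for (int i = 0; i < len; i++) {
            int j = origin + i;
            out[i] = (j < len) ? v1[j] : v2[j - len];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] r = slice(new int[]{0, 1, 2, 3}, new int[]{4, 5, 6, 7}, 1);
        System.out.println(java.util.Arrays.toString(r)); // [1, 2, 3, 4]
    }
}
```

[With a compile-time-constant `origin`, the whole loop can collapse to a single alignment instruction, which is the fast path this patch intrinsifies; the quoted review comments note that the lane index is scaled to a byte offset before it reaches the backend.]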
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24104/files - new: https://git.openjdk.org/jdk/pull/24104/files/405de56f..f36ae6dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24104/head:pull/24104 PR: https://git.openjdk.org/jdk/pull/24104 From jbhateja at openjdk.org Tue Aug 12 06:01:29 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 12 Aug 2025 06:01:29 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v2] In-Reply-To: <1Vs8Ud-yh7FtFJN9sddNXDVM6Mc0ue9oi_oa0w5pRzU=.022172f3-1622-4d05-888b-c7afc66a5254@github.com> References: <1Vs8Ud-yh7FtFJN9sddNXDVM6Mc0ue9oi_oa0w5pRzU=.022172f3-1622-4d05-888b-c7afc66a5254@github.com> Message-ID: On Mon, 11 Aug 2025 02:47:49 GMT, Xiaohong Gong wrote: > Q1: Is it possible that just passing `origin->get_con()` to `VectorSliceNode` in case there are architectures that need it directly? Or, maybe we'd better add comment telling that the origin passed to `VectorSliceNode` is adjust to bytes. > Added comments. > Q2: If `origin` is not a constant, and there is an architecture that support the index as a variable, will the code crash here? Can we just limit the `origin` to a constant for this intrinsifaction in this PR? We can consider to extend it to variable in case any architecture has such a requirement. WDYT? Currently, inline expander only supports constant origin. I have added a check to fail intrinsification and inline fallback using the hybrid call generator. > Do we have specific value for `origin` like zero or vlen? If so, maybe simply Identity is better to be added as well. 
Done, Thanks!, also added a new IR test to complement the code changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2268669566 PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2268669731 From xgong at openjdk.org Tue Aug 12 06:01:29 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 12 Aug 2025 06:01:29 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v2] In-Reply-To: References: <1Vs8Ud-yh7FtFJN9sddNXDVM6Mc0ue9oi_oa0w5pRzU=.022172f3-1622-4d05-888b-c7afc66a5254@github.com> Message-ID: On Tue, 12 Aug 2025 05:55:14 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 1714: >> >>> 1712: } >>> 1713: >>> 1714: Node* origin_node = gvn().intcon(origin->get_con() * type2aelembytes(elem_bt)); >> >> Q1: Is it possible that just passing `origin->get_con()` to `VectorSliceNode` in case there are architectures that need it directly? Or, maybe we'd better add comment telling that the origin passed to `VectorSliceNode` is adjust to bytes. >> >> Q2: If `origin` is not a constant, and there is an architecture that support the index as a variable, will the code crash here? Can we just limit the `origin` to a constant for this intrinsifaction in this PR? We can consider to extend it to variable in case any architecture has such requirement. WDYT? > >> Q1: Is it possible that just passing `origin->get_con()` to `VectorSliceNode` in case there are architectures that need it directly? Or, maybe we'd better add comment telling that the origin passed to `VectorSliceNode` is adjust to bytes. >> > > Added comments. > >> Q2: If `origin` is not a constant, and there is an architecture that support the index as a variable, will the code crash here? Can we just limit the `origin` to a constant for this intrinsifaction in this PR? We can consider to extend it to variable in case any architecture has such a requirement. WDYT? 
> > Currently, the inline expander only supports a constant origin. I have added a check to fail intrinsification and inline fallback using the hybrid call generator. Thanks for your update! So maybe the matcher function `supports_vector_slice_with_non_constant_index()` could also be removed totally? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2268675021 From shade at openjdk.org Tue Aug 12 06:26:11 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 12 Aug 2025 06:26:11 GMT Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 17:38:28 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to address the failure caused by an x86 forward jump offset exceeding the imm8 displacement when running the HotSpot jtreg test `test/hotspot/jtreg/compiler/c2/TestLWLockingCodeGen.java` using Intel APX (on the SDE emulator). > > This bug triggers an assertion failure in methodHandles_x86.cpp because the assembler emits a short forward jump (imm8 displacement) whose target is more than 127 bytes away, exceeding the allowed range. This appears to be caused by larger stub code size when APX instruction encoding is enabled. > > The fix for this issue is to replace the `jccb` instruction with `jcc` in methodHandles_x86.cpp. Looks good. This is diagnostics code, so performance is not a question. I think we generally avoid shortening branches over `__ STOP`, for example, whose size is generally unpredictable. So this looks in alignment with those tactics. Maybe you want to unshorten the branch at L157 as well. ------------- Marked as reviewed by shade (Reviewer). 
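For readers unfamiliar with the `jccb`/`jcc` distinction discussed above: `jccb` encodes a signed 8-bit (rel8) displacement, so a forward target more than 127 bytes away cannot be reached, which is exactly what the assert catches when APX encodings grow the stub. A minimal sketch of that constraint; the class and helper names here are illustrative, not HotSpot APIs:

```java
// Sketch of the constraint behind jccb vs jcc on x86: a short conditional
// jump (jccb) encodes a signed 8-bit displacement, while the near form (jcc)
// uses a 32-bit displacement. Names are hypothetical, not HotSpot code.
public class ShortJumpCheck {
    // rel8 displacement range for x86 short jumps
    static final int REL8_MIN = -128;
    static final int REL8_MAX = 127;

    // Returns true if a jump with the given byte displacement can use the
    // short (rel8) form; otherwise the near (rel32) form is required.
    static boolean fitsInRel8(int displacement) {
        return displacement >= REL8_MIN && displacement <= REL8_MAX;
    }

    public static void main(String[] args) {
        System.out.println(fitsInRel8(127)); // true: still reachable with a short jump
        System.out.println(fitsInRel8(128)); // false: larger APX-encoded stub code can push targets past this
    }
}
```

Replacing `jccb` with `jcc` simply switches to the rel32 form, trading a few bytes of code density for a displacement that cannot overflow.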
PR Review: https://git.openjdk.org/jdk/pull/26731#pullrequestreview-3108735289 From dskantz at openjdk.org Tue Aug 12 06:32:13 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 12 Aug 2025 06:32:13 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors [v2] In-Reply-To: References: Message-ID: > This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments(). > > In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there's 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary amount of back-to-back stores in the generated code depending on the number of stacked concatenations. > > The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2]. > > [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303 > > [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806 > > Testing: T1-4. 
> > Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 arguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow pass. Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: move check / tweak test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26685/files - new: https://git.openjdk.org/jdk/pull/26685/files/12992984..69596e61 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26685&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26685&range=00-01 Stats: 35 lines in 3 files changed: 18 ins; 9 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/26685.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26685/head:pull/26685 PR: https://git.openjdk.org/jdk/pull/26685 From dskantz at openjdk.org Tue Aug 12 06:43:13 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 12 Aug 2025 06:43:13 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors [v2] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 10:30:42 GMT, Galder Zamarreño wrote: >> Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: >> >> move check / tweak test > > test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsMany.java line 28: > >> 26: * @bug 8357105 >> 27: * @summary Test that repeated stacked string concatenations do not >> 28: * consume too many compilation resources. > > Is there a reasonable way to enhance the test to validate excessive resources? I'm not sure if the following example would work, but I'm wondering if there is something that can be measured deterministically. E.g. before with the given test there would be ~N IR nodes produced but now it would be a max of ~M, assuming that M is deterministically smaller than N. 
There's an 80000-node limit by default and maybe the test could use a lower limit by specifying a value for the MaxNodeLimit flag. There is also the IR framework that can check for node counts for individual nodes. Without the fix, the test currently gets a MemLimit assert in debug runs for consuming 1GB of memory as it is building up the _arguments arrays. The high number of IR nodes is created later in `replace_string_concat` if we get that far without timing out or reaching the memory limit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2268756368 From dzhang at openjdk.org Tue Aug 12 06:45:55 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 12 Aug 2025 06:45:55 GMT Subject: RFR: 8365302: RISC-V: compiler/loopopts/superword/TestAlignVector.java fails when vlen=128 Message-ID: Hi all, Please take a look and review this PR, thanks! [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. This test passes when vlen=256, but fails when vlen=128. The error occurs because the test13aIL and test13bIL cases require ensuring that vectors are larger than what unrolling produces; otherwise, the corresponding vector IR will not be generated. We can use `JTREG="JAVA_OPTIONS=-XX:+TraceSuperWordLoopUnrollAnalysis"` during testing. The relevant lines in the log: 

76844 1333 b 4 compiler.loopopts.superword.TestAlignVector::test13aIL (42 bytes) 
slp analysis fails: unroll limit greater than max vector 

slp analysis: set max unroll to 4 


Therefore, we need to limit MaxVectorSize to greater than or equal to 32 bytes. 
### Test (fastdebug) 
- [x] Run compiler/loopopts/superword/TestAlignVector.java on qemu-system with RVV when vlen=128/256 ------------- Commit messages: - 8365302: RISC-V: compiler/loopopts/superword/TestAlignVector.java fails when vlen=128 Changes: https://git.openjdk.org/jdk/pull/26738/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26738&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365302 Stats: 18 lines in 1 file changed: 16 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26738.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26738/head:pull/26738 PR: https://git.openjdk.org/jdk/pull/26738 From thartmann at openjdk.org Tue Aug 12 07:30:14 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 12 Aug 2025 07:30:14 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v7] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: On Mon, 11 Aug 2025 11:17:31 GMT, Saranya Natarajan wrote: >> **Issue** >> Extreme values for the BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version` result in the assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. This is observed on an x86 machine. >> >> **Analysis** >> On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above-mentioned assert from failing for large values of BciProfileWidth. >> >> **Proposal** >> Considering the fact that a larger BciProfileWidth results in slower profiling, I have proposed a range of 0 to 5000 to restrict the value of BciProfileWidth for x86 machines. 
This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in the ret profile. >> >> **Issue in AArch64** >> Additionally, running the command `java -XX:BciProfileWidth=10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine. This is an issue of the maximum offset for `ldr/str` in AArch64 which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on an AArch64 machine, I had to modify 3 `ldr` and 1 `str` instruction (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line numbers 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for a maximum of 5000 after which it crashes with `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. Without this fix `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested to restrict the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. >> >> **Question to reviewers** >> Do you think this is a reasonable fix? For AArch64 do you suggest fixing it using `form_address`? If yes, do I fix it under this PR or create another one? >> >> **Request to port maintainers** >> @dafedafe suggested that we keep the upper boun... > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review : adding vm.debug and moving a defn Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
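The AArch64 failure mode quoted above (`offset_ok_for_immed(offset(), size) ... was: 32768, 3`) comes from the limited immediate-offset encodings of `ldr`/`str`: the scaled unsigned form holds offset/access-size in 12 bits, and the unscaled `ldur`/`stur` form holds a signed 9-bit offset. A simplified sketch of that check, assuming those two encodings only (HotSpot's real `Address::offset_ok_for_immed` lives in the aarch64 port and handles more cases); the class name is hypothetical:

```java
// Illustrative only, not HotSpot code: why "32768, 3" fails. An AArch64
// LDR/STR with an unsigned immediate offset encodes offset/size in 12 bits
// (0..4095 after scaling by the access size), and the unscaled LDUR/STUR
// form takes a signed 9-bit offset (-256..255). An offset of 32768 with
// size = 3 (an 8-byte access) fits neither, which form_address works around
// by materializing the address in a scratch register.
public class Aarch64OffsetCheck {
    // size is log2 of the access size in bytes (3 => 8-byte load/store)
    static boolean offsetOkForImmed(long offset, int size) {
        long scale = 1L << size;
        boolean scaledUnsigned = offset >= 0 && offset % scale == 0
                                 && (offset / scale) <= 4095;
        boolean unscaledSigned = offset >= -256 && offset <= 255;
        return scaledUnsigned || unscaledSigned;
    }

    public static void main(String[] args) {
        System.out.println(offsetOkForImmed(32760, 3)); // true: 4095 * 8 still encodes
        System.out.println(offsetOkForImmed(32768, 3)); // false: 4096 > 4095, reproduces the assert
    }
}
```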
PR Review: https://git.openjdk.org/jdk/pull/26139#pullrequestreview-3108965837 From fyang at openjdk.org Tue Aug 12 07:54:12 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Aug 2025 07:54:12 GMT Subject: RFR: 8365302: RISC-V: compiler/loopopts/superword/TestAlignVector.java fails when vlen=128 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 06:38:33 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. This test pass when vlen=256, but fail when vlen=128. > > The error occurs because the test13aIL and test13bIL cases require ensuring that vectors are larger than what unrolling produces; otherwise, the corresponding vector IR will not be generated. > > We can use `JTREG="JAVA_OPTIONS=-XX:+TraceSuperWordLoopUnrollAnalysis"` during testing. > The tips in the log: > > 76844 1333 b 4 compiler.loopopts.superword.TestAlignVector::test13aIL (42 bytes) > slp analysis fails: unroll limit greater than max vector > > slp analysis: set max unroll to 4 > > > Therefore, we need to limit MaxVectorSize to greater than or equal to 32 bytes. > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/TestAlignVector.java on qemu-system with RVV when vlen=128/256 LGTM. Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26738#pullrequestreview-3109088468 From aph at openjdk.org Tue Aug 12 08:06:11 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 12 Aug 2025 08:06:11 GMT Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 06:23:56 GMT, Aleksey Shipilev wrote: > Looks good. This is diagnostics code, so performance is not a question. > > I think we generally avoid shortening branches over `__ STOP`, for example, which size is generally unpredictable. 
So this looks in alignment with those tactics. Maybe you want to unshorten the branch at L157 as well. All this long-and-short branch stuff is a pain. I wonder, given that we're now saving stubs in an archive, whether we should just bite the bullet and implement branch relaxation for stubs. I don't think it would be very hard. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26731#issuecomment-3178187948 From fyang at openjdk.org Tue Aug 12 08:16:13 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Aug 2025 08:16:13 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v6] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: On Mon, 11 Aug 2025 11:17:13 GMT, Saranya Natarajan wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressing review - testcase and max range is set to 1000 > > In the process of adding port maintainers to this PR, I by mistake added (and removed) some of them as contributor. I will update the contributor list before closing the PR. Sorry for the inconvenience @sarannat : Hi, Thanks for the ping! I just tried the newly-added test on linux-riscv64 and I think we still need some extra change for this platform. Do you mind adding that in this PR? I see the test pass with this addon change when running with a fastdebug build. 
[riscv-addon-fix.diff.txt](https://github.com/user-attachments/files/21729910/riscv-addon-fix.diff.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3178215604 From bkilambi at openjdk.org Tue Aug 12 08:25:12 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 12 Aug 2025 08:25:12 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3] In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> <4mfHZiUcDJ3W0p2WCzgtUwp-FWSBX6eXOg1zLfcs_H0=.f0909507-d6fe-4d2a-a543-db6445e7f605@github.com> Message-ID: On Tue, 12 Aug 2025 04:33:04 GMT, Galder Zamarreño wrote: >> I've done some testing on x86_64 and aarch64 and the tests pass. >> >> I also made sure that the test output demonstrated execution of the expected IR rule as per the requirements of each platform. >> >> ## `c7gn.2xlarge` Graviton3 >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR SKIP >> jtreg:test/hotspot/jtreg/compiler/c2/irTests/ConvF2HFIdealizationTests.java >> 1 1 0 0 0 >> jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java >> 1 1 0 0 0 >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java >> 1 1 0 0 0 >> ============================== >> TEST SUCCESS >> >> $ tail ConvF2HFIdealizationTests.jtr >> Messages from Test VM >> --------------------- >> [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in test1: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true >> >> >> ----------System.err:(3/35)---------- >> >> JavaTest Message: Test complete. >> >> result: Passed. Execution successful >> >> >> test result: Passed. 
Execution successful >> >> $ tail TestFloat16ScalarOperations.jtr >> Messages from Test VM >> --------------------- >> [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testDivByPOT: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true >> [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMulByTWO: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true >> [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testInexactFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true >> [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testSNaNFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true >> [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testQNaNFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true >> [IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testExactFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeat... > > Btw, I've noticed that `TestFloat16ScalarOperations` does not have `package` definition. Is that an oversight? 
It runs fine in spite of not having it. Hi, as you mostly touched the auto-vectorization part of c2, could you please run these float16 tests as well (most of these enable auto-vectorization for Float16) - `compiler/vectorization/TestFloat16VectorOperations.java` `compiler/vectorization/TestFloatConversionsVectorNaN.java` `compiler/vectorization/TestFloatConversionsVector.java` `compiler/vectorization/TestFloat16ToFloatConv.java` `compiler/vectorization/TestFloat16VectorConvChain.java` `compiler/intrinsics/float16/*` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2269084605 From shade at openjdk.org Tue Aug 12 08:36:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 12 Aug 2025 08:36:12 GMT Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX In-Reply-To: References: Message-ID: <6KBUzFUMEtIKXUhDGaNYEGtXmnSe7Ohu6ZTTmuH07NI=.e3d665d4-73d8-41c5-95a2-5e1e284eeb3a@github.com> On Tue, 12 Aug 2025 08:03:15 GMT, Andrew Haley wrote: > > Looks good. This is diagnostics code, so performance is not a question. > > I think we generally avoid shortening branches over `__ STOP`, for example, whose size is generally unpredictable. So this looks in alignment with those tactics. Maybe you want to unshorten the branch at L157 as well. > > All this long-and-short branch stuff is a pain. I wonder, given that we're now saving stubs in an archive, whether we should just bite the bullet and implement branch relaxation for stubs. I don't think it would be very hard. Code density still matters for runtime performance, alas. I think a practical guidance is to avoid optimizing for code density in diagnostic code, like this one. Accepting pain for production-side code is a fair trade, accepting pain for diagnostic code is just silly. 
:) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26731#issuecomment-3178303992 From fbredberg at openjdk.org Tue Aug 12 08:45:12 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 12 Aug 2025 08:45:12 GMT Subject: RFR: 8364141: Remove LockingMode related code from x86 [v4] In-Reply-To: References: Message-ID: On Thu, 7 Aug 2025 09:23:37 GMT, Fredrik Bredberg wrote: >> Since the integration of [JDK-8359437](https://bugs.openjdk.org/browse/JDK-8359437) the `LockingMode` flag can no longer be set by the user, instead it's declared as `const int LockingMode = LM_LIGHTWEIGHT;`. This means that we can now safely remove all `LockingMode` related code from all platforms. >> >> This PR removes `LockingMode` related code from the **x86** platform. >> >> When all the `LockingMode` code has been removed from all platforms, we can go on and remove it from shared (non-platform specific) files as well. And finally remove the `LockingMode` variable itself. >> >> Passes tier1-tier5 with no added problems. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update three after review Thanks everyone for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26552#issuecomment-3178335696 From fbredberg at openjdk.org Tue Aug 12 08:48:22 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 12 Aug 2025 08:48:22 GMT Subject: Integrated: 8364141: Remove LockingMode related code from x86 In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 13:17:37 GMT, Fredrik Bredberg wrote: > Since the integration of [JDK-8359437](https://bugs.openjdk.org/browse/JDK-8359437) the `LockingMode` flag can no longer be set by the user, instead it's declared as `const int LockingMode = LM_LIGHTWEIGHT;`. This means that we can now safely remove all `LockingMode` related code from all platforms. > > This PR removes `LockingMode` related code from the **x86** platform. 
> > When all the `LockingMode` code has been removed from all platforms, we can go on and remove it from shared (non-platform specific) files as well. And finally remove the `LockingMode` variable itself. > > Passes tier1-tier5 with no added problems. This pull request has now been integrated. Changeset: f155f7d6 Author: Fredrik Bredberg URL: https://git.openjdk.org/jdk/commit/f155f7d6e50c702f65858774cfd02ef60aa9cad5 Stats: 639 lines in 10 files changed: 37 ins; 546 del; 56 mod 8364141: Remove LockingMode related code from x86 Reviewed-by: aboldtch, dholmes, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/26552 From snatarajan at openjdk.org Tue Aug 12 08:56:02 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 12 Aug 2025 08:56:02 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled [v2] In-Reply-To: References: Message-ID: > **Issue** > An error, `assert(data->is_ReceiverTypeData()) failed: bad profile data type`, is encountered during C2 compilation due to bad profile data. This occurs when the code is compiled with the `TypeProfileCasts` option disabled. > > **Analysis** > The assertion failure occurs in `record_profiled_receiver_for_speculation`, which analyzes the profiling information in the method data to determine whether a null value has been observed in the `instanceof` operation. This information is encoded in the `BitData` during profiling. When the method identifies that a null has been seen, it proceeds to inspect the associated `ReceiverTypeData` to see if the type check is always performed against null. However, in this scenario, the incoming profiling data is of type `BitData` rather than `ReceiverTypeData`, leading to the assertion failure. > > The profiling information for null seen for the operations `aastore`, `instanceof`, and `checkcast` is recorded by the method `profile_null_seen` (in `src/hotspot/cpu/x86/templateTable_x86.cpp`). 
On investigating this method, it can be observed that the method data pointer is not updated for `VirtualCallData` (which is a subclass of `ReceiverTypeData`) when the `TypeProfileCasts` option is disabled. > > **Solution** > My proposal is to inspect the `ReceiverTypeData` in function `record_profiled_receiver_for_speculation` only if `TypeProfileCasts` is enabled (this is based on the fact that the relevant method data pointer is not updated when `TypeProfileCasts` is disabled). > > **Question to reviewers** > Do you think this is a reasonable fix? > > **Testing** > GitHub Actions > tier1 to tier3 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: adding test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26640/files - new: https://git.openjdk.org/jdk/pull/26640/files/0089ba25..767f4e87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26640&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26640&range=00-01 Stats: 48 lines in 1 file changed: 48 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26640.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26640/head:pull/26640 PR: https://git.openjdk.org/jdk/pull/26640 From snatarajan at openjdk.org Tue Aug 12 09:04:12 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 12 Aug 2025 09:04:12 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled [v2] In-Reply-To: References: Message-ID: <-Tt0TlxRWfQpBVSzDN-qkFZLy1ic1jalf31p25087Sw=.97b8069d-a082-44df-b43e-a6bcd55edb56@github.com> On Thu, 7 Aug 2025 11:38:30 GMT, Manuel Hässig wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> adding test > > Thank you for working on this, @sarannat! 
The fix seems reasonable to me since `GraphKit::maybe_cast_profiled_receiver` has a similar exception. > However, you are missing a regression test or a `noreg-*` label in JBS. In this case, I think a small regression test is warranted. @mhaessig and @dafedafe : Thank you for the review. I have now added a test case. @dafedafe : I have tested -XX:-TypeProfileCasts with a few examples. I want to highlight the example of the test in `/test/hotspot/jtreg/compiler/tiered/TypeProfileCasts.java`. When I ran this with the -XX:-TypeProfileCasts option and TieredCompilation disabled, it did crash with the same error. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26640#issuecomment-3178400547 From shade at openjdk.org Tue Aug 12 09:04:13 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 12 Aug 2025 09:04:13 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 07:54:53 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of the sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-128 << 8, 127 << 8], in multiples of 256, for 16-bit signed immediates. 
>> 
>> This patch allows the generation of the sve_dup instruction for only those 16-bit values which are within the limits as specified above; for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). 
>> 
>> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on a 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk (tier 1-3) pass on the same machine. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments and modified some comments Can we/should we plug this problem in encoding first, without going too much into optimizing the non-broken case? As it stands now, real FP16-using code can run into matcher errors in JDK 25. I would like to fix that first. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3178400986 From duke at openjdk.org Tue Aug 12 09:10:01 2025 From: duke at openjdk.org (erifan) Date: Tue, 12 Aug 2025 09:10:01 GMT Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI expand operation Message-ID: Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified: 1. **Subword types** on SVE2-capable hardware. 2. **All types** on NEON and SVE1 environments. As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments. Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. 
To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example: To compute: dst = src.expand(mask) Data direction: high <== low Input: src = p o n m l k j i h g f e d c b a mask = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 Expected result: dst = 0 0 h g 0 0 f e 0 0 d c 0 0 b a Step 1: calculate the index input of the TBL instruction. // Set tmp1 as all 0 vector. tmp1 = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 // Move the mask bits from the predicate register to a vector register. // **1-bit** mask lane of P register to **8-bit** mask lane of V register. tmp2 = mask = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 // Shift the entire register. Prefix sum algorithm. dst = tmp2 << 8 = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 tmp2 += dst = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1 dst = tmp2 << 16 = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0 tmp2 += dst = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 dst = tmp2 << 32 = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0 tmp2 += dst = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1 dst = tmp2 << 64 = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0 tmp2 += dst = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1 // Clear inactive elements. dst = sel(mask, tmp2, tmp1) = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1 // Set the inactive lane value to -1 and set the active lane to the target index. dst -= 1 = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0 Step 2: shuffle the source vector elements to the target vector tbl(dst, src, dst) = 0 0 h g 0 0 f e 0 0 d c 0 0 b a The same algorithm is used for NEON and SVE1, but with different instructions where appropriate. The following benchmarks are from panama-vector/vectorIntrinsics. 
On Nvidia Grace machine with option `-XX:UseSVE=2`:

| Benchmark | Unit | Before Score | Before Error | After Score | After Error | Uplift |
| --- | --- | --- | --- | --- | --- | --- |
| Byte128Vector.expand | ops/ms | 1791.022366 | 5.619883 | 9633.388683 | 1.968788 | 5.37 |
| Double128Vector.expand | ops/ms | 4489.255846 | 0.48485 | 4488.772949 | 0.491596 | 0.99 |
| Float128Vector.expand | ops/ms | 8863.02424 | 6.888087 | 8908.352235 | 51.487453 | 1 |
| Int128Vector.expand | ops/ms | 8873.485683 | 3.275682 | 8879.635643 | 1.243863 | 1 |
| Long128Vector.expand | ops/ms | 4485.1149 | 4.458073 | 4489.365269 | 0.851093 | 1 |
| Short128Vector.expand | ops/ms | 792.068834 | 2.640398 | 5880.811288 | 6.40683 | 7.42 |
| Byte64Vector.expand | ops/ms | 854.455002 | 8.548982 | 5999.046295 | 37.209987 | 7.02 |
| Double64Vector.expand | ops/ms | 46.49763 | 0.104773 | 46.526043 | 0.102451 | 1 |
| Float64Vector.expand | ops/ms | 4510.596811 | 0.504477 | 4509.984244 | 1.519178 | 0.99 |
| Int64Vector.expand | ops/ms | 4508.778322 | 1.664461 | 4535.216611 | 26.742484 | 1 |
| Long64Vector.expand | ops/ms | 45.665462 | 0.705485 | 46.496232 | 0.075648 | 1.01 |
| Short64Vector.expand | ops/ms | 394.527324 | 1.284691 | 3860.199621 | 0.720015 | 9.78 |

On Nvidia Grace machine with option `-XX:UseSVE=1`:

| Benchmark | Unit | Before Score | Before Error | After Score | After Error | Uplift |
| --- | --- | --- | --- | --- | --- | --- |
| Byte128Vector.expand | ops/ms | 1767.314171 | 12.431526 | 9630.892248 | 1.478813 | 5.44 |
| Double128Vector.expand | ops/ms | 197.614381 | 0.945541 | 2416.075281 | 2.664325 | 12.22 |
| Float128Vector.expand | ops/ms | 390.878183 | 2.089234 | 3844.011978 | 3.792751 | 9.83 |
| Int128Vector.expand | ops/ms | 394.550044 | 2.025371 | 3843.280133 | 3.528017 | 9.74 |
| Long128Vector.expand | ops/ms | 198.366863 | 0.651726 | 2423.234639 | 4.911434 | 12.21 |
| Short128Vector.expand | ops/ms | 790.044704 | 3.339363 | 5885.595035 | 1.440598 | 7.44 |
| Byte64Vector.expand | ops/ms | 853.479119 | 7.158898 | 5942.750116 | 1.054905 | 6.96 |
| Double64Vector.expand | ops/ms | 46.550458 | 0.079191 | 46.423053 | 0.057554 | 0.99 |
| Float64Vector.expand | ops/ms | 197.977215 | 1.156535 | 2445.010767 | 1.992358 | 12.34 |
| Int64Vector.expand | ops/ms | 198.326857 | 1.02785 | 2444.211583 | 2.5432 | 12.32 |
| Long64Vector.expand | ops/ms | 46.526513 | 0.25779 | 45.984253 | 0.566691 | 0.98 |
| Short64Vector.expand | ops/ms | 398.649412 | 1.87764 | 3837.495773 | 3.528926 | 9.62 |

On Nvidia Grace machine with option `-XX:UseSVE=0`:

| Benchmark | Unit | Before Score | Before Error | After Score | After Error | Uplift |
| --- | --- | --- | --- | --- | --- | --- |
| Byte128Vector.expand | ops/ms | 1802.98702 | 6.906394 | 9427.491602 | 2.067934 | 5.22 |
| Double128Vector.expand | ops/ms | 198.498191 | 0.429071 | 1190.476326 | 0.247358 | 5.99 |
| Float128Vector.expand | ops/ms | 392.849005 | 2.034676 | 2373.195574 | 2.006566 | 6.04 |
| Int128Vector.expand | ops/ms | 395.69179 | 2.194773 | 2372.084745 | 2.058303 | 5.99 |
| Long128Vector.expand | ops/ms | 198.191673 | 1.476362 | 1189.712301 | 1.006821 | 6 |
| Short128Vector.expand | ops/ms | 795.785831 | 5.62611 | 4731.514053 | 2.365213 | 5.94 |
| Byte64Vector.expand | ops/ms | 843.549268 | 7.174254 | 5865.556155 | 37.639415 | 6.95 |
| Double64Vector.expand | ops/ms | 45.943599 | 0.484743 | 46.529755 | 0.111551 | 1.01 |
| Float64Vector.expand | ops/ms | 193.945993 | 0.943338 | 1463.836772 | 0.618393 | 7.54 |
| Int64Vector.expand | ops/ms | 194.168021 | 0.492286 | 1473.004575 | 8.802656 | 7.58 |
| Long64Vector.expand | ops/ms | 46.570488 | 0.076372 | 46.696353 | 0.078649 | 1 |
| Short64Vector.expand | ops/ms | 387.973334 | 2.367312 | 2920.428114 | 0.863635 | 7.52 |

Some JTReg test cases are added for the above changes. The patch was tested on both aarch64 and x64; all of tier1, tier2 and tier3 tests passed.
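For readers unfamiliar with the operation being benchmarked above: `expand` scatters the lowest lanes of the source vector into the result lanes selected by a mask, zeroing the rest. A scalar sketch of that semantics (plain arrays standing in for vectors; this is an illustration, not the Vector API itself):

```java
public class ExpandSketch {
    // Scalar model of the VectorAPI expand operation: result lane i takes the
    // next unconsumed source lane when mask[i] is set, and zero otherwise.
    static int[] expand(int[] src, boolean[] mask) {
        int[] dst = new int[src.length];
        int j = 0; // index of the next source lane to consume
        for (int i = 0; i < mask.length; i++) {
            dst[i] = mask[i] ? src[j++] : 0;
        }
        return dst;
    }
}
```

For example, expanding `[1, 2, 3, 4]` with mask `[T, F, T, F]` yields `[1, 0, 2, 0]`.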
-------------

Commit messages:
 - 8363989: AArch64: Add missing backend support of VectorAPI expand operation

Changes: https://git.openjdk.org/jdk/pull/26740/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26740&range=00
 Issue: https://bugs.openjdk.org/browse/JDK-8363989
 Stats: 482 lines in 9 files changed: 386 ins; 12 del; 84 mod
 Patch: https://git.openjdk.org/jdk/pull/26740.diff
 Fetch: git fetch https://git.openjdk.org/jdk.git pull/26740/head:pull/26740

PR: https://git.openjdk.org/jdk/pull/26740

From bkilambi at openjdk.org Tue Aug 12 09:16:15 2025
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Tue, 12 Aug 2025 09:16:15 GMT
Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 11 Aug 2025 12:09:18 GMT, Andrew Haley wrote:

>> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Addressed review comments and modified some comments
>
> For `loadConH`, LLVM and GCC use
>
> mov wscratch, #const
> dup v0.4h, wscratch
>
> We should investigate that.
>
> As far as I can see, LLVM and GCC do this for all vector immediates that don't need more than 2 movz/movk instructions.

Hi @theRealAph Thanks a lot for your comment. I feel it is a good idea to modify `loadConH` to move a constant instead of doing an `ldr` from the constant pool (it could probably get us some performance benefit as well). However, the scope of this ticket was mainly to fix the JTREG errors that >16B SVE machines were running into due to illegal immediates being passed to the `sve_dup` instruction. Would it be acceptable if I push this fix first and then create a follow-up task to work on optimizing `loadConH`? I can create a new JBS ticket and assign it to myself and tag it here as well if that helps. Thank you!
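The "illegal immediates" mentioned above have a simple shape. A sketch of the encodability constraint, with the ranges taken from the PR description ([-128, 127] for 8-bit immediates, and multiples of 256 in [-127 << 8, 128 << 8] for 16-bit ones); the helper name is hypothetical:

```java
public class SveDupImm {
    // Per the PR description, sve_dup can encode a 16-bit immediate only if it
    // fits in a signed byte, or is a multiple of 256 within [-127 << 8, 128 << 8].
    static boolean isEncodableHalfImm(int imm) {
        if (imm >= -128 && imm <= 127) {
            return true;
        }
        return (imm & 0xFF) == 0 && imm >= (-127 << 8) && imm <= (128 << 8);
    }
}
```

Under this model, an FP16 constant such as `0x40b` (1035) is rejected, which is why such values must be materialized in a register and then replicated.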
-------------

PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3178458878

From rcastanedalo at openjdk.org Tue Aug 12 09:18:17 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 12 Aug 2025 09:18:17 GMT
Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors [v2]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 12 Aug 2025 06:32:13 GMT, Daniel Skantz wrote:

>> This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments().
>>
>> In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there's 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary amount of back-to-back stores in the generated code depending on the number of stacked concatenations.
>>
>> The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2].
>>
>> [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303
>>
>> [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806
>>
>> Testing: T1-4.
>> >> Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 arguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow pass.
>
> Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision:
>
> move check / tweak test

I did a quick study of C2 compile time vs. `arguments_appended` by running `TestStackedConcatsMany` with different numbers of `s = new StringBuilder().append(s).append(s).toString();` lines, and got the following results on my machine:

[compile-time plot omitted]

The plot suggests to me that the current limit of 100 could be relaxed without risking a resource usage explosion. 256 seems to me like a better balance in terms of enabling the optimization in more scenarios while still being far away from the explosion point.

I also have a few follow-up comments and requests, mostly about style.

src/hotspot/share/opto/stringopts.cpp line 299:

> 297: assert(result->_control.contains(_begin), "what?");
> 298:
> 299: const int concat_argument_upper_bound = 100;

Thanks for lifting this to a named constant. For consistency with [the style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#naming), I think it would be better to use upper case or mixed case. Consider also making this constant a static member of `StringConcat`. Finally, you could also declare both the constant and arguments_appended as `uint`.

test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsMany.java line 30:

> 28: * consume too many compilation resources.
> 29: * @requires vm.compiler2.enabled
> 30: * @run main/othervm -XX:-OptoScheduling compiler.stringopts.TestStackedConcatsMany

Please add a comment explaining why `-XX:-OptoScheduling` is used.

test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsMany.java line 42:

> 40: public static void main (String...
args) {
> 41: new StringBuilder(); // Trigger loading of the StringBuilder class.
> 42: String s = f(); // warmup call

Why do you need this warm-up call? Isn't it enough to call `f` once?

test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsMany.java line 50:

> 48: if (!(s.equals(z))) {
> 49: throw new RuntimeException("wrong result.");
> 50: }

I think using `jdk.test.lib.Asserts.assertEQ` is preferable here.

-------------

Changes requested by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26685#pullrequestreview-3109179654
PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2269065810
PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2269067300
PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2269072168
PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2269078568

From mhaessig at openjdk.org Tue Aug 12 09:34:37 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 12 Aug 2025 09:34:37 GMT
Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3]
In-Reply-To: 
References: 
Message-ID: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com>

> A loop of the form
>
> MemorySegment ms = {};
> for (long i = 0; i < ms.byteSize() / 8L; i++) {
> // vectorizable work
> }
>
> does not vectorize, whereas
>
> MemorySegment ms = {};
> long size = ms.byteSize();
> for (long i = 0; i < size / 8L; i++) {
> // vectorizable work
> }
>
> vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted.
Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: > > https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 > > Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. > > So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. > > @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem > > ## Change Description > > Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. > >
Explored Alternatives
> 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops.
> 2. Insert a "`PHASEIDEALLOOP0`" with `LoopOptsNone` that only perfor...

Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision:

 - Improve comment
 - Fix build failure on product

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/26429/files
 - new: https://git.openjdk.org/jdk/pull/26429/files/0c200787..e978ab7c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26429&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26429&range=01-02

Stats: 4 lines in 2 files changed: 3 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/26429.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26429/head:pull/26429

PR: https://git.openjdk.org/jdk/pull/26429

From mhaessig at openjdk.org Tue Aug 12 09:34:37 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 12 Aug 2025 09:34:37 GMT
Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v2]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 11 Aug 2025 15:43:05 GMT, Quan Anh Mai wrote:

>> Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision:
>>
>> - Fix debug print
>> - Test more flags
>> - Renaming and comments
>
> src/hotspot/share/opto/loopnode.hpp line 1639:
>
>> 1637: // Sum of all wins regardless of where they happen.
>> 1638: int _total_wins;
>> 1639: // Number of wins on a loop entry edge, which only pays dividens once per loop execution.
>
> You should specify that "If the split is through a loop head", otherwise `0`. Also, typo `dividends`

I tried my hand at an improvement. What do you think?
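The two loop shapes this thread keeps coming back to can be reproduced without the FFM API; `Segment` below is a hypothetical stand-in whose `byteSize()` call plays the role of `MemorySegment.byteSize()` in the loop-exit check:

```java
// Hypothetical stand-in for a MemorySegment: byteSize() is a method call,
// so writing it in the exit check keeps it inside the loop head.
final class Segment {
    private final long[] words;
    Segment(int n) { words = new long[n]; }
    long byteSize() { return words.length * 8L; }
    long get(long i) { return words[(int) i]; }
    void set(long i, long v) { words[(int) i] = v; }
}

public class LoopShapes {
    // Shape 1: limit recomputed in the exit check
    // (the shape that C2 only recognizes as counted after cleanup).
    static long sumInlineLimit(Segment ms) {
        long sum = 0;
        for (long i = 0; i < ms.byteSize() / 8L; i++) {
            sum += ms.get(i);
        }
        return sum;
    }

    // Shape 2: limit hoisted before the loop
    // (the shape that is immediately recognized as counted).
    static long sumHoistedLimit(Segment ms) {
        long size = ms.byteSize();
        long sum = 0;
        for (long i = 0; i < size / 8L; i++) {
            sum += ms.get(i);
        }
        return sum;
    }
}
```

Both methods compute the same result; only how quickly the optimizer recognizes the loop as counted differs.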
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2269266215

From mhaessig at openjdk.org Tue Aug 12 09:37:10 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 12 Aug 2025 09:37:10 GMT
Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled [v2]
In-Reply-To: 
References: 
Message-ID: <7BRWHyYaTAEbv7Yery2pRVrzCQfKB0sBFIh4M4xsCN8=.c0b2463a-4202-428d-bd3e-fa082cbbbf46@github.com>

On Tue, 12 Aug 2025 08:56:02 GMT, Saranya Natarajan wrote:

>> **Issue**
>> An error, `assert(data->is_ReceiverTypeData()) failed: bad profile data type`, is encountered during C2 compilation due to bad profile data. This occurs when the code is compiled with the `TypeProfileCasts` option disabled.
>>
>> **Analysis**
>> The assertion failure occurs in `record_profiled_receiver_for_speculation`, which analyzes the profiling information in the method data to determine whether a null value has been observed in the `instanceof` operation. This information is encoded in the `BitData` during profiling. When the method identifies that a null has been seen, it proceeds to inspect the associated `ReceiverTypeData` to see if the type check is always performed against null. However, in this scenario, the incoming profiling data is of type `BitData` rather than `ReceiverTypeData`, leading to the assertion failure.
>>
>> The profiling information for null seen for the operations `aastore`, `instanceof`, and `checkcast` is recorded by the method `profile_null_seen` (in `src/hotspot/cpu/x86/templateTable_x86.cpp`). On investigating this method, it can be observed that the method data pointer is not updated for `VirtualCallData` (which is a subclass of `ReceiverTypeData`) when the `TypeProfileCasts` option is disabled.
>> >> **Solution**
>> My proposal is to inspect the `ReceiverTypeData` in function `record_profiled_receiver_for_speculation` only if `TypeProfileCasts` is enabled (this is based on the fact that the relevant method data pointer is not updated when `TypeProfileCasts` is disabled).
>>
>> **Question to reviewers**
>> Do you think this is a reasonable fix?
>>
>> **Testing**
>> GitHub Actions
>> tier1 to tier3 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64.
>
> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision:
>
> adding test

Thank you for adding the test. Apart from my question, this looks good to me.

test/hotspot/jtreg/compiler/arguments/TestProfileCasts.java line 43:

> 41:
> 42: public static void main(String[] args) {
> 43: for (int i = 0; i < 100_000; i++) {

Do you really need 100'000 iterations to get it to compile, or can you reduce it a bit?

-------------

Marked as reviewed by mhaessig (Committer).
PR Review: https://git.openjdk.org/jdk/pull/26640#pullrequestreview-3109559187 PR Review Comment: https://git.openjdk.org/jdk/pull/26640#discussion_r2269272446 From aph at openjdk.org Tue Aug 12 09:54:12 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 12 Aug 2025 09:54:12 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 07:54:53 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. >> >> This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). >> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. 
> > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments and modified some comments Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26589#pullrequestreview-3109667013 From aph at openjdk.org Tue Aug 12 09:54:13 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 12 Aug 2025 09:54:13 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 12:09:18 GMT, Andrew Haley wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments and modified some comments > > For `loadConH`, LLVM and GCC use > > mov wscratch, #const > dup v0.4h, wscratch > > We should investigate that. > > As far as I can see, LLVM and GCC do this for all vector immediates that don't need more than 2 movz/movk instructions. > HI @theRealAph Thanks a lot for your comment. I feel it is a good idea to modify `loadConH` to move a constant instead of doing an `ldr` from the constant pool (it could probably get us some performance benefit as well). However, the scope of this ticket was to mainly fix the JTREG errors that >16B SVE machines were running into due to illegal immediates being passed to the `sve_dup` instruction. Would it be acceptable if I push this fix first and then create a follow up task to work on optimizing `loadConH`? I can create a new JBS ticket and assign it to myself and tag it here as well if that helps. Thank you! Well, yes, but I'm proposing a simpler and better fix to the problem. Sure, if you want to do this in two steps go ahead. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3178620202

From aph at openjdk.org Tue Aug 12 10:09:10 2025
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 12 Aug 2025 10:09:10 GMT
Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX
In-Reply-To: <6KBUzFUMEtIKXUhDGaNYEGtXmnSe7Ohu6ZTTmuH07NI=.e3d665d4-73d8-41c5-95a2-5e1e284eeb3a@github.com>
References: <6KBUzFUMEtIKXUhDGaNYEGtXmnSe7Ohu6ZTTmuH07NI=.e3d665d4-73d8-41c5-95a2-5e1e284eeb3a@github.com>
Message-ID: 

On Tue, 12 Aug 2025 08:33:13 GMT, Aleksey Shipilev wrote:

> > > Looks good. This is diagnostics code, so performance is not a question.
> > > I think we generally avoid shortening branches over `__ STOP`, for example, which size is generally unpredictable. So this looks in alignment with that tactics. Maybe you want to unshorten the branch at L157 as well.
> >
> >
> > All this long-and-short branch stuff is a pain. I wonder, given that we're now saving stubs in an archive, whether we should just bite the bullet and implement branch relaxation for stubs. I don't think it would be very hard.
>
> Code density still matters for runtime performance, alas.

Well, yes. I'm suggesting that we should generate short branches automagically.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26731#issuecomment-3178676758

From shade at openjdk.org Tue Aug 12 10:22:13 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 12 Aug 2025 10:22:13 GMT
Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX
In-Reply-To: 
References: <6KBUzFUMEtIKXUhDGaNYEGtXmnSe7Ohu6ZTTmuH07NI=.e3d665d4-73d8-41c5-95a2-5e1e284eeb3a@github.com>
Message-ID: <4fpY7gp5xhFqF3w8dL48-FlfoEeY7wPg997BU7Ka0Gc=.84d1174e-ea12-434a-ad75-d7acf61c2a5b@github.com>

On Tue, 12 Aug 2025 10:06:58 GMT, Andrew Haley wrote:

> Well, yes. I'm suggesting that we should generate short branches automagically.
We do generate short branches auto-magically, but only for back-branches, where we know where the target is at the time we emit the jump. So _forward jumps_ get the short (pun intended) end of the stick.

I thought about this a bit a few years back: I can imagine how one could do multiple scratch emits that try to progressively figure out which forward jumps can be shortened. That would need to be iterative, because shortening an _inner_ jump likely opens opportunities for shortening more _outer_ jumps. So this opens the question of how this all impacts compilation time. I guess it is not prohibitive for small code blobs like stubs. But then, going through all this hassle to only optimize stubs? We might as well spend this time hand-optimizing the forward jumps by hand :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26731#issuecomment-3178717699

From qamai at openjdk.org Tue Aug 12 10:26:15 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Tue, 12 Aug 2025 10:26:15 GMT
Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3]
In-Reply-To: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com>
References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com>
Message-ID: 

On Tue, 12 Aug 2025 09:34:37 GMT, Manuel Hässig wrote:

>> A loop of the form
>>
>> MemorySegment ms = {};
>> for (long i = 0; i < ms.byteSize() / 8L; i++) {
>> // vectorizable work
>> }
>>
>> does not vectorize, whereas
>>
>> MemorySegment ms = {};
>> long size = ms.byteSize();
>> for (long i = 0; i < size / 8L; i++) {
>> // vectorizable work
>> }
>>
>> vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted.
Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: >> >> https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 >> >> Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. >> >> So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. >> >> @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem >> >> ## Change Description >> >> Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. >> >>
Explored Alternatives
>> 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops.
>> 2. I...
>
> Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision:
>
> - Improve comment
> - Fix build failure on product

Marked as reviewed by qamai (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/26429#pullrequestreview-3109790938

From bkilambi at openjdk.org Tue Aug 12 10:28:18 2025
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Tue, 12 Aug 2025 10:28:18 GMT
Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 11 Aug 2025 12:09:18 GMT, Andrew Haley wrote:

> For `loadConH`, LLVM and GCC use
>
> ```
> mov wscratch, #const
> dup v0.4h, wscratch
> ```
>
> We should investigate that.
>
> As far as I can see, LLVM and GCC do this for all vector immediates that don't need more than 2 movz/movk instructions.

Just a quick look at what can be done to improve the codegen - The code shown above is Neon and this is similar to what we generate in hotspot as well (for Neon) -

```
0x0000e8a5e482ea80: mov w8, #0x40b // #1035
0x0000e8a5e482ea84: dup v18.8h, w8
```

Only on SVE machines with >16B vectors, we load from the constant table -

```
0x0000e9cdc902e370: ldr s16, 0x0000e9cdc902e280 ; {section_word}
0x0000e9cdc902e440: mov z17.h, p7/m, h16
```

We could generate a similar `mov` and a `dup` from a GPR (instead of immediate) even in the SVE case (a good part of my patch + JTREG test *could* be redundant). I'll update the patch as soon as I can.
Thanks a lot for your suggestion ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3178735447 From adinn at openjdk.org Tue Aug 12 10:39:12 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 12 Aug 2025 10:39:12 GMT Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX In-Reply-To: <4fpY7gp5xhFqF3w8dL48-FlfoEeY7wPg997BU7Ka0Gc=.84d1174e-ea12-434a-ad75-d7acf61c2a5b@github.com> References: <6KBUzFUMEtIKXUhDGaNYEGtXmnSe7Ohu6ZTTmuH07NI=.e3d665d4-73d8-41c5-95a2-5e1e284eeb3a@github.com> <4fpY7gp5xhFqF3w8dL48-FlfoEeY7wPg997BU7Ka0Gc=.84d1174e-ea12-434a-ad75-d7acf61c2a5b@github.com> Message-ID: <6KweSq7J4SJyDII9_3SMu9VmM6DTdRRKT8EzgobCfFA=.4ca9fb84-5367-45d1-a40d-81509f460af0@github.com> On Tue, 12 Aug 2025 10:19:54 GMT, Aleksey Shipilev wrote: >>> > > Looks good. This is diagnostics code, so performance is not a question. >>> > > I think we generally avoid shortening branches over `__ STOP`, for example, which size is generally unpredictable. So this looks in alignment with that tactics. Maybe you want to unshorten the branch at L157 as well. >>> > >>> > >>> > All thi.s long-and-short branch stuff is a pain. I wonder, given that we're now saving stubs in an archive, whether we should just bite the bullet and implement branch relaxation for stubs. I don't think it would be very hard. >>> >>> Code density still matters for runtime performance, alas. >> >> Well, yes. I'm suggesting that we should generate short branches automagically. > >> Well, yes. I'm suggesting that we should generate short branches automagically. > > We do generate short branches auto-magically, but only for back-branches, where we know where the target is at the time we emit the jump. So _forward jumps_ get the short (pun intended) end of the stick. 
> > I thought about this a bit a few years back: I can imagine how could one do multiple scratch emits that try to progressively figure out which forward jumps can be shortened. That would need to be iterative, because shortening an _inner_ jump likely opens opportunities for shortening more _outer_ jumps. Or maybe you can do this from the end, would that guarantee completeness? Anyway, this opens a question how this all impacts compilation time. I guess it is not prohibitive for small code blobs like stubs. But then, going through all this hassle to only optimize stubs? We might as well spend this time hand-optimizing the forward jumps by hand :) @shipilev One difficulty with shuffling code up the buffer would be recognizing where an instruction or embedded data has been nop-padded for alignment. There is no marker for that at present. The other obvious one is keeping all your relocs targeted at the correct instruction (not just adjusting offsets incrementally but also, potentially, removing a reloc_None added to bridge a large enough gap between sites). ------------- PR Comment: https://git.openjdk.org/jdk/pull/26731#issuecomment-3178764360 From aph at openjdk.org Tue Aug 12 10:39:13 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 12 Aug 2025 10:39:13 GMT Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX In-Reply-To: <4fpY7gp5xhFqF3w8dL48-FlfoEeY7wPg997BU7Ka0Gc=.84d1174e-ea12-434a-ad75-d7acf61c2a5b@github.com> References: <6KBUzFUMEtIKXUhDGaNYEGtXmnSe7Ohu6ZTTmuH07NI=.e3d665d4-73d8-41c5-95a2-5e1e284eeb3a@github.com> <4fpY7gp5xhFqF3w8dL48-FlfoEeY7wPg997BU7Ka0Gc=.84d1174e-ea12-434a-ad75-d7acf61c2a5b@github.com> Message-ID: On Tue, 12 Aug 2025 10:19:54 GMT, Aleksey Shipilev wrote: > > Well, yes. I'm suggesting that we should generate short branches automagically. 
> > We do generate short branches auto-magically, but only for back-branches, where we know where the target is at the time we emit the jump. So _forward jumps_ get the short (pun intended) end of the stick. > > I thought about this a bit a few years back: I can imagine how could one do multiple scratch emits that try to progressively figure out which forward jumps can be shortened. That would need to be iterative, because shortening an _inner_ jump likely opens opportunities for shortening more _outer_ jumps. Or maybe you can do this from the end, would that guarantee completeness? Why would we want to guarantee anything? I'm tempted to quote Emerson here about "A foolish consistency..." Do it once. > Anyway, this opens a question how this all impacts compilation time. I guess it is not prohibitive for small code blobs like stubs. But then, going through all this hassle to only optimize stubs? Yes. So that neither you nor I ever has to look at one of these PRs again. > We might as well spend this time hand-optimizing the forward jumps by hand :) But that's _boring_. Fixing the problem properly would be fun. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26731#issuecomment-3178765649 From aph at openjdk.org Tue Aug 12 10:45:10 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 12 Aug 2025 10:45:10 GMT Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX In-Reply-To: References: Message-ID: <2BQeVYAi3gFOud1KTtzqeWysjhJ22TOVRzGx0BrqMB4=.cce9ae41-4697-4c42-af6a-164949683642@github.com> On Mon, 11 Aug 2025 17:38:28 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to address the failure caused by x86 forward jump offset exceeding imm8 displacement when running the HotSpot jtreg test `test/hotspot/jtreg/compiler/c2/TestLWLockingCodeGen.java` using Intel APX (on SDE emulator). 
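The fixed-point scheme sketched in this exchange — start every forward jump long, then iteratively shrink any jump whose displacement now fits in a signed byte — can be modeled in a few lines (all names hypothetical; real relocation and alignment bookkeeping omitted):

```java
import java.util.List;

public class BranchRelax {
    static final int SHORT = 2, LONG = 5; // jcc rel8 vs. jcc rel32 sizes

    static final class Item {
        final boolean isJump;
        final int target; // for jumps: index of the first item NOT skipped
        int size;

        private Item(boolean isJump, int target, int size) {
            this.isJump = isJump; this.target = target; this.size = size;
        }
        static Item jump(int target) { return new Item(true, target, LONG); }
        static Item blob(int size)   { return new Item(false, -1, size); }
    }

    // Iteratively shrink forward jumps whose displacement fits in a signed byte.
    // Sizes only ever shrink, so the loop is guaranteed to reach a fixed point.
    static void relax(List<Item> items) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < items.size(); i++) {
                Item it = items.get(i);
                if (!it.isJump || it.size == SHORT) continue;
                int disp = 0; // bytes skipped by the jump, with current sizes
                for (int k = i + 1; k < it.target; k++) disp += items.get(k).size;
                if (disp <= 127) { it.size = SHORT; changed = true; }
            }
        }
    }
}
```

Note how shrinking an inner jump on one pass is exactly what lets an outer jump shrink on the next — the iterative behavior described above.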
> > This bug triggers an assertion failure in methodHandles_x86.cpp because the assembler emits a short forward jump (imm8 displacement) whose target is more than 127 bytes away, exceeding the allowed range. This appears to be caused by larger stub code size when APX instruction encoding is enabled. > > The fix for this issue is to replace the `jccb` instruction with` jcc` in methodHandles_x86.cpp. Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26731#pullrequestreview-3109851721 From aph at openjdk.org Tue Aug 12 10:45:11 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 12 Aug 2025 10:45:11 GMT Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX In-Reply-To: <4fpY7gp5xhFqF3w8dL48-FlfoEeY7wPg997BU7Ka0Gc=.84d1174e-ea12-434a-ad75-d7acf61c2a5b@github.com> References: <6KBUzFUMEtIKXUhDGaNYEGtXmnSe7Ohu6ZTTmuH07NI=.e3d665d4-73d8-41c5-95a2-5e1e284eeb3a@github.com> <4fpY7gp5xhFqF3w8dL48-FlfoEeY7wPg997BU7Ka0Gc=.84d1174e-ea12-434a-ad75-d7acf61c2a5b@github.com> Message-ID: On Tue, 12 Aug 2025 10:19:54 GMT, Aleksey Shipilev wrote: >>> > > Looks good. This is diagnostics code, so performance is not a question. >>> > > I think we generally avoid shortening branches over `__ STOP`, for example, which size is generally unpredictable. So this looks in alignment with that tactics. Maybe you want to unshorten the branch at L157 as well. >>> > >>> > >>> > All thi.s long-and-short branch stuff is a pain. I wonder, given that we're now saving stubs in an archive, whether we should just bite the bullet and implement branch relaxation for stubs. I don't think it would be very hard. >>> >>> Code density still matters for runtime performance, alas. >> >> Well, yes. I'm suggesting that we should generate short branches automagically. > >> Well, yes. I'm suggesting that we should generate short branches automagically. 
> > We do generate short branches auto-magically, but only for back-branches, where we know where the target is at the time we emit the jump. So _forward jumps_ get the short (pun intended) end of the stick. > > I thought about this a bit a few years back: I can imagine how one could do multiple scratch emits that try to progressively figure out which forward jumps can be shortened. That would need to be iterative, because shortening an _inner_ jump likely opens opportunities for shortening more _outer_ jumps. Or maybe you can do this from the end; would that guarantee completeness? Anyway, this opens a question of how this all impacts compilation time. I guess it is not prohibitive for small code blobs like stubs. But then, going through all this hassle to only optimize stubs? We might as well spend this time hand-optimizing the forward jumps by hand :) > @shipilev One difficulty with shuffling code up the buffer would be recognizing where an instruction or embedded data has been nop-padded for alignment. There is no marker for that at present. The other obvious one is keeping all your relocs targeted at the correct instruction (not just adjusting offsets incrementally but also, potentially, removing a reloc_None added to bridge a large enough gap between sites). I don't think that any of these are blockers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26731#issuecomment-3178783341 From shade at openjdk.org Tue Aug 12 10:53:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 12 Aug 2025 10:53:10 GMT Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX In-Reply-To: References: <6KBUzFUMEtIKXUhDGaNYEGtXmnSe7Ohu6ZTTmuH07NI=.e3d665d4-73d8-41c5-95a2-5e1e284eeb3a@github.com> <4fpY7gp5xhFqF3w8dL48-FlfoEeY7wPg997BU7Ka0Gc=.84d1174e-ea12-434a-ad75-d7acf61c2a5b@github.com> Message-ID: On Tue, 12 Aug 2025 10:36:14 GMT, Andrew Haley wrote: > But that's _boring_.
Fixing the problem properly would be fun. I agree! Happy to share this fun with someone else :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26731#issuecomment-3178805386 From dskantz at openjdk.org Tue Aug 12 11:57:38 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 12 Aug 2025 11:57:38 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors [v3] In-Reply-To: References: Message-ID: > This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments(). > > In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there's 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary amount of back-to-back stores in the generated code depending on the number of stacked concatenations. > > The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2]. > > [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303 > > [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806 > > Testing: T1-4. 
> > Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 arguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow() pass. Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision: - comment - changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26685/files - new: https://git.openjdk.org/jdk/pull/26685/files/69596e61..0535d1f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26685&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26685&range=01-02 Stats: 18 lines in 2 files changed: 8 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/26685.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26685/head:pull/26685 PR: https://git.openjdk.org/jdk/pull/26685 From rcastanedalo at openjdk.org Tue Aug 12 12:53:11 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 12 Aug 2025 12:53:11 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors [v3] In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 11:57:38 GMT, Daniel Skantz wrote: >> This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments(). >> >> In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there's 24 of them -- as the toString result of the first component is used twice in the second component [1], etc.
Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary amount of back-to-back stores in the generated code depending on the number of stacked concatenations. >> >> The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2]. >> >> [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303 >> >> [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806 >> >> Testing: T1-4. >> >> Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 aguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow pass. > > Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision: > > - comment > - changes Thanks for addressing my comments, Daniel! Please re-test to ensure the new limit is OK on all Oracle's internal test configurations. ------------- Marked as reviewed by rcastanedalo (Reviewer). 
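The blow-up described in the quoted PR text is easy to reproduce at the source level. The sketch below is illustrative only (it is not the regression test attached to the PR); it shows how reusing each intermediate result twice doubles the string, and the argument count of a naively merged StringBuilder chain, on every step:

```java
// Minimal sketch of the "stacked concatenation" shape described in the
// quoted PR text (illustrative only, not the PR's regression test): each
// step reuses the previous result twice, so the string, and the argument
// count of a naively merged StringBuilder chain, doubles on every step.
public class StackedConcat {
    public static void main(String[] args) {
        String s = "x";
        for (int i = 0; i < 24; i++) {   // the PR mentions 24 stacked merges
            s = s + s;                   // javac lowers this to StringBuilder appends
        }
        System.out.println(s.length()); // prints 16777216 (2^24 characters)
    }
}
```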
PR Review: https://git.openjdk.org/jdk/pull/26685#pullrequestreview-3110462641 From epeter at openjdk.org Tue Aug 12 12:59:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 12:59:13 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3] In-Reply-To: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> Message-ID: On Tue, 12 Aug 2025 09:34:37 GMT, Manuel H?ssig wrote: >> A loop of the form >> >> MemorySegment ms = {}; >> for (long i = 0; i < ms.byteSize() / 8L; i++) { >> // vectorizable work >> } >> >> does not vectorize, whereas >> >> MemorySegment ms = {}; >> long size = ms.byteSize(); >> for (long i = 0; i < size / 8L; i++) { >> // vectorizable work >> } >> >> vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: >> >> https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 >> >> Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. >> >> So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. 
The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. >> >> @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem >> >> ## Change Description >> >> Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. >> >>
Explored Alternatives >> 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. >> 2. I... > > Manuel H?ssig has updated the pull request incrementally with two additional commits since the last revision: > > - Improve comment > - Fix build failure on product src/hotspot/share/opto/loopopts.cpp line 233: > 231: #ifndef PRODUCT > 232: if (TraceLoopOpts) { > 233: tty->print("Split N%d through Phi N%d in %s N%d", n->_idx, phi->_idx, region->Name(), region->_idx); Why not also display the `n->Name()`? test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentByteSizeLongLoopLimit.java line 62: > 60: @Test > 61: @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_ANY, "> 0", > 62: IRNode.ADD_VI, IRNode.VECTOR_SIZE_ANY, "> 0", Oh I only just saw it now. Do you think we can specify the `VECTOR_SIZE` more precisely? If we don't assert the vector size, it could be that we generate vectors that are smaller, and produce lower performance. The issue may have been that we have mixed types here. I wonder if this approach would work for you here: `test/hotspot/jtreg/compiler/loopopts/superword/TestUnorderedReductionPartialVectorization.java: IRNode.VECTOR_CAST_I2L, IRNode.VECTOR_SIZE + "min(max_int, max_long)", "> 0",` test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentField.java line 57: > 55: @Test > 56: @IR(counts = {IRNode.LOAD_VECTOR_B, IRNode.VECTOR_SIZE_ANY, "> 0", > 57: IRNode.ADD_VB, IRNode.VECTOR_SIZE_ANY, "> 0", Do you actually need the `IRNode.VECTOR_SIZE_ANY` here? Is the default failing for you? 
Suggestion: @IR(counts = {IRNode.LOAD_VECTOR_B, "> 0", IRNode.ADD_VB, "> 0", ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2269738272 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2269751744 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2269755464 From epeter at openjdk.org Tue Aug 12 13:27:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 13:27:17 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3] In-Reply-To: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> Message-ID: <7FL_BZG7uFGHnXHf7eZNW40BmMOQzwbMZU0fwWwXwmg=.dc62c972-0e67-47df-8a7a-66526dd18d84@github.com> On Tue, 12 Aug 2025 09:34:37 GMT, Manuel H?ssig wrote: >> A loop of the form >> >> MemorySegment ms = {}; >> for (long i = 0; i < ms.byteSize() / 8L; i++) { >> // vectorizable work >> } >> >> does not vectorize, whereas >> >> MemorySegment ms = {}; >> long size = ms.byteSize(); >> for (long i = 0; i < size / 8L; i++) { >> // vectorizable work >> } >> >> vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. 
Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: >> >> https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 >> >> Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. >> >> So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. >> >> @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem >> >> ## Change Description >> >> Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. >> >>
Explored Alternatives >> 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. >> 2. I... > > Manuel H?ssig has updated the pull request incrementally with two additional commits since the last revision: > > - Improve comment > - Fix build failure on product src/hotspot/share/opto/loopnode.hpp line 1643: > 1641: int _loop_entry_wins; > 1642: // Number of wins on a loop back-edge, which pay dividends on every iteration. > 1643: int _loop_back_wins; Suggestion: // Sum of all wins regardless of where they happen. This applies to Loops phis as well as non-loop phis. int _total_wins; // For Loops, wins have different impact depending on if they happen on loop entry or on the backedge. // Number of wins on a loop entry edge if the split is through a loop head, // otherwise 0. Entry edge wins only pay dividends once on loop entry. int _loop_entry_wins; // Number of wins on a loop back-edge, which pay dividends on every iteration. int _loop_back_wins; src/hotspot/share/opto/loopnode.hpp line 1659: > 1657: } > 1658: _total_wins++; > 1659: } Why not make the `region` a field? That way, you don't have to pass it every time. And: if the region is a Loop, then we should have `_total_wins = _loop_entry_wins + _loop_back_wins`, correct? And if it is not a Loop, then the loop fields should be zero. You could add such an assert in `profitable` :) src/hotspot/share/opto/loopnode.hpp line 1673: > 1671: // dependant node, i.e. spliting a Bool node after splitting a Cmp node. > 1672: bool profitable(int policy) const { > 1673: return policy < 0 || (_loop_entry_wins == 0 && _total_wins > policy) || _loop_back_wins > policy; This already looks better. I'm wondering if we can still improve the readability a bit. Why not group the descriptions with the corresponding conditions? 
// In general this means that the split has to have more wins than specified // in the policy. In loops, we need to be careful when splitting, because it // can sufficiently rearrange the loop structure to prevent RCE and thus // vectorization. Thus, we only deem splitting profitable if the win of a // split is not on the entry edge, as such wins only pay off once and have // a high chance of messing up the loop structure. This seems to go with condition `(_loop_entry_wins == 0 && _total_wins > policy)` // in the policy. In loops, we need to be careful when splitting, because it // can sufficiently rearrange the loop structure to prevent RCE and thus // vectorization. Maybe we can restate the argument a little, and make it more explicit what kinds of rearrangements are problematic here? Maybe an example could help. What is it exactly that we risk losing here? // vectorization. Thus, we only deem splitting profitable if the win of a // split is not on the entry edge, as such wins only pay off once and have // a high chance of messing up the loop structure. How is this impacted if we have wins on both the entry and the backedge? Is that possible? Do we have any benchmarks here? It would be nice if we could say that if we **only** had entry wins and no backedge wins, then it's ok to not split, because this win would only have happened once per loop execution. But as soon as we have wins on the backedge, it would be profitable to split, and we should do so, right? // a high chance of messing up the loop structure. However, if there are // wins on the entry edge and also sufficient wins on the backadge, which // pay off on every iteration, a split is also deemed profiable. Ah, you are arguing about that here actually.
Must be about this condition: `_loop_back_wins > policy` You see, I'm struggling a bit to follow here and have to reconstruct it ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2269782994 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2269836104 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2269830904 From mhaessig at openjdk.org Tue Aug 12 14:07:15 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 12 Aug 2025 14:07:15 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3] In-Reply-To: <7FL_BZG7uFGHnXHf7eZNW40BmMOQzwbMZU0fwWwXwmg=.dc62c972-0e67-47df-8a7a-66526dd18d84@github.com> References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> <7FL_BZG7uFGHnXHf7eZNW40BmMOQzwbMZU0fwWwXwmg=.dc62c972-0e67-47df-8a7a-66526dd18d84@github.com> Message-ID: On Tue, 12 Aug 2025 13:24:13 GMT, Emanuel Peter wrote: > And: if the region is a Loop, then we should have _total_wins = _loop_entry_wins + _loop_back_wins, correct? If a `LoopNode` can only have `EntryControl` and `LoopBackControl` as control inputs, it should be correct. 
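To make the condition under discussion concrete, the quoted `profitable()` expression can be restated as a small standalone predicate. This is an illustrative sketch, not the HotSpot code; the parameter names merely mirror the quoted fields:

```java
// Illustrative restatement of the quoted profitable() condition
// (not the HotSpot code; names mirror the quoted fields).
public class SplitProfitability {
    // A negative policy means "always split". Wins on the loop entry edge
    // pay off only once, so a split is profitable either when there are no
    // entry wins and the total exceeds the policy, or when the backedge
    // wins alone (which pay off on every iteration) exceed the policy.
    static boolean profitable(int policy, int totalWins, int loopEntryWins, int loopBackWins) {
        return policy < 0
            || (loopEntryWins == 0 && totalWins > policy)
            || loopBackWins > policy;
    }

    public static void main(String[] args) {
        // Entry-only wins do not justify a split...
        System.out.println(profitable(1, 5, 5, 0)); // false
        // ...but enough backedge wins do, even alongside entry wins.
        System.out.println(profitable(1, 5, 3, 2)); // true
    }
}
```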
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2269973130 From mhaessig at openjdk.org Tue Aug 12 14:11:17 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 12 Aug 2025 14:11:17 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3] In-Reply-To: References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> Message-ID: On Tue, 12 Aug 2025 12:55:15 GMT, Emanuel Peter wrote: >> Manuel H?ssig has updated the pull request incrementally with two additional commits since the last revision: >> >> - Improve comment >> - Fix build failure on product > > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentField.java line 57: > >> 55: @Test >> 56: @IR(counts = {IRNode.LOAD_VECTOR_B, IRNode.VECTOR_SIZE_ANY, "> 0", >> 57: IRNode.ADD_VB, IRNode.VECTOR_SIZE_ANY, "> 0", > > Do you actually need the `IRNode.VECTOR_SIZE_ANY` here? Is the default failing for you? > Suggestion: > > @IR(counts = {IRNode.LOAD_VECTOR_B, "> 0", > IRNode.ADD_VB, "> 0", Here I do not seem to need them. In `TestMemorySegmentByteSizeLongLoopLimit.java` they are needed, however. Will remove. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2269983775 From shade at openjdk.org Tue Aug 12 14:17:24 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 12 Aug 2025 14:17:24 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 Message-ID: When recording adapter entries, we record _offsets_, not the actual addresses: entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. 
So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails, encountering effectively `nullptr - i2c_entry` "garbage". This PR is the second least horrible (IMO) fix for this: relaxing the assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. Additional testing: - [x] Linux ARM32 server fastdebug, `java -version` now works ------------- Commit messages: - Unsigned overflow is not UB - Fix Changes: https://git.openjdk.org/jdk/pull/26746/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26746&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365229 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26746/head:pull/26746 PR: https://git.openjdk.org/jdk/pull/26746 From shade at openjdk.org Tue Aug 12 14:17:24 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 12 Aug 2025 14:17:24 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 14:06:35 GMT, Aleksey Shipilev wrote: > When recording adapter entries, we record _offsets_, not the actual addresses: > > > entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > > > Every platform except ARM32 and Zero have all these entries set up, so offsets are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented.
So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails, encountering effectively `nullptr - i2c_entry` "garbage". > > This PR is the second least horrible (IMO) fix for this: relaxing the assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. > > The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. > > Additional testing: > - [x] Linux ARM32 server fastdebug, `java -version` now works @adinn, @vnkozlov -- thoughts? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3179503582 From shade at openjdk.org Tue Aug 12 14:25:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 12 Aug 2025 14:25:12 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 14:06:35 GMT, Aleksey Shipilev wrote: > When recording adapter entries, we record _offsets_, not the actual addresses: > > > entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > > > Every platform except ARM32 and Zero have all these entries set up, so offsets are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails, encountering effectively `nullptr - i2c_entry` "garbage". > > This PR is the second least horrible (IMO) fix for this: relaxing the assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB.
For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. > > The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. > > Additional testing: > - [x] Linux ARM32 server fastdebug, `java -version` now works Alternative: we fully unroll the loop and wrap the check for `entry_offset[3]` with `#if !defined(ARM32) && !defined(ZERO)`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3179570843 From epeter at openjdk.org Tue Aug 12 14:34:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 14:34:15 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3] In-Reply-To: References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> <7FL_BZG7uFGHnXHf7eZNW40BmMOQzwbMZU0fwWwXwmg=.dc62c972-0e67-47df-8a7a-66526dd18d84@github.com> Message-ID: On Tue, 12 Aug 2025 14:04:56 GMT, Manuel H?ssig wrote: >> src/hotspot/share/opto/loopnode.hpp line 1659: >> >>> 1657: } >>> 1658: _total_wins++; >>> 1659: } >> >> Why not make the `region` a field? That way, you don't have to pass it every time. >> >> And: if the region is a Loop, then we should have `_total_wins = _loop_entry_wins + _loop_back_wins`, correct? And if it is not a Loop, then the loop fields should be zero. You could add such an assert in `profitable` :) > >> And: if the region is a Loop, then we should have _total_wins = _loop_entry_wins + _loop_back_wins, correct? > > If a `LoopNode` can only have `EntryControl` and `LoopBackControl` as control inputs, it should be correct. 
I think that is actually the very definition of a `LoopNode`: it has the entry edge on `in(1)` and the backedge on `in(2)` ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2270071534 From asmehra at openjdk.org Tue Aug 12 14:42:12 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 12 Aug 2025 14:42:12 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 14:22:54 GMT, Aleksey Shipilev wrote: >> When recording adapter entries, we record _offsets_, not the actual addresses: >> >> >> entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; >> >> >> Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". >> >> This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. >> >> The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. >> >> Additional testing: >> - [x] Linux ARM32 server fastdebug, `java -version` now works > > Alternative: we fully unroll the loop and wrap the check for `entry_offset[3]` with `#if !defined(ARM32) && !defined(ZERO)`. @shipilev why not use -1 for the offset value to indicate a nullptr address? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3179642383 From qamai at openjdk.org Tue Aug 12 14:42:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 12 Aug 2025 14:42:14 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3] In-Reply-To: <7FL_BZG7uFGHnXHf7eZNW40BmMOQzwbMZU0fwWwXwmg=.dc62c972-0e67-47df-8a7a-66526dd18d84@github.com> References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> <7FL_BZG7uFGHnXHf7eZNW40BmMOQzwbMZU0fwWwXwmg=.dc62c972-0e67-47df-8a7a-66526dd18d84@github.com> Message-ID: <4BdaCvFlhP246J4Of7xfXav4hA9WnN681ht_WAQymHw=.f8a32dcc-5596-4b37-8195-df30065448b7@github.com> On Tue, 12 Aug 2025 13:22:16 GMT, Emanuel Peter wrote: > How is this impacted if we have wins on both the entry and the backedge? Is that possible? Do we have any benchmarks here? In general, we ignore the wins on the entry. So it is profitable if the wins on the loop back is greater than the threshold. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2270102359 From shade at openjdk.org Tue Aug 12 14:48:11 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 12 Aug 2025 14:48:11 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 14:22:54 GMT, Aleksey Shipilev wrote: >> When recording adapter entries, we record _offsets_, not the actual addresses: >> >> >> entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; >> >> >> Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. 
So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails, encountering effectively `nullptr - i2c_entry` "garbage". >> >> This PR is the second least horrible (IMO) fix for this: relaxing the assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3179666550 From asmehra at openjdk.org Tue Aug 12 14:54:13 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 12 Aug 2025 14:54:13 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 14:06:35 GMT, Aleksey Shipilev wrote: > When recording adapter entries, we record _offsets_, not the actual addresses: > > > entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > > > Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". > > This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. > > The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. > > Additional testing: > - [x] Linux ARM32 server fastdebug, `java -version` now works If I understand correctly offsets in AdapterBlob are only used when loading the blob from the AOTCodeCache to restore the address in AdapterHandlerEntry. (see `AdapterHandlerLibrary::lookup_aot_cache`). So if -1 is encountered, we just set the address to nullptr. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3179691433 From fjiang at openjdk.org Tue Aug 12 14:54:12 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 12 Aug 2025 14:54:12 GMT Subject: RFR: 8365302: RISC-V: compiler/loopopts/superword/TestAlignVector.java fails when vlen=128 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 06:38:33 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. This test passes when vlen=256, but fails when vlen=128. > > The failure occurs because the test13aIL and test13bIL cases require vectors larger than what unrolling produces; otherwise, the corresponding vector IR will not be generated. > > We can use `JTREG="JAVA_OPTIONS=-XX:+TraceSuperWordLoopUnrollAnalysis"` during testing. > The hint in the log: > > 76844 1333 b 4 compiler.loopopts.superword.TestAlignVector::test13aIL (42 bytes) > slp analysis fails: unroll limit greater than max vector > > slp analysis: set max unroll to 4 > > > Therefore, we need to require MaxVectorSize to be greater than or equal to 32 bytes. > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/TestAlignVector.java on qemu-system with RVV when vlen=128/256 Looks good! ------------- Marked as reviewed by fjiang (Committer).
PR Review: https://git.openjdk.org/jdk/pull/26738#pullrequestreview-3111161315 From mhaessig at openjdk.org Tue Aug 12 14:54:20 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 12 Aug 2025 14:54:20 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v4] In-Reply-To: References: Message-ID: > A loop of the form > > MemorySegment ms = {}; > for (long i = 0; i < ms.byteSize() / 8L; i++) { > // vectorizable work > } > > does not vectorize, whereas > > MemorySegment ms = {}; > long size = ms.byteSize(); > for (long i = 0; i < size / 8L; i++) { > // vectorizable work > } > > vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: > > https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 > > Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. > > So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. > > @merykitty also provides an alternative explanation. 
A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem. > > ## Change Description > > Based on @merykitty's reasoning, this PR tracks whether wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. > >
Explored Alternatives > 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. > 2. Insert a "`PHASEIDEALLOOP0`" with `LoopOptsNone` that only perfor... Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: Update field documentation Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26429/files - new: https://git.openjdk.org/jdk/pull/26429/files/e978ab7c..061d2975 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26429&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26429&range=02-03 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26429.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26429/head:pull/26429 PR: https://git.openjdk.org/jdk/pull/26429 From mhaessig at openjdk.org Tue Aug 12 14:57:14 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 12 Aug 2025 14:57:14 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3] In-Reply-To: References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> <7FL_BZG7uFGHnXHf7eZNW40BmMOQzwbMZU0fwWwXwmg=.dc62c972-0e67-47df-8a7a-66526dd18d84@github.com> Message-ID: On Tue, 12 Aug 2025 14:31:40 GMT, Emanuel Peter wrote: >>> And: if the region is a Loop, then we should have _total_wins = _loop_entry_wins + _loop_back_wins, correct? >> >> If a `LoopNode` can only have `EntryControl` and `LoopBackControl` as control inputs, it should be correct. > > I think that is actually the very definition of a `LoopNode`: it has the entry edge on `in(1)` and the backedge on `in(2)` ;) Alright, `region` becomes a field, and `profitable` gets asserts. Thank you for the suggestions. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2270159374 From shade at openjdk.org Tue Aug 12 14:59:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 12 Aug 2025 14:59:14 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 14:51:47 GMT, Ashutosh Mehra wrote: > If I understand correctly offsets in AdapterBlob are only used when loading the blob from the AOTCodeCache to restore the address in AdapterHandlerEntry. (see `AdapterHandlerLibrary::lookup_aot_cache`). So if -1 is encountered, we just set the address to nullptr. Yes, that _almost_ works, but you need to put something into the *offset* field here: AdapterBlob::AdapterBlob(int size, CodeBuffer* cb, int entry_offset[AdapterBlob::ENTRY_COUNT]) : BufferBlob("I2C/C2I adapters", CodeBlobKind::Adapter, cb, size, sizeof(AdapterBlob)) { ... _c2i_offset = entry_offset[1]; _c2i_unverified_offset = entry_offset[2]; _c2i_no_clinit_check_offset = entry_offset[3]; // <------ CodeCache::commit(this); } If we propagate `-1` there, we then need to handle it around the accessors that compute the actual address as well. This can be done, but it would be more intrusive. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3179709727 From epeter at openjdk.org Tue Aug 12 15:01:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 15:01:15 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3] In-Reply-To: <4BdaCvFlhP246J4Of7xfXav4hA9WnN681ht_WAQymHw=.f8a32dcc-5596-4b37-8195-df30065448b7@github.com> References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> <7FL_BZG7uFGHnXHf7eZNW40BmMOQzwbMZU0fwWwXwmg=.dc62c972-0e67-47df-8a7a-66526dd18d84@github.com> <4BdaCvFlhP246J4Of7xfXav4hA9WnN681ht_WAQymHw=.f8a32dcc-5596-4b37-8195-df30065448b7@github.com> Message-ID: On Tue, 12 Aug 2025 14:39:35 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/loopnode.hpp line 1673: >> >>> 1671: // dependant node, i.e. spliting a Bool node after splitting a Cmp node. >>> 1672: bool profitable(int policy) const { >>> 1673: return policy < 0 || (_loop_entry_wins == 0 && _total_wins > policy) || _loop_back_wins > policy; >> >> This already looks better. I'm wondering if we can still improve the readability a bit. Why not group the descriptions with the corresponding conditions? >> >> >> // In general this means that the split has to have more wins than specified >> // in the policy. In loops, we need to be careful when splitting, because it >> // can sufficiently rearrange the loop structure to prevent RCE and thus >> // vectorization. Thus, we only deem splitting profitable if the win of a >> // split is not on the entry edge, as such wins only pay off once and have >> // a high chance of messing up the loop structure. >> >> This seems to go with condition >> `(_loop_entry_wins == 0 && _total_wins > policy)` >> >> >> // in the policy. In loops, we need to be careful when splitting, because it >> // can sufficiently rearrange the loop structure to prevent RCE and thus >> // vectorization. 
>> >> Maybe we can restate the argument a little, and make it more explicit what kinds of rearrangements are problematic here? Maybe an example could help. What is it exactly that we risk loosing here? >> >> >> // vectorization. Thus, we only deem splitting profitable if the win of a >> // split is not on the entry edge, as such wins only pay off once and have >> // a high chance of messing up the loop structure. >> >> How is this impacted if we have wins on both the entry and the backedge? Is that possible? Do we have any benchmarks here? >> It would be nice if we could say that if we **only** had entry wins and no backedge wins, then it's ok to not split, because this win would only have happened once per loop execution. But as soon as we have wins on the backedge, it would be profitable to split, and we should do so, right? >> >> >> // a high chance of messing up the loop structure. However, if there are >> // wins on the entry edge and also sufficient wins on the backadge, which >> // pay off on every iteration, a split is also deemed profiable. >> >> Ah, you are arguing about that here actually. Must be about this condition: >> `_loop_back_wins > policy` >> >> You see, I'm struggling a bit to follow here and have to reconstruct it ;) > >> How is this impacted if we have wins on both the entry and the backedge? Is that possible? Do we have any benchmarks here? > > In general, we ignore the wins on the entry. So it is profitable if the wins on the loop back is greater than the threshold. Sure, I gathered as much. It would still be nice to have some examples / IR-tests / JMH-benchmarks here. Just to make sure we are getting the conditions right. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2270173531 From asmehra at openjdk.org Tue Aug 12 15:09:15 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 12 Aug 2025 15:09:15 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: <56xCRrbU_XXZI3DQyRVVdJqr3zTnpzu13rO3jhC-Hd0=.cfaf07b2-1055-4d0d-83e1-44379b663235@github.com> On Tue, 12 Aug 2025 14:56:19 GMT, Aleksey Shipilev wrote: > If we propagate -1 there, we then need to handle it around the accessors that compute the actual address as well. This can be done, but it would be more intrusive. Yes, I agree. And there is only one place, if I am not wrong, that computes the actual address from the offset - `AdapterHandlerLibrary::lookup_aot_cache`. That's not very intrusive IMHO. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3179752562 From mablakatov at openjdk.org Tue Aug 12 15:11:32 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Tue, 12 Aug 2025 15:11:32 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v10] In-Reply-To: References: Message-ID: > Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, the existing ASIMD implementation is used. > > Nothing changes for <= 128-bit long vectors, as for those the existing ASIMD implementation is still used directly. > > The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks.
> > Benchmarks results: > > Neoverse-V1 (SVE 256-bit) > > Benchmark (size) Mode master PR Units > ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms > ShortMaxVector.MULLanes 1024 thrpt 3388.183 7144.301 ops/ms > IntMaxVector.MULLanes 1024 thrpt 3010.974 4911.485 ops/ms > LongMaxVector.MULLanes 1024 thrpt 1539.137 2562.835 ops/ms > FloatMaxVector.MULLanes 1024 thrpt 1355.551 4158.128 ops/ms > DoubleMaxVector.MULLanes 1024 thrpt 1715.854 3284.189 ops/ms > > > Fujitsu A64FX (SVE 512-bit): > > Benchmark (size) Mode master PR Units > ByteMaxVector.MULLanes 1024 thrpt 1091.692 2887.798 ops/ms > ShortMaxVector.MULLanes 1024 thrpt 597.008 1863.338 ops/ms > IntMaxVector.MULLanes 1024 thrpt 510.642 1348.651 ops/ms > LongMaxVector.MULLanes 1024 thrpt 468.878 878.620 ops/ms > FloatMaxVector.MULLanes 1024 thrpt 376.284 2237.564 ops/ms > DoubleMaxVector.MULLanes 1024 thrpt 431.343 1646.792 ops/ms Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Merge branch 'master' - Address review comments and simplify the implementation - remove the loops from gt128b methods making them 256b only - fixup: missed fnoregs in instruct reduce_mulL_256b - use an extra vtmp3 reg for the 256b integer method - remove a no longer needed change in reduce_mul_integral_le128b - cleanup: unify comments - Merge commit '8193856af8546332bfa180cb45154a4093b4fd2c' - remove the strictly-ordered FP implementation as unused - Compare VL against MaxVectorSize instead of FloatRegister::sve_vl_max - Use a dedicated ptrue predicate register This shifts MulReduction performance on Neoverse V1 a bit. Here Before if before this specific commit (ebad6dd37e332da44222c50cd17c69f3ff3f0635) and After is this commit. 
| Benchmark | Before (ops/ms) | After (ops/ms) | Diff (%) | | ------------------------ | --------------- | -------------- | -------- | | ByteMaxVector.MULLanes | 9883.151 | 9093.557 | -7.99% | | DoubleMaxVector.MULLanes | 2712.674 | 2607.367 | -3.89% | | FloatMaxVector.MULLanes | 3388.811 | 3291.429 | -2.88% | | IntMaxVector.MULLanes | 4765.554 | 5031.741 | +5.58% | | LongMaxVector.MULLanes | 2685.228 | 2896.445 | +7.88% | | ShortMaxVector.MULLanes | 5128.185 | 5197.656 | +1.35% | - cleanup: update a copyright notice Co-authored-by: Hao Sun - fixup: remove undefined insts from aarch64-asmtest.py - cleanup: address nits, rename several symbols - cleanup: remove unreferenced definitions - ... and 6 more: https://git.openjdk.org/jdk/compare/241808e1...91cbacc0 ------------- Changes: https://git.openjdk.org/jdk/pull/23181/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23181&range=09 Stats: 383 lines in 9 files changed: 236 ins; 2 del; 145 mod Patch: https://git.openjdk.org/jdk/pull/23181.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23181/head:pull/23181 PR: https://git.openjdk.org/jdk/pull/23181 From shade at openjdk.org Tue Aug 12 15:20:15 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 12 Aug 2025 15:20:15 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: <-9uQoRwxrI6VFCvWSMak5cG-Gin8OyVXn1Hw-X4h4ns=.20b6b199-035f-44e3-a863-a17cd4beaff3@github.com> On Tue, 12 Aug 2025 14:06:35 GMT, Aleksey Shipilev wrote: > When recording adapter entries, we record _offsets_, not the actual addresses: > > > entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > > > Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. 
So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". > > This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. > > The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. > > Additional testing: > - [x] Linux ARM32 server fastdebug, `java -version` now works All right, I can take a look at that a bit later. But my gut feeling is that it would spread out... ------------- PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3179798671 From mhaessig at openjdk.org Tue Aug 12 15:20:20 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 12 Aug 2025 15:20:20 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3] In-Reply-To: References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> Message-ID: On Tue, 12 Aug 2025 12:54:13 GMT, Emanuel Peter wrote: >> Manuel H?ssig has updated the pull request incrementally with two additional commits since the last revision: >> >> - Improve comment >> - Fix build failure on product > > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentByteSizeLongLoopLimit.java line 62: > >> 60: @Test >> 61: @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_ANY, "> 0", >> 62: IRNode.ADD_VI, IRNode.VECTOR_SIZE_ANY, "> 0", > > Oh I only just saw it now. Do you think we can specify the `VECTOR_SIZE` more precisely? 
> If we don't assert the vector size, it could be that we generate vectors that are smaller, and produce lower performance. > > The issue may have been that we have mixed types here. > > I wonder if this approach would work for you here: > `test/hotspot/jtreg/compiler/loopopts/superword/TestUnorderedReductionPartialVectorization.java: IRNode.VECTOR_CAST_I2L, IRNode.VECTOR_SIZE + "min(max_int, max_long)", "> 0",` TIL about the language for vector sizes, thank you. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2270242045 From dfenacci at openjdk.org Tue Aug 12 15:20:21 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 12 Aug 2025 15:20:21 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v6] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> <1QbX5WHkEdjP-unAFJ1vYaoIc9bV8zz8dA-vKZCkYn8=.8e3704ae-9490-4471-9e5c-dae44004d46f@github.com> Message-ID: On Mon, 11 Aug 2025 11:13:08 GMT, Saranya Natarajan wrote: >> test/hotspot/jtreg/compiler/arguments/TestBciProfileWidth.java line 28: >> >>> 26: * @summary Test the range defined in globals.hpp for BciProfileWidth >>> 27: * @bug 8358696 >>> 28: * @run main/othervm -XX:BciProfileWidth=0 >> >> `BciProfileWidth` is debug only, right? > > Thank you for the review. > I have now included `@requires vm.debug` Shouldn't we check that the vm doesn't crash with `BciProfileWidth=-1` and `BciProfileWidth=100000` (or another very high value)? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26139#discussion_r2270238127 From duke at openjdk.org Tue Aug 12 15:25:15 2025 From: duke at openjdk.org (Samuel Chee) Date: Tue, 12 Aug 2025 15:25:15 GMT Subject: RFR: 8361890: Aarch64: Removal of redundant dmb from C1 AtomicLong methods In-Reply-To: <60YMRP6cNslwEeVX2TWmnMYdO872xGaeShKMEj0dWGY=.2f4f504f-93d1-4bab-b721-e5c964f4c465@github.com> References: <60YMRP6cNslwEeVX2TWmnMYdO872xGaeShKMEj0dWGY=.2f4f504f-93d1-4bab-b721-e5c964f4c465@github.com> Message-ID: On Thu, 10 Jul 2025 15:49:40 GMT, Samuel Chee wrote: > The current C1 implementation of AtomicLong methods > which either adds or exchanges (such as getAndAdd) > emit one of a ldaddal and swpal respectively when using > LSE as well as an immediately proceeding dmb. Since > ldaddal/swpal have both acquire and release semantics, > this provides similar ordering guarantees to a dmb.full > so the dmb here is redundant and can be removed. > > This is due to both clause 7 and clause 11 of the > definition of Barrier-ordered-before in B2.3.7 of the > DDI0487 L.a Arm Architecture Reference Manual for A-profile > architecture being satisfied by the existence of a > ldaddal/swpal which ensures such memory ordering guarantees. Can I just check, do I need to wait for another reviewer or can I start the integrate process? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26245#issuecomment-3179815978 From snatarajan at openjdk.org Tue Aug 12 15:27:59 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 12 Aug 2025 15:27:59 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled [v3] In-Reply-To: References: Message-ID: > **Issue** > An error, `assert(data->is_ReceiverTypeData()) failed: bad profile data type`, is encountered during C2 compilation due to bad profile data. This occurs when the code is compiled with `TypeProfileCasts` option disabled. 
> **Analysis** > The assertion failure occurs in `record_profiled_receiver_for_speculation` that analyzes the profiling information in the method data to determine whether a null value has been observed in the `instanceof` operation. This information is encoded in the `BitData` during profiling. When the method identifies that a null has been seen, it proceeds to inspect the associated `ReceiverTypeData` to see if the type check is always performed against null. However, in this scenario, the incoming profiling data is of type `BitData` rather than `ReceiverTypeData`, leading to the assertion failure. > > The profiling information for null seen for operations `aastore`, `instanceof`, and `checkcast` is recorded by the method `profile_null_seen` (in `src/hotspot/cpu/x86/templateTable_x86.cpp`). On investigating this method, it can be observed that the method data pointer is not updated for `VirtualCallData` (which is a subclass of `ReceiverTypeData`) when the `TypeProfileCasts` option is disabled. > > **Solution** > My proposal is to inspect the `ReceiverTypeData` in function `record_profiled_receiver_for_speculation` only if `TypeProfileCasts` is enabled (this is based on the fact that the relevant method data pointer is not updated when `TypeProfileCasts` is disabled). > > **Question to reviewers** > Do you think this is a reasonable fix? > > **Testing** > GitHub Actions > tier1 to tier3 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64.
Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: addressing review: reducing number of iteration ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26640/files - new: https://git.openjdk.org/jdk/pull/26640/files/767f4e87..31c645de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26640&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26640&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26640.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26640/head:pull/26640 PR: https://git.openjdk.org/jdk/pull/26640 From snatarajan at openjdk.org Tue Aug 12 15:28:00 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 12 Aug 2025 15:28:00 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled [v2] In-Reply-To: <7BRWHyYaTAEbv7Yery2pRVrzCQfKB0sBFIh4M4xsCN8=.c0b2463a-4202-428d-bd3e-fa082cbbbf46@github.com> References: <7BRWHyYaTAEbv7Yery2pRVrzCQfKB0sBFIh4M4xsCN8=.c0b2463a-4202-428d-bd3e-fa082cbbbf46@github.com> Message-ID: On Tue, 12 Aug 2025 09:33:21 GMT, Manuel H?ssig wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> adding test > > test/hotspot/jtreg/compiler/arguments/TestProfileCasts.java line 43: > >> 41: >> 42: public static void main(String[] args) { >> 43: for (int i = 0; i < 100_000; i++) { > > Do you really need 100'000 iterations to get it to compile or can you reduce it a bit? I have reduced this to 100 after I checked that the (new) test fails without the current fix. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26640#discussion_r2270265360 From mhaessig at openjdk.org Tue Aug 12 15:34:14 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 12 Aug 2025 15:34:14 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled [v2] In-Reply-To: References: <7BRWHyYaTAEbv7Yery2pRVrzCQfKB0sBFIh4M4xsCN8=.c0b2463a-4202-428d-bd3e-fa082cbbbf46@github.com> Message-ID: On Tue, 12 Aug 2025 15:24:29 GMT, Saranya Natarajan wrote: >> test/hotspot/jtreg/compiler/arguments/TestProfileCasts.java line 43: >> >>> 41: >>> 42: public static void main(String[] args) { >>> 43: for (int i = 0; i < 100_000; i++) { >> >> Do you really need 100'000 iterations to get it to compile or can you reduce it a bit? > > I have reduced this to 100 after I checked that the (new) test fails without the current fix. Excellent, thank you! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26640#discussion_r2270288589 From mhaessig at openjdk.org Tue Aug 12 15:34:13 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 12 Aug 2025 15:34:13 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled [v3] In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 15:27:59 GMT, Saranya Natarajan wrote: >> **Issue** >> An error, `assert(data->is_ReceiverTypeData()) failed: bad profile data type`, is encountered during C2 compilation due to bad profile data. This occurs when the code is compiled with `TypeProfileCasts` option disabled. >> >> **Analysis** >> The assertion failure occurs in `record_profiled_receiver_for_speculation` that analyzes the profiling information in the method data to determine whether a null value has been observed in the `instanceof` operation. This information is encoded in the `BitData` during profiling. 
When the method identifies that a null has been seen, it proceeds to inspect the associated `ReceiverTypeData` to see if the type check is always performed against null. However, in this scenario, the incoming profiling data is of type `BitData` rather than `ReceiverTypeData`, leading to the assertion failure. >> >> The profiling information for null seen for operations `aastore`, `instanceof`, and `checkcast` is recorded by the method `profile_null_seen `(in` src/hotspot/cpu/x86/templateTable_x86.cpp `). On investigating this method, it can be observed that the method data pointer is not updated for `VirtualCallData` (which is a subclass of `ReceiverTypeData`) when the `TypeProfileCasts` option is disabled. >> >> **Solution** >> My proposal is to inspect the `ReceiverTypeData` in function `record_profiled_receiver_for_speculation` only if `TypeProfileCasts` is enabled (this is based on the fact that the relevant method data pointer is not updated when `TypeProfileCasts` is disabled). >> >> **Question to reviewers** >> Do you think this is a reasonable fix ? >> >> **Testing** >> GitHub Actions >> tier1 to tier3 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review: reducing number of iteration Marked as reviewed by mhaessig (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/26640#pullrequestreview-3111436190 From epeter at openjdk.org Tue Aug 12 15:38:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 15:38:23 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Mon, 11 Aug 2025 09:08:49 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java >> >> Co-authored-by: Manuel H?ssig > > src/hotspot/share/opto/vectorization.cpp line 598: > >> 596: // If iv_stride <= 0, i.e. last <= iv <= init: >> 597: // (iv - init) * scale_1 >= (iv - init) * iv_scale >> 598: // (iv - last) * scale_1 <= (iv - last) * iv_scale (NEG-STRIDE) > > Suggestion: > > // If iv_stride >= 0, i.e. init <= iv <= last: > // (iv - init) * iv_scale_1 <= (iv - init) * iv_scale2 > // (iv - last) * iv_scale_1 >= (iv - last) * iv_scale2 (POS-STRIDE) > // If iv_stride <= 0, i.e. last <= iv <= init: > // (iv - init) * iv_scale_1 >= (iv - init) * iv_scale2 > // (iv - last) * iv_scale_1 <= (iv - last) * iv_scale2 (NEG-STRIDE) > > If I am not massively confused, the `iv_scale`s should be like this. Correct! except that it should be `iv_scale_1` -> `iv_scale1` ;) Together we can do it eventually ? ? > src/hotspot/share/opto/vectorization.cpp line 604: > >> 602: // p1(init) + size1 <= p2(init) (if iv_stride >= 0) | p2(last) + size2 <= p1(last) (if iv_stride >= 0) | >> 603: // p1(last) + size1 <= p2(last) (if iv_stride <= 0) | p2(init) + size2 <= p1(init) (if iv_stride <= 0) | >> 604: // ----- is equivalent to ----- | ----- is equivalent to ----- | > > Suggestion: > > // ---- are equivalent to ----- | ---- are equivalent to ----- | > > This confused me a bit ? 
Sure, sounds good :) > src/hotspot/share/opto/vectorization.cpp line 625: > >> 623: // <= size1 + p1(init) - init * iv_scale2 + iv * iv_scale2 | <= size2 + p2(last) - init * iv_scale1 + iv * iv_scale1 | >> 624: // -- assumption -- | -- assumption -- | >> 625: // <= p2(init) - init * iv_scale2 + iv * iv_scale2 | <= p1(last) - init * iv_scale1 + iv * iv_scale1 | > > Suggestion: > > // = size1 + p1(init) - init * iv_scale1 + iv * iv_scale1 | = size2 + p2(last) - last * iv_scale2 + iv * iv_scale2 | > // ------ apply (POS-STRIDE) --------- | ------ apply (POS-STRIDE) --------- | > // <= size1 + p1(init) - init * iv_scale2 + iv * iv_scale2 | <= size2 + p2(last) - last * iv_scale1 + iv * iv_scale1 | > // -- assumption -- | -- assumption -- | > // <= p2(init) - init * iv_scale2 + iv * iv_scale2 | <= p1(last) - last * iv_scale1 + iv * iv_scale1 | > > > `LINEAR-FORM-LAST: p1(iv) = p1(last) - last * iv_scale1 + iv * iv_scale1` Nice catch! > src/hotspot/share/opto/vectorization.cpp line 639: > >> 637: // <= size1 + p1(last) - init * iv_scale2 + iv * iv_scale2 | <= size2 + p2(init) - init * iv_scale1 + iv * iv_scale1 | >> 638: // -- assumption -- | -- assumption -- | >> 639: // <= p2(last) - init * iv_scale2 + iv * iv_scale2 | <= p1(init) - init * iv_scale1 + iv * iv_scale1 | > > Suggestion: > > // = size1 + p1(last) - last * iv_scale1 + iv * iv_scale1 | = size2 + p2(init) - init * iv_scale2 + iv * iv_scale2 | > // ------ apply (NEG-STRIDE) --------- | ------ apply (NEG-STRIDE) --------- | > // <= size1 + p1(last) - last * iv_scale2 + iv * iv_scale2 | <= size2 + p2(init) - init * iv_scale1 + iv * iv_scale1 | > // -- assumption -- | -- assumption -- | > // <= p2(last) - last * iv_scale2 + iv * iv_scale2 | <= p1(init) - init * iv_scale1 + iv * iv_scale1 | Same case, thanks! 
> src/hotspot/share/opto/vectorization.cpp line 742: > >> 740: // a solution that also works when the loop is not entered: >> 741: // >> 742: // k = (init - stride - 1) / abs(stride) > > Suggestion: > > // k = (init - limit - 1) / abs(stride) > > Where does `stride` come from? If I did not miss anything, this should be `limit`. From a blip in the brain I suppose :face_with_peeking_eye: Thanks for spotting it! > src/hotspot/share/opto/vectorization.cpp line 895: > >> 893: Node* diffL = (stride > 0) ? new SubLNode(limitL, initL) >> 894: : new SubLNode(initL, limitL); >> 895: Node* diffL_m1 = new AddLNode(diffL, igvn.longcon(-1)); > > Out of curiosity, why did you choose `AddL(diff, -1)` over `SubL(diff, 1)`? I think it would otherwise just get canonicalized to `AddL` anyway. I'm saving IGVN from doing that additional work. For me this is a detail, some may call it premature optimization ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2270284362 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2270288117 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2270294118 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2270294865 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2270300217 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2270306868 From mhaessig at openjdk.org Tue Aug 12 15:43:15 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 12 Aug 2025 15:43:15 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3] In-Reply-To: References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> <7FL_BZG7uFGHnXHf7eZNW40BmMOQzwbMZU0fwWwXwmg=.dc62c972-0e67-47df-8a7a-66526dd18d84@github.com> Message-ID: On Tue, 12 Aug 2025 14:54:39 GMT, Manuel Hässig wrote: >> I think that is actually the very
definition of a `LoopNode`: it has the entry edge on `in(1)` and the backedge on `in(2)` ;) > > Alright, `region` becomes a field, and `profitable` gets asserts. Thank you for the suggestions. > if the region is a Loop, then we should have _total_wins = _loop_entry_wins + _loop_back_wins, correct? Actually, this might not be true, because there is a codepath that resets the win count when a load is moved from an outer to an inner loop. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2270325852 From epeter at openjdk.org Tue Aug 12 15:45:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 15:45:37 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v7] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks.
> > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. > - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. > - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` > - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)` > - For aliasing analysis (adjacency and overlap), the "regu...
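To illustrate why the raw form matters (my own toy example, not code from the patch): the "regular" simplification `ConvI2L(x + 2) = 2 + ConvI2L(x)` only holds when the inner 32-bit addition cannot wrap around, which is exactly the distinction the raw summands preserve:

```cpp
#include <cstdint>

// Java-style ConvI2L(x + c): add in wrapping 32-bit arithmetic (done via
// unsigned to avoid C++ signed-overflow UB), then sign-extend to 64 bits.
int64_t conv_i2l_of_sum(int32_t x, int32_t c) {
  int32_t wrapped = static_cast<int32_t>(static_cast<uint32_t>(x) +
                                         static_cast<uint32_t>(c));
  return static_cast<int64_t>(wrapped);
}

// The "regular" simplified form: widen first, then add in 64 bits.
int64_t sum_of_conv_i2l(int32_t x, int32_t c) {
  return static_cast<int64_t>(x) + static_cast<int64_t>(c);
}
```

The two agree for small `x`, but near `INT_MAX` the 32-bit addition wraps before the widening cast and the results diverge by 2^32.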
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: apply suggestions from Manuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/238342ae..e05b6297 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=05-06 Stats: 17 lines in 1 file changed: 0 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Tue Aug 12 15:45:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 15:45:38 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Mon, 11 Aug 2025 12:11:55 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java >> >> Co-authored-by: Manuel H?ssig > > src/hotspot/share/opto/vectorization.cpp line 743: > >> 741: // >> 742: // k = (init - stride - 1) / abs(stride) >> 743: // last = MAX(init, init + k * stride) > > Suggestion: > > // last = MIN(init, init + k * stride) > > This should be `MIN` otherwise this does not clamp to zero. Yes! Actually the implementation does the correct thing here, just the comments are off. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2270316002 From epeter at openjdk.org Tue Aug 12 15:45:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 15:45:38 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Mon, 11 Aug 2025 12:15:38 GMT, Manuel H?ssig wrote: >> src/hotspot/share/opto/vectorization.cpp line 752: >> >>> 750: // If stride < 0: >>> 751: // k = (init - stride - 1) / abs(stride) >>> 752: // last = MAX(init, init + k * stride) >> >> Suggestion: >> >> // LAST(init, stride, limit) >> // If stride > 0: >> // k = (limit - init - 1) / abs(stride) >> // last = MAX(init, init + k * stride) >> // If stride < 0: >> // k = (init - limit - 1) / abs(stride) >> // last = MIN(init, init + k * stride) > > Or to be a bit closer to the implementation: > Suggestion: > > // LAST(init, stride, limit) > // c = stride > 0 ? 1 : -1; > // k = (c * (limit - init) - 1) / abs(stride) > // If stride > 0: > // last = MAX(init, init + k * stride) > // If stride < 0: > // last = MIN(init, init + k * stride) I'll keep the first one. 
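For what it's worth, the corrected formula can be checked with a small standalone helper (a hypothetical sketch in plain C++, not the HotSpot implementation; it relies on truncating integer division so that an un-entered loop clamps back to `init`):

```cpp
#include <algorithm>
#include <cstdlib>

// LAST(init, stride, limit): the last iv value a counted loop actually
// executes. When the loop is not entered, k truncates to 0 and the
// MAX/MIN clamp returns init.
long last_iv(long init, long stride, long limit) {
  long c = stride > 0 ? 1 : -1;
  long k = (c * (limit - init) - 1) / std::abs(stride);
  long candidate = init + k * stride;
  return stride > 0 ? std::max(init, candidate)
                    : std::min(init, candidate);
}
```

For example, `init=0, stride=4, limit=10` visits iv = 0, 4, 8 and yields last = 8, while `init=10, stride=4, limit=10` never enters the loop and clamps to 10.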
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2270323536 From dlong at openjdk.org Tue Aug 12 15:47:10 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 12 Aug 2025 15:47:10 GMT Subject: RFR: 8361890: Aarch64: Removal of redundant dmb from C1 AtomicLong methods In-Reply-To: <60YMRP6cNslwEeVX2TWmnMYdO872xGaeShKMEj0dWGY=.2f4f504f-93d1-4bab-b721-e5c964f4c465@github.com> References: <60YMRP6cNslwEeVX2TWmnMYdO872xGaeShKMEj0dWGY=.2f4f504f-93d1-4bab-b721-e5c964f4c465@github.com> Message-ID: <9-lURgIbZeiMmd8PZ8QfAdM2_lowP27ALoH0ST08bmc=.f35e4b4e-3463-487f-a246-b5b903afe83c@github.com> On Thu, 10 Jul 2025 15:49:40 GMT, Samuel Chee wrote: > The current C1 implementation of AtomicLong methods > which either adds or exchanges (such as getAndAdd) > emit one of a ldaddal and swpal respectively when using > LSE as well as an immediately following dmb. Since > ldaddal/swpal have both acquire and release semantics, > this provides similar ordering guarantees to a dmb.full > so the dmb here is redundant and can be removed. > > This is due to both clause 7 and clause 11 of the > definition of Barrier-ordered-before in B2.3.7 of the > DDI0487 L.a Arm Architecture Reference Manual for A-profile > architecture being satisfied by the existence of a > ldaddal/swpal which ensures such memory ordering guarantees. Marked as reviewed by dlong (Reviewer). Yes, you need two reviews. I just approved it, so you should be good to go, assuming you did jcstress testing as requested by Andrew.
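As an aside, the same guarantee is visible at the C++ level: a seq_cst read-modify-write needs no separate trailing full barrier, because the RMW itself is both an acquire and a release operation. (That such an operation lowers to a lone ldaddal on AArch64 with LSE is an assumption about typical compiler output, not something taken from this patch.)

```cpp
#include <atomic>

// getAndAdd-style operation: a single seq_cst fetch_add with no explicit
// fence afterwards. Returns the previous value, like AtomicLong.getAndAdd.
long get_and_add(std::atomic<long>& v, long delta) {
  return v.fetch_add(delta, std::memory_order_seq_cst);
}
```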
------------- PR Review: https://git.openjdk.org/jdk/pull/26245#pullrequestreview-3111521691 PR Comment: https://git.openjdk.org/jdk/pull/26245#issuecomment-3179929329 From epeter at openjdk.org Tue Aug 12 15:52:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 15:52:19 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Mon, 11 Aug 2025 12:28:31 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java >> >> Co-authored-by: Manuel H?ssig > > src/hotspot/share/opto/vectorization.cpp line 1026: > >> 1024: if (vp1.iv_scale() > vp2.iv_scale()) { >> 1025: swap(p1_init, p2_init); >> 1026: swap(size1, size2); > > Shouldn't we perform this swap before calling `make_last()`, since `make_last()` assumes `iv_scale1 < iv_scale2`? I don't think that `make_last` makes any assumptions about `iv_scale1 < iv_scale2`. But I could consider moving it earlier anyway. Do you think that is worth it? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2270365536 From epeter at openjdk.org Tue Aug 12 16:02:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 16:02:15 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3] In-Reply-To: References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> <7FL_BZG7uFGHnXHf7eZNW40BmMOQzwbMZU0fwWwXwmg=.dc62c972-0e67-47df-8a7a-66526dd18d84@github.com> Message-ID: <9edQTufQlYxamUcG1CnLFY8XpFy6x1xERJaeAXXSu_0=.424bc197-4b5b-44fa-8c6c-fd32ba415247@github.com> On Tue, 12 Aug 2025 15:40:39 GMT, Manuel Hässig wrote: >> Alright, `region` becomes a field, and `profitable` gets asserts. Thank you for the suggestions. > >> if the region is a Loop, then we should have _total_wins = _loop_entry_wins + _loop_back_wins, correct? > > Actually, this might not be true, because there is a codepath that resets the win count when a load is moved from an outer to an inner loop. Hmm ok. That complicates things for sure. Do you feel like you understand the implication on all cases here then? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2270418070 From phh at openjdk.org Tue Aug 12 16:12:09 2025 From: phh at openjdk.org (Paul Hohensee) Date: Tue, 12 Aug 2025 16:12:09 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 14:06:35 GMT, Aleksey Shipilev wrote: > When recording adapter entries, we record _offsets_, not the actual addresses: > > > entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > > > Every platform except ARM32 and Zero has all these entries set up, so offsets are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented.
So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". > > This PR is the second least horrible (IMO) fix for this: relaxing the assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. > > The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. > > Additional testing: > - [x] Linux ARM32 server fastdebug, `java -version` now works You might turn line 454 into a function right after the pointer_delta section of globalDefinitions.hpp. Viz. inline uintptr_t raw_pointer(const volatile void* p, uintptr_t offset) { return p2u(p) + offset; } so line 454 becomes raw_pointer(cb->insts_begin(), entry_offset[i]) == 0 ------------- PR Review: https://git.openjdk.org/jdk/pull/26746#pullrequestreview-3111706868 From kvn at openjdk.org Tue Aug 12 16:12:10 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 12 Aug 2025 16:12:10 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 14:06:35 GMT, Aleksey Shipilev wrote: > The least horrible solution would be storing the actual address-es instead of int offsets. But that likely has footprint implications. We need offsets for Leyden. I agree with the current fix. It is a general solution which will cover any future platforms which may not support all kinds of adapters.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3180061256 PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3180068861 From kvn at openjdk.org Tue Aug 12 16:16:12 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 12 Aug 2025 16:16:12 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 14:06:35 GMT, Aleksey Shipilev wrote: > When recording adapter entries, we record _offsets_, not the actual addresses: > > > entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > > > Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". > > This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. > > The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. > > Additional testing: > - [x] Linux ARM32 server fastdebug, `java -version` now works Marked as reviewed by kvn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/26746#pullrequestreview-3111752177 From duke at openjdk.org Tue Aug 12 16:20:18 2025 From: duke at openjdk.org (duke) Date: Tue, 12 Aug 2025 16:20:18 GMT Subject: RFR: 8361890: Aarch64: Removal of redundant dmb from C1 AtomicLong methods In-Reply-To: <60YMRP6cNslwEeVX2TWmnMYdO872xGaeShKMEj0dWGY=.2f4f504f-93d1-4bab-b721-e5c964f4c465@github.com> References: <60YMRP6cNslwEeVX2TWmnMYdO872xGaeShKMEj0dWGY=.2f4f504f-93d1-4bab-b721-e5c964f4c465@github.com> Message-ID: On Thu, 10 Jul 2025 15:49:40 GMT, Samuel Chee wrote: > The current C1 implementation of AtomicLong methods > which either adds or exchanges (such as getAndAdd) > emit one of a ldaddal and swpal respectively when using > LSE as well as an immediately proceeding dmb. Since > ldaddal/swpal have both acquire and release semantics, > this provides similar ordering guarantees to a dmb.full > so the dmb here is redundant and can be removed. > > This is due to both clause 7 and clause 11 of the > definition of Barrier-ordered-before in B2.3.7 of the > DDI0487 L.a Arm Architecture Reference Manual for A-profile > architecture being satisfied by the existence of a > ldaddal/swpal which ensures such memory ordering guarantees. @spchee Your change (at version 4d169173c7bc619f331accfe6c34fc5496ee4bff) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26245#issuecomment-3180099998 From kvn at openjdk.org Tue Aug 12 16:21:13 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 12 Aug 2025 16:21:13 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 14:06:35 GMT, Aleksey Shipilev wrote: > When recording adapter entries, we record _offsets_, not the actual addresses: > > > entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > > > Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". > > This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. > > The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. > > Additional testing: > - [x] Linux ARM32 server fastdebug, `java -version` now works An other, more complex, solution would be to check `handler->get_c2i_*_entry()` for `nullptr` in `generate_adapter_code()` where we set offsets and set offset to 0. Then we can relax assert to `entry_offset[i] >= 0`. We can also remove `entry_offset[0] == 0` check before loop too. and start loop with `i = 0`. But it is more complicated. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3180106488 From mhaessig at openjdk.org Tue Aug 12 16:22:23 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 12 Aug 2025 16:22:23 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <0rNuFLFwXcWfF0-nQQEd9fbIrziHos8PZJ93sDPFObo=.0587492e-267b-4681-8fb8-605cdc20f1c3@github.com> On Tue, 12 Aug 2025 15:49:15 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectorization.cpp line 1026: >> >>> 1024: if (vp1.iv_scale() > vp2.iv_scale()) { >>> 1025: swap(p1_init, p2_init); >>> 1026: swap(size1, size2); >> >> Shouldn't we perform this swap before calling `make_last()`, since `make_last()` assumes `iv_scale1 < iv_scale2`? > > I don't think that `make_last` makes any assumptions about `iv_scale1 < iv_scale2`. > But I could consider moving it earlier anyway. Do you think that is worth it? I would do it because the proof states that if `iv_scale2 < iv_scale1` we swap them. It would keep it consistent. Also, you won't have to swap the spans. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2270482568 From kvn at openjdk.org Tue Aug 12 16:31:13 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 12 Aug 2025 16:31:13 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled [v2] In-Reply-To: References: <7BRWHyYaTAEbv7Yery2pRVrzCQfKB0sBFIh4M4xsCN8=.c0b2463a-4202-428d-bd3e-fa082cbbbf46@github.com> Message-ID: On Tue, 12 Aug 2025 15:31:11 GMT, Manuel H?ssig wrote: >> I have reduced this to 100 after I checked that the (new) test fails without the current fix. > > Excellent, thank you! You can also use `-XX:CompileThresholdScaling=f` (specify `f` as 0.1, for example) flag to trigger compilation early to make sure 100 is enough. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26640#discussion_r2270503973 From qamai at openjdk.org Tue Aug 12 16:35:19 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 12 Aug 2025 16:35:19 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3] In-Reply-To: <9edQTufQlYxamUcG1CnLFY8XpFy6x1xERJaeAXXSu_0=.424bc197-4b5b-44fa-8c6c-fd32ba415247@github.com> References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> <7FL_BZG7uFGHnXHf7eZNW40BmMOQzwbMZU0fwWwXwmg=.dc62c972-0e67-47df-8a7a-66526dd18d84@github.com> <9edQTufQlYxamUcG1CnLFY8XpFy6x1xERJaeAXXSu_0=.424bc197-4b5b-44fa-8c6c-fd32ba415247@github.com> Message-ID: On Tue, 12 Aug 2025 15:59:55 GMT, Emanuel Peter wrote: >>> if the region is a Loop, then we should have _total_wins = _loop_entry_wins + _loop_back_wins, correct? >> >> Actually, this might not be true, because there is a codepath that resets the win count when a load is moved from an outer to an inner loop. > > Hmm ok. That complicates things for sure. > Do you feel you like you understand the implication on all cases here then? But should the invariant still hold after the reset? `0 + 0 == 0` after all? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2270513456 From epeter at openjdk.org Tue Aug 12 16:38:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 16:38:44 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <-whr-SaUIy_tGX_SZ2neVnaqRkDfcrYhUCtnzNBTSaY=.4477643f-7e94-4921-a424-0d274d01b73d@github.com> On Mon, 11 Aug 2025 13:39:41 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java >> >> Co-authored-by: Manuel H?ssig > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java line 62: > >> 60: for (String sac : List.of("-XX:-UseAutoVectorizationSpeculativeAliasingChecks", "-XX:+UseAutoVectorizationSpeculativeAliasingChecks")) { >> 61: TestFramework.runWithFlags("--add-modules", "java.base", "--add-exports", "java.base/jdk.internal.misc=ALL-UNNAMED", >> 62: "-XX:+UnlockExperimentalVMOptions", av, coh, sac); > > This might be a good fit for `Scenarios`. I find it easier to determine which cases failed. Wow, I have not realized that this would be so much better. Very nice ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2270516734 From epeter at openjdk.org Tue Aug 12 16:38:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 16:38:43 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v8] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > This is a big patch, but about 3.5k lines are tests. 
And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments).
> -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. > - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. > - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` > - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)` > - For aliasing analysis (adjacency and overlap), the "regu... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: use Scenarios ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/e05b6297..4a240226 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=06-07 Stats: 9 lines in 1 file changed: 7 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From mhaessig at openjdk.org Tue Aug 12 16:45:14 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 12 Aug 2025 16:45:14 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3] In-Reply-To: References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> <7FL_BZG7uFGHnXHf7eZNW40BmMOQzwbMZU0fwWwXwmg=.dc62c972-0e67-47df-8a7a-66526dd18d84@github.com> <4BdaCvFlhP246J4Of7xfXav4hA9WnN681ht_WAQymHw=.f8a32dcc-5596-4b37-8195-df30065448b7@github.com> Message-ID: On Tue, 12 Aug 2025 14:58:12 GMT, Emanuel Peter wrote: >> How is this impacted if we have wins on both the entry and the backedge? Is that possible? Do we have any benchmarks here?
>> In general, we ignore the wins on the entry. So it is profitable if the wins on the loop back are greater than the threshold. > Sure, I gathered as much. It would still be nice to have some examples / IR-tests / JMH-benchmarks here. Just to make sure we are getting the conditions right. Is this better? // Is this split profitable with respect to the policy? // In general this means that the split has to have more wins than specified // in the policy. However, for loops we need to take into account where the // wins happen. bool profitable(int policy) const { assert(_region->is_Loop() || (_loop_entry_wins == 0 && _loop_back_wins == 0), "wins on loop edges without a loop"); // In loops, we need to be careful when splitting, because splitting nodes // related to the iv through the phi can sufficiently rearrange the loop // structure to prevent RCE and thus vectorization. Thus, we only deem splitting // profitable if the win of a split is not on the entry edge, as such wins // only pay off once and have a high chance of messing up the loop structure. return (_loop_entry_wins == 0 && _total_wins > policy) || // If there are wins on the entry edge but the backedge also has sufficient wins, // there is sufficient profitability to split regardless of the risk of messing // up the loop structure. _loop_back_wins > policy || // If the policy is less than 0, a split is always profitable, i.e. we always // split. This is needed when we split a node and then must also split a // dependent node, i.e. splitting a Bool node after splitting a Cmp node.
policy < 0; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2270534465 From epeter at openjdk.org Tue Aug 12 16:55:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 16:55:20 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v8] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Mon, 11 Aug 2025 08:16:03 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use Scenarios > > src/hotspot/share/opto/mempointer.cpp line 85: > >> 83: } >> 84: // Bail out if scale is NaN. >> 85: if (scale.is_NaN()) { > > If I understand correctly, then a summand cannot be NaN anymore? Do you still bail out somewhere in raw summands if you encounter NaN? Let me try to find an answer here... At least there is an assert in the constructor of `MemPointer`: for (int i = 0; i < summands.length(); i++) { const MemPointerSummand& s = summands.at(i); assert(s.variable() != nullptr, "variable cannot be null"); assert(!s.scale().is_NaN(), "non-NaN scale"); } for (int i = 0; i < raw_summands.length(); i++) { const MemPointerRawSummand& s = raw_summands.at(i); assert(!s.scaleI().is_NaN(), "non-NaN scale"); assert(!s.scaleL().is_NaN(), "non-NaN scale"); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2270554176 From epeter at openjdk.org Tue Aug 12 17:07:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 17:07:24 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Mon, 11 Aug 2025 13:40:41 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: 
>> >> Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java >> >> Co-authored-by: Manuel H?ssig > > Thank you for addressing my comments so far. Here goes another round :) @mhaessig Thanks for the detailed review! I think I responded to all your suggestions/comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3180237408 From epeter at openjdk.org Tue Aug 12 17:07:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 17:07:26 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v8] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <_xzX54JluvxKjADy6VAq8oY3lkRNsV3bYY35A4cJQpo=.3b345a86-2dea-48c2-99bd-7b63fc79af8e@github.com> On Tue, 12 Aug 2025 16:52:12 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/mempointer.cpp line 85: >> >>> 83: } >>> 84: // Bail out if scale is NaN. >>> 85: if (scale.is_NaN()) { >> >> If I understand correctly, then a summand cannot be NaN anymore? Do you still bail out somewhere in raw summands if you encounter NaN? > > Let me try to find an answer here... > At least there is an assert in the constructor of `MemPointer`: > > for (int i = 0; i < summands.length(); i++) { > const MemPointerSummand& s = summands.at(i); > assert(s.variable() != nullptr, "variable cannot be null"); > assert(!s.scale().is_NaN(), "non-NaN scale"); > } > for (int i = 0; i < raw_summands.length(); i++) { > const MemPointerRawSummand& s = raw_summands.at(i); > assert(!s.scaleI().is_NaN(), "non-NaN scale"); > assert(!s.scaleL().is_NaN(), "non-NaN scale"); > } And I think there is already some filtering in `canonicalize_raw_summands`: // Keep summands with non-zero scale. 
if (!scaleI.is_zero() && !scaleL.is_NaN()) { _raw_summands.at_put(pos_put++, MemPointerRawSummand(variable, scaleI, scaleL, int_group)); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2270571417 From epeter at openjdk.org Tue Aug 12 17:07:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 17:07:26 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v8] In-Reply-To: <_xzX54JluvxKjADy6VAq8oY3lkRNsV3bYY35A4cJQpo=.3b345a86-2dea-48c2-99bd-7b63fc79af8e@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <_xzX54JluvxKjADy6VAq8oY3lkRNsV3bYY35A4cJQpo=.3b345a86-2dea-48c2-99bd-7b63fc79af8e@github.com> Message-ID: On Tue, 12 Aug 2025 17:00:54 GMT, Emanuel Peter wrote: >> Let me try to find an answer here... >> At least there is an assert in the constructor of `MemPointer`: >> >> for (int i = 0; i < summands.length(); i++) { >> const MemPointerSummand& s = summands.at(i); >> assert(s.variable() != nullptr, "variable cannot be null"); >> assert(!s.scale().is_NaN(), "non-NaN scale"); >> } >> for (int i = 0; i < raw_summands.length(); i++) { >> const MemPointerRawSummand& s = raw_summands.at(i); >> assert(!s.scaleI().is_NaN(), "non-NaN scale"); >> assert(!s.scaleL().is_NaN(), "non-NaN scale"); >> } > > And I think there is already some filtering in `canonicalize_raw_summands`: > > // Keep summands with non-zero scale. 
> if (!scaleI.is_zero() && !scaleL.is_NaN()) { > _raw_summands.at_put(pos_put++, MemPointerRawSummand(variable, scaleI, scaleL, int_group)); > } Ah, but the real work gets done here, in `MemPointer::make`: if (raw_summands.length() <= RAW_SUMMANDS_SIZE && summands.length() <= SUMMANDS_SIZE && has_no_NaN_in_con_and_summands(con, summands)) { return MemPointer(pointer, raw_summands, summands, con, size NOT_PRODUCT(COMMA trace)); } else { return MemPointer::make_trivial(pointer, size NOT_PRODUCT(COMMA trace)); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2270573790 From mhaessig at openjdk.org Tue Aug 12 17:08:50 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 12 Aug 2025 17:08:50 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v5] In-Reply-To: References: Message-ID: > A loop of the form > > MemorySegment ms = {}; > for (long i = 0; i < ms.byteSize() / 8L; i++) { > // vectorizable work > } > > does not vectorize, whereas > > MemorySegment ms = {}; > long size = ms.byteSize(); > for (long i = 0; i < size / 8L; i++) { > // vectorizable work > } > > vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: > > https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 > > Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. 
> > So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. > > @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem > > ## Change Description > > Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. > >
Explored Alternatives > 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. > 2. Insert a "`PHASEIDEALLOOP0`" with `LoopOptsNone` that only perfor... Manuel H?ssig has updated the pull request incrementally with eight additional commits since the last revision: - Better documentation of profitable() - Remove vector sizes - Specify vector sizes - Merge branch 'jdk-8356176-byte-size' of github.com:mhaessig/jdk into jdk-8356176-byte-size - Add asserts - Make region a field - Even more better debug print - Remove redundant scenarios ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26429/files - new: https://git.openjdk.org/jdk/pull/26429/files/061d2975..461fea40 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26429&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26429&range=03-04 Stats: 49 lines in 4 files changed: 20 ins; 10 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/26429.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26429/head:pull/26429 PR: https://git.openjdk.org/jdk/pull/26429 From mhaessig at openjdk.org Tue Aug 12 17:08:51 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 12 Aug 2025 17:08:51 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3] In-Reply-To: References: <7TeLVxAw4dlgyXZvCDyG3m8nB-OTxxKSY2hzDxoVCwc=.4b6e233a-335b-4da4-8edd-f4c02b66694d@github.com> <7FL_BZG7uFGHnXHf7eZNW40BmMOQzwbMZU0fwWwXwmg=.dc62c972-0e67-47df-8a7a-66526dd18d84@github.com> <9edQTufQlYxamUcG1CnLFY8XpFy6x1xERJaeAXXSu_0=.424bc197-4b5b-44fa-8c6c-fd32ba415247@github.com> Message-ID: On Tue, 12 Aug 2025 16:33:02 GMT, Quan Anh Mai wrote: >> Hmm ok. That complicates things for sure. >> Do you feel you like you understand the implication on all cases here then? 
> > But should the invariant still hold after the reset? `0 + 0 == 0` after all? Yes, you are right. The reset preserves the invariant. I was being paranoid ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2270572702 From yzheng at openjdk.org Tue Aug 12 17:09:07 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 12 Aug 2025 17:09:07 GMT Subject: RFR: 8365218: [JVMCI] AArch64 CPU features are not computed correctly after 8364128 [v2] In-Reply-To: References: Message-ID: > https://github.com/openjdk/jdk/pull/26515 changes the `VM_Version::CPU_` constant values on AArch64 and Graal now sees unsupported CPU features. This may result in SIGILL due to Graal emitting unsupported instructions, such as `CPU_SHA3`-based eor3 instructions in AArch64 SHA3 stubs. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26727/files - new: https://git.openjdk.org/jdk/pull/26727/files/87f4700a..3e378957 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26727&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26727&range=00-01 Stats: 132 lines in 4 files changed: 15 ins; 102 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/26727.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26727/head:pull/26727 PR: https://git.openjdk.org/jdk/pull/26727 From yzheng at openjdk.org Tue Aug 12 17:12:26 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 12 Aug 2025 17:12:26 GMT Subject: RFR: 8365218: [JVMCI] AArch64 CPU features are not computed correctly after 8364128 [v3] In-Reply-To: References: Message-ID: <2fFSOt9xWkOX8rkgkSjKquHPzBCwnmzzstcUNJ16klU=.b2aeeb5f-7d74-4ad4-b0c3-53ae76824a6e@github.com> > https://github.com/openjdk/jdk/pull/26515 changes the `VM_Version::CPU_` constant values on AArch64 and Graal now sees unsupported CPU features. 
This may result in SIGILL due to Graal emitting unsupported instructions, such as `CPU_SHA3`-based eor3 instructions in AArch64 SHA3 stubs. Yudi Zheng has updated the pull request incrementally with two additional commits since the last revision: - style - style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26727/files - new: https://git.openjdk.org/jdk/pull/26727/files/3e378957..6aa51798 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26727&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26727&range=01-02 Stats: 21 lines in 4 files changed: 6 ins; 6 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/26727.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26727/head:pull/26727 PR: https://git.openjdk.org/jdk/pull/26727 From epeter at openjdk.org Tue Aug 12 17:20:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Aug 2025 17:20:17 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v5] In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 17:08:50 GMT, Manuel H?ssig wrote: >> A loop of the form >> >> MemorySegment ms = {}; >> for (long i = 0; i < ms.byteSize() / 8L; i++) { >> // vectorizable work >> } >> >> does not vectorize, whereas >> >> MemorySegment ms = {}; >> long size = ms.byteSize(); >> for (long i = 0; i < size / 8L; i++) { >> // vectorizable work >> } >> >> vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. 
Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: >> >> https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 >> >> Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. >> >> So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. >> >> @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem >> >> ## Change Description >> >> Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. >> >>
Explored Alternatives >> 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. >> 2. I... > > Manuel H?ssig has updated the pull request incrementally with eight additional commits since the last revision: > > - Better documentation of profitable() > - Remove vector sizes > - Specify vector sizes > - Merge branch 'jdk-8356176-byte-size' of github.com:mhaessig/jdk into jdk-8356176-byte-size > - Add asserts > - Make region a field > - Even more better debug print > - Remove redundant scenarios Ok, things are improving nicely ? src/hotspot/share/opto/loopnode.hpp line 1678: > 1676: // profitable if the win of a split is not on the entry edge, as such wins > 1677: // only pay off once and have a high chance of messing up the loop structure. > 1678: return (_loop_entry_wins == 0 && _total_wins > policy) || It may be good if you also mention that this applies not just to Loops with no entry wins, but also to non-loop Regions. Actually, I would suggest that you move the comment from above down to this section. // In general this means that the split has to have more wins than specified // in the policy. However, for loops we need to take into account where the // wins happen. src/hotspot/share/opto/loopnode.hpp line 1686: > 1684: // split. This is needed when we split a node and then must also split a > 1685: // dependant node, i.e. spliting a Bool node after splitting a Cmp node. > 1686: policy < 0; It seems to me that `policy < 0` actually implies `_loop_back_wins > policy`, because `policy < 0 <= _loop_back_wins`. Maybe it is still better to keep it, so we are explicit. 
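To make the subsumption argument above concrete, here is a standalone model of the predicate (field names mirror the PR, but this is only an illustration, not the HotSpot code): with non-negative win counts, `policy < 0` already implies `_loop_back_wins > policy`.

```java
// Illustrative model of the split-through-phi profitability predicate
// discussed above. Names mirror the PR; this is not HotSpot code.
public class ProfitableModel {
    static boolean profitable(int totalWins, int loopEntryWins, int loopBackWins, int policy) {
        // No wins on the entry edge: the usual "more wins than policy" rule applies.
        return (loopEntryWins == 0 && totalWins > policy)
                // Enough wins on the backedge outweigh the risk of entry-edge wins.
                || loopBackWins > policy
                // Negative policy forces the split (e.g. Bool after Cmp); with
                // loopBackWins >= 0 this clause is already covered by the one above.
                || policy < 0;
    }

    public static void main(String[] args) {
        System.out.println(profitable(0, 0, 0, -1)); // true: forced split
        System.out.println(profitable(3, 3, 0, 0));  // false: wins only on the entry edge
        System.out.println(profitable(5, 2, 5, 2));  // true: backedge wins exceed policy
    }
}
```

The second call shows the case the PR is guarding against: all wins sit on the loop entry edge, so the split is rejected.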
------------- PR Review: https://git.openjdk.org/jdk/pull/26429#pullrequestreview-3111933820 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2270602622 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2270599371 From dnsimon at openjdk.org Tue Aug 12 18:05:12 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 12 Aug 2025 18:05:12 GMT Subject: RFR: 8365218: [JVMCI] AArch64 CPU features are not computed correctly after 8364128 [v3] In-Reply-To: <2fFSOt9xWkOX8rkgkSjKquHPzBCwnmzzstcUNJ16klU=.b2aeeb5f-7d74-4ad4-b0c3-53ae76824a6e@github.com> References: <2fFSOt9xWkOX8rkgkSjKquHPzBCwnmzzstcUNJ16klU=.b2aeeb5f-7d74-4ad4-b0c3-53ae76824a6e@github.com> Message-ID: On Tue, 12 Aug 2025 17:12:26 GMT, Yudi Zheng wrote: >> https://github.com/openjdk/jdk/pull/26515 changes the `VM_Version::CPU_` constant values on AArch64 and Graal now sees unsupported CPU features. This may result in SIGILL due to Graal emitting unsupported instructions, such as `CPU_SHA3`-based eor3 instructions in AArch64 SHA3 stubs. > > Yudi Zheng has updated the pull request incrementally with two additional commits since the last revision: > > - style > - style LGTM. ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26727#pullrequestreview-3112105684 From iveresov at openjdk.org Tue Aug 12 18:08:22 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 12 Aug 2025 18:08:22 GMT Subject: RFR: 8362530: VM crash with -XX:+PrintTieredEvents when collecting AOT profiling Message-ID: When printing tiered events we take the ttyLock and also now the trainingDataLock. While benign it's best to decouple these. The solution is to gather the output bits in a buffer and then print it. 
------------- Commit messages: - Decouple gathering the output bits from printing Changes: https://git.openjdk.org/jdk/pull/26750/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26750&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8362530 Stats: 68 lines in 2 files changed: 11 ins; 2 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/26750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26750/head:pull/26750 PR: https://git.openjdk.org/jdk/pull/26750 From kvn at openjdk.org Tue Aug 12 18:14:15 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 12 Aug 2025 18:14:15 GMT Subject: RFR: 8362530: VM crash with -XX:+PrintTieredEvents when collecting AOT profiling In-Reply-To: References: Message-ID: <3PBJprPpO_wuEe1VZ04QMJVvV8C7PdBIDjI4cuSo4YI=.d9882c0c-5a21-4058-90e3-50797bb4f9e2@github.com> On Tue, 12 Aug 2025 18:02:16 GMT, Igor Veresov wrote: > When printing tiered events we take the ttyLock and also now the trainingDataLock. While benign it's best to decouple these. The solution is to gather the output bits in a buffer and then print it. Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26750#pullrequestreview-3112138230 From bulasevich at openjdk.org Tue Aug 12 19:07:17 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 12 Aug 2025 19:07:17 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v7] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: <3gxI6LHS5BbGD3Dra6pdCPsRosCAO6W_rhUAQK0qAlA=.b75b1b5c-7e8d-4fbf-9b0d-83fdc424881e@github.com> On Mon, 11 Aug 2025 11:17:31 GMT, Saranya Natarajan wrote: >> **Issue** >> Extreme values for BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version `results in assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. This is observed in a x86 machine. >> >> **Analysis** >> On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above mentioned assert from failing for large values of BciProfileWidth. >> >> **Proposal** >> Considering the fact that larger BciProfileWidth results in slower profiling, I have proposed a range between 0 to 5000 to restrict the value for BciProfileWidth for x86 machines. This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in ret profile. 
>> >> **Issue in AArch64** >> Additionally running the command `java -XX:BciProfileWidth= 10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine.This is an issue of maximum offset for `ldr/str` in AArch64 which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on AArch64 machine. I had to modify 3 `ldr` and 1 `str` instruction (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line number 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for maximum of 5000 after which it crashes with`assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. Without this fix `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested to restrict the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. >> >> **Question to reviewers** >> Do you think this is a reasonable fix ? For AArch64 do you suggest fixing using `form_address` ? If yes, do I fix it under this PR or create another one ? >> >> **Request to port maintainers** >> @dafedafe suggested that we keep the upper boun... > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review : adding vm.debug and moving a defn Marked as reviewed by bulasevich (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/26139#pullrequestreview-3112354691 From phh at openjdk.org Tue Aug 12 20:41:11 2025 From: phh at openjdk.org (Paul Hohensee) Date: Tue, 12 Aug 2025 20:41:11 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 14:06:35 GMT, Aleksey Shipilev wrote: > When recording adapter entries, we record _offsets_, not the actual addresses: > > > entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > > > Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". > > This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. > > The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. > > Additional testing: > - [x] Linux ARM32 server fastdebug, `java -version` now works Marked as reviewed by phh (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/26746#pullrequestreview-3112745558 From sparasa at openjdk.org Wed Aug 13 00:54:53 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 13 Aug 2025 00:54:53 GMT Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX [v2] In-Reply-To: References: Message-ID: > The goal of this PR is to address the failure caused by x86 forward jump offset exceeding imm8 displacement when running the HotSpot jtreg test `test/hotspot/jtreg/compiler/c2/TestLWLockingCodeGen.java` using Intel APX (on SDE emulator). > > This bug triggers an assertion failure in methodHandles_x86.cpp because the assembler emits a short forward jump (imm8 displacement) whose target is more than 127 bytes away, exceeding the allowed range. This appears to be caused by larger stub code size when APX instruction encoding is enabled. > > The fix for this issue is to replace the `jccb` instruction with `jcc` in methodHandles_x86.cpp.
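For reference, `jccb` encodes its target as a signed 8-bit (rel8) displacement while `jcc` uses rel32, so any forward target farther than 127 bytes overflows the short form. A toy range check (illustrative only, not assembler code):

```java
// Illustrative only: models the displacement range of a short (rel8) jump,
// which is why a forward target more than 127 bytes away needs jcc (rel32).
public class Rel8Check {
    static boolean fitsInRel8(long disp) {
        return disp >= Byte.MIN_VALUE && disp <= Byte.MAX_VALUE; // [-128, 127]
    }

    public static void main(String[] args) {
        System.out.println(fitsInRel8(127));  // true: still reachable with jccb
        System.out.println(fitsInRel8(130));  // false: must use jcc instead
    }
}
```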
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: change jccb to jcc in line 157 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26731/files - new: https://git.openjdk.org/jdk/pull/26731/files/02e1bfd6..ea8643c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26731&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26731&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26731.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26731/head:pull/26731 PR: https://git.openjdk.org/jdk/pull/26731 From sparasa at openjdk.org Wed Aug 13 00:54:53 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 13 Aug 2025 00:54:53 GMT Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX [v2] In-Reply-To: References: <6KBUzFUMEtIKXUhDGaNYEGtXmnSe7Ohu6ZTTmuH07NI=.e3d665d4-73d8-41c5-95a2-5e1e284eeb3a@github.com> Message-ID: <_jFWu0KyH4XhSX-PWdA_UCI0eVJ_e_OYKJE2G11rZUA=.9d343cb6-24de-40cf-82b0-11a4ae402ff6@github.com> On Tue, 12 Aug 2025 10:06:58 GMT, Andrew Haley wrote: >alignment with that tactics. Maybe you want to unshorten the branch at L157 as well. Thank you Aleksey (@shipilev)! As suggested, I updated the code to unshorten the branch at line 157. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26731#issuecomment-3181832309 From dzhang at openjdk.org Wed Aug 13 01:07:33 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 13 Aug 2025 01:07:33 GMT Subject: RFR: 8365302: RISC-V: compiler/loopopts/superword/TestAlignVector.java fails when vlen=128 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 06:38:33 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. 
This test pass when vlen=256, but fail when vlen=128. > > The error occurs because the test13aIL and test13bIL cases require ensuring that vectors are larger than what unrolling produces; otherwise, the corresponding vector IR will not be generated. > > We can use `JTREG="JAVA_OPTIONS=-XX:+TraceSuperWordLoopUnrollAnalysis"` during testing. > The tips in the log: > > 76844 1333 b 4 compiler.loopopts.superword.TestAlignVector::test13aIL (42 bytes) > slp analysis fails: unroll limit greater than max vector > > slp analysis: set max unroll to 4 > > > Therefore, we need to limit MaxVectorSize to greater than or equal to 32 bytes. > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/TestAlignVector.java on qemu-system with RVV when vlen=128/256 Thanks all for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26738#issuecomment-3181863276 From duke at openjdk.org Wed Aug 13 01:10:12 2025 From: duke at openjdk.org (duke) Date: Wed, 13 Aug 2025 01:10:12 GMT Subject: RFR: 8365302: RISC-V: compiler/loopopts/superword/TestAlignVector.java fails when vlen=128 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 06:38:33 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. This test pass when vlen=256, but fail when vlen=128. > > The error occurs because the test13aIL and test13bIL cases require ensuring that vectors are larger than what unrolling produces; otherwise, the corresponding vector IR will not be generated. > > We can use `JTREG="JAVA_OPTIONS=-XX:+TraceSuperWordLoopUnrollAnalysis"` during testing. > The tips in the log: > > 76844 1333 b 4 compiler.loopopts.superword.TestAlignVector::test13aIL (42 bytes) > slp analysis fails: unroll limit greater than max vector > > slp analysis: set max unroll to 4 > > > Therefore, we need to limit MaxVectorSize to greater than or equal to 32 bytes. 
> > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/TestAlignVector.java on qemu-system with RVV when vlen=128/256 @DingliZhang Your change (at version 5bcd85bf94a43bda930c5c813076f6c04972c37c) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26738#issuecomment-3181867476 From dzhang at openjdk.org Wed Aug 13 01:28:22 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 13 Aug 2025 01:28:22 GMT Subject: Integrated: 8365302: RISC-V: compiler/loopopts/superword/TestAlignVector.java fails when vlen=128 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 06:38:33 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > [JDK-8352529](https://bugs.openjdk.org/browse/JDK-8352529) enables this IR verification test for riscv. This test pass when vlen=256, but fail when vlen=128. > > The error occurs because the test13aIL and test13bIL cases require ensuring that vectors are larger than what unrolling produces; otherwise, the corresponding vector IR will not be generated. > > We can use `JTREG="JAVA_OPTIONS=-XX:+TraceSuperWordLoopUnrollAnalysis"` during testing. > The tips in the log: > > 76844 1333 b 4 compiler.loopopts.superword.TestAlignVector::test13aIL (42 bytes) > slp analysis fails: unroll limit greater than max vector > > slp analysis: set max unroll to 4 > > > Therefore, we need to limit MaxVectorSize to greater than or equal to 32 bytes. > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/TestAlignVector.java on qemu-system with RVV when vlen=128/256 This pull request has now been integrated. 
Changeset: 636c61a3 Author: Dingli Zhang Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/636c61a3868d9c01b672b3b45cda1e476acdc045 Stats: 18 lines in 1 file changed: 16 ins; 0 del; 2 mod 8365302: RISC-V: compiler/loopopts/superword/TestAlignVector.java fails when vlen=128 Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/26738 From jbhateja at openjdk.org Wed Aug 13 03:11:00 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 13 Aug 2025 03:11:00 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v6] In-Reply-To: References: Message-ID: <-_yIOwHApwxDw0YIWJ7MnXqK2VknHMQYoGShNqaslRk=.26037fd7-5429-4f41-a829-14f485b0ff48@github.com> > Patch optimizes Vector. slice operation with constant index using x86 ALIGNR instruction. > It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. > > Idea here is to add infrastructure support to enable intrinsification of fast path for selected vector APIs, else enable inlining of fall-back implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander to handle slow paths, which can easily be implemented library side (Java). > > Vector API jtreg tests pass at AVX level 2, remaining validation in progress. 
> > Performance numbers: > > > System : 13th Gen Intel(R) Core(TM) i3-1315U > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms > VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms > VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms > VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms > VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms > VectorSliceBenchmark.shortVectorSliceWithVariableIndex 1024 ... 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Cleanups, review resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24104/files - new: https://git.openjdk.org/jdk/pull/24104/files/f36ae6dd..d55fa594 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=04-05 Stats: 25 lines in 7 files changed: 0 ins; 24 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24104/head:pull/24104 PR: https://git.openjdk.org/jdk/pull/24104 From jbhateja at openjdk.org Wed Aug 13 03:11:00 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 13 Aug 2025 03:11:00 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v2] In-Reply-To: References: <1Vs8Ud-yh7FtFJN9sddNXDVM6Mc0ue9oi_oa0w5pRzU=.022172f3-1622-4d05-888b-c7afc66a5254@github.com> Message-ID: On Tue, 12 Aug 2025 05:58:47 GMT, Xiaohong Gong wrote: >>> Q1: Is it possible to just pass `origin->get_con()` to `VectorSliceNode` in case there are architectures that need it directly? Or, maybe we'd better add a comment telling that the origin passed to `VectorSliceNode` is adjusted to bytes. >>> >> >> Added comments. >> >>> Q2: If `origin` is not a constant, and there is an architecture that supports the index as a variable, will the code crash here? Can we just limit `origin` to a constant for this intrinsification in this PR? We can consider extending it to a variable in case any architecture has such a requirement. WDYT? >> >> Currently, the inline expander only supports a constant origin. I have added a check to fail intrinsification and inline the fallback using the hybrid call generator. > > Thanks for your update! So maybe the matcher function `supports_vector_slice_with_non_constant_index()` could also be removed totally?
Yes, the idea here is just to intrinsify the particular scenario where the slice index is a constant value, and not to burden the inline expander with full-blown intrinsification of all possible control paths, without impacting performance. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2271958737 From jbhateja at openjdk.org Wed Aug 13 03:20:02 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 13 Aug 2025 03:20:02 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v7] In-Reply-To: References: Message-ID: > Patch optimizes Vector. slice operation with constant index using x86 ALIGNR instruction. > It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. > > Idea here is to add infrastructure support to enable intrinsification of fast path for selected vector APIs, else enable inlining of fall-back implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander from handling slow paths, which can easily be implemented library side (Java). > > Vector API jtreg tests pass at AVX level 2, remaining validation in progress.
> > Performance numbers: > > > System : 13th Gen Intel(R) Core(TM) i3-1315U > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms > VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms > VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms > VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms > VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms > VectorSliceBenchmark.shortVectorSliceWithVariableIndex 1024 ... Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24104/files - new: https://git.openjdk.org/jdk/pull/24104/files/d55fa594..70c22932 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=05-06 Stats: 6 lines in 6 files changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24104/head:pull/24104 PR: https://git.openjdk.org/jdk/pull/24104 From shade at openjdk.org Wed Aug 13 06:28:11 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 06:28:11 GMT Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX [v2] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 00:54:53 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to address the failure caused by x86 forward jump offset exceeding imm8 displacement when running the HotSpot jtreg test `test/hotspot/jtreg/compiler/c2/TestLWLockingCodeGen.java` using Intel APX (on SDE emulator). >> >> This bug triggers an assertion failure in methodHandles_x86.cpp because the assembler emits a short forward jump (imm8 displacement) whose target is more than 127 bytes away, exceeding the allowed range. This appears to be caused by larger stub code size when APX instruction encoding is enabled. >> >> The fix for this issue is to replace the `jccb` instruction with` jcc` in methodHandles_x86.cpp. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > change jccb to jcc in line 157 Marked as reviewed by shade (Reviewer). 
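For readers less familiar with x86 encodings, the constraint behind this fix is simple: `jccb` carries a signed 8-bit displacement, while `jcc` uses a 32-bit one. A minimal standalone sketch of the reachability check (illustrative code, not HotSpot's actual assembler logic):

```java
public class ShortBranch {
    // A short branch (jccb) encodes its target as a signed 8-bit
    // displacement relative to the end of the instruction, so the target
    // must lie within [-128, 127] bytes; otherwise the near form (jcc)
    // with a 32-bit displacement is required.
    static boolean fitsInImm8(long displacement) {
        return displacement >= -128 && displacement <= 127;
    }
}
```

With APX encodings the generated stub between the branch and its label grew past 127 bytes, so the short form's range assert fired.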
------------- PR Review: https://git.openjdk.org/jdk/pull/26731#pullrequestreview-3114102855 From mhaessig at openjdk.org Wed Aug 13 06:54:14 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 13 Aug 2025 06:54:14 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v5] In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 17:13:58 GMT, Emanuel Peter wrote: >> Manuel Hässig has updated the pull request incrementally with eight additional commits since the last revision: >> >> - Better documentation of profitable() >> - Remove vector sizes >> - Specify vector sizes >> - Merge branch 'jdk-8356176-byte-size' of github.com:mhaessig/jdk into jdk-8356176-byte-size >> - Add asserts >> - Make region a field >> - Even more better debug print >> - Remove redundant scenarios > > src/hotspot/share/opto/loopnode.hpp line 1686: > >> 1684: // split. This is needed when we split a node and then must also split a >> 1685: // dependent node, i.e. splitting a Bool node after splitting a Cmp node. >> 1686: policy < 0; > > It seems to me that `policy < 0` actually implies `_loop_back_wins > policy`, because `policy < 0 <= _loop_back_wins`. > > Maybe it is still better to keep it, so we are explicit. I was confused about what negative-valued policies do. That's why I kept it.
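Emanuel's implication above can be restated as a tiny standalone check (illustrative names only, not the actual `PhaseIdealLoop` code): since the win counter is never negative, a negative `policy` already forces the comparison to succeed, so the explicit term changes nothing but documents the forced-split case.

```java
public class PolicyImplication {
    // With loopBackWins >= 0, "policy < 0" implies "loopBackWins > policy",
    // so the explicit "policy < 0" term is redundant but self-documenting.
    static boolean withExplicitCheck(int loopBackWins, int policy) {
        return loopBackWins > policy || policy < 0;
    }

    static boolean withoutExplicitCheck(int loopBackWins, int policy) {
        return loopBackWins > policy;
    }
}
```

For any non-negative `loopBackWins` the two predicates agree, which is why keeping the check is purely a matter of being explicit.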
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2272249702 From thartmann at openjdk.org Wed Aug 13 06:59:17 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 13 Aug 2025 06:59:17 GMT Subject: RFR: 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" [v5] In-Reply-To: <70bF6nyeg21mKc4SxXn9QulJPjMikmxUUcG08smx7hk=.1815618d-e5a7-4d50-af63-7a93dfd01fe8@github.com> References: <70bF6nyeg21mKc4SxXn9QulJPjMikmxUUcG08smx7hk=.1815618d-e5a7-4d50-af63-7a93dfd01fe8@github.com> Message-ID: On Mon, 11 Aug 2025 00:49:45 GMT, Guanqiang Han wrote: >> I'm able to consistently reproduce the problem using the following command line and test program: >> >> java -Xcomp -XX:TieredStopAtLevel=1 -XX:C1MaxInlineSize=200 Test.java >> >> import java.util.Arrays; >> public class Test{ >> public static void main(String[] args) { >> System.out.println("begin"); >> byte[] arr1 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; >> byte[] arr2 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; >> System.out.println(Arrays.equals(arr1, arr2)); >> System.out.println("end"); >> } >> } >> >> From my analysis, the root cause appears to be a mismatch in operand handling between T_ADDRESS and T_LONG in LIR_Assembler::stack2reg, especially when the source is marked as double stack (e.g., T_LONG) and the destination as single CPU register (e.g., T_ADDRESS), leading to assertion failures like assert(is_single_stack()) (because T_LONG is double_size). >> >> In the test program above, the call chain is: Arrays.equals -> ArraysSupport.vectorizedMismatch -> LIRGenerator::do_vectorizedMismatch >> Within the do_vectorizedMismatch() method, a move instruction constructs an LIR_Op1. During LIR to machine code generation, LIR_Assembler::stack2reg was called. >> >> In this case, the src operand has type T_LONG and the dst operand has type T_ADDRESS.
This combination triggers an assert in stack2reg, due to a mismatch between the stack slot type and register type handling. >> >> Importantly, this path ( LIR_Assembler::stack2reg was called ) is only taken when src is forced onto the stack. To reliably trigger this condition, the test is run with the -Xcomp option to force compilation and increase register pressure. >> >> A reference to the relevant code paths is provided below: >> image1 >> image2 >> >> On 64-bit platforms, although T_ADDRESS is classified as single_size, it is in fact 64 bits wide, representing a single 64-bit general-purpose register, and it can hold a T_LONG value, which is also 64 bits. >> >> However, T_LONG is defined as double_size, requiring two local variable slots or a pair of registers in the JVM's abstract model. This mismatch stems from the fact that T_ADDRESS is platform-dependent: it's 32 bits on 32-bit platforms, and 64 bits on 64-bit platforms, yet its size class... > Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - restrict compilation to the single method > - Merge remote-tracking branch 'upstream/master' into 8359235 > - change T_LONG to T_ADDRESS in some intrinsic functions > - Merge remote-tracking branch 'upstream/master' into 8359235 > - Increase sleep time to ensure the method gets compiled > - add regression test > - Merge remote-tracking branch 'upstream/master' into 8359235 > - 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" That looks good to me, assuming that you verified that the test still triggers the issue. Thanks for working on this and for your patience! :) ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26462#pullrequestreview-3114179884 From jbhateja at openjdk.org Wed Aug 13 06:59:18 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 13 Aug 2025 06:59:18 GMT Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX [v2] In-Reply-To: References: Message-ID: <8XnptjRBSufr7ctrmKXnVedx2OoWBb4nRu01NN3sev8=.0988e6d9-ea5e-45c8-9e4f-68cdee773c20@github.com> On Wed, 13 Aug 2025 00:54:53 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to address the failure caused by x86 forward jump offset exceeding imm8 displacement when running the HotSpot jtreg test `test/hotspot/jtreg/compiler/c2/TestLWLockingCodeGen.java` using Intel APX (on SDE emulator). >> >> This bug triggers an assertion failure in methodHandles_x86.cpp because the assembler emits a short forward jump (imm8 displacement) whose target is more than 127 bytes away, exceeding the allowed range. This appears to be caused by larger stub code size when APX instruction encoding is enabled. >> >> The fix for this issue is to replace the `jccb` instruction with `jcc` in methodHandles_x86.cpp. > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > change jccb to jcc in line 157 LGTM. Hi @vamsi-parasa, thanks for fixing this :-) One additional tidbit: the assembler tries to optimize `jcc` into `jccb` for bound labels, and a label is only bound when the branch is a backward jump. ------------- Marked as reviewed by jbhateja (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26731#pullrequestreview-3114180560 From mhaessig at openjdk.org Wed Aug 13 07:02:58 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 13 Aug 2025 07:02:58 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v6] In-Reply-To: References: Message-ID: > A loop of the form > > MemorySegment ms = {}; > for (long i = 0; i < ms.byteSize() / 8L; i++) { > // vectorizable work > } > > does not vectorize, whereas > > MemorySegment ms = {}; > long size = ms.byteSize(); > for (long i = 0; i < size / 8L; i++) { > // vectorizable work > } > > vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: > > https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 > > Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. > > So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. > > @merykitty also provides an alternative explanation. 
A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem > > ## Change Description > > Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. > >
Explored Alternatives > 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. > 2. Insert a "`PHASEIDEALLOOP0`" with `LoopOptsNone` that only perfor... Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: - Merge branch 'master' into jdk-8356176-byte-size - Emanuel's suggestion - Better documentation of profitable() - Remove vector sizes - Specify vector sizes - Merge branch 'jdk-8356176-byte-size' of github.com:mhaessig/jdk into jdk-8356176-byte-size - Update field documentation Co-authored-by: Emanuel Peter - Add asserts - Make region a field - Even more better debug print - ... and 13 more: https://git.openjdk.org/jdk/compare/25480f00...025dbe6e ------------- Changes: https://git.openjdk.org/jdk/pull/26429/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26429&range=05 Stats: 251 lines in 7 files changed: 210 ins; 18 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/26429.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26429/head:pull/26429 PR: https://git.openjdk.org/jdk/pull/26429 From yzheng at openjdk.org Wed Aug 13 07:04:12 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 13 Aug 2025 07:04:12 GMT Subject: RFR: 8365218: [JVMCI] AArch64 CPU features are not computed correctly after 8364128 [v4] In-Reply-To: References: Message-ID: > https://github.com/openjdk/jdk/pull/26515 changes the `VM_Version::CPU_` constant values on AArch64 and Graal now sees unsupported CPU features. This may result in SIGILL due to Graal emitting unsupported instructions, such as `CPU_SHA3`-based eor3 instructions in AArch64 SHA3 stubs. Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: - Merge master - style - style - address comments - [JVMCI] AArch64 CPU features are not computed correctly after 8364128 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26727/files - new: https://git.openjdk.org/jdk/pull/26727/files/6aa51798..0056275f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26727&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26727&range=02-03 Stats: 3509 lines in 92 files changed: 1362 ins; 1559 del; 588 mod Patch: https://git.openjdk.org/jdk/pull/26727.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26727/head:pull/26727 PR: https://git.openjdk.org/jdk/pull/26727 From jbhateja at openjdk.org Wed Aug 13 07:08:13 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 13 Aug 2025 07:08:13 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v2] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 03:11:04 GMT, Xiaohong Gong wrote: > I remember that there are micro benchmarks for slice/unslice under `test/micro/org/openjdk/bench/jdk/incubator/vector/operation` on panama-vector. Can we reuse those JMHs to check the benchmark improvement? All those are the ones with a variable slice index; slice kernel performance of those benchmarks on AVX2 and AVX512 targets is at par with the baseline, and deviations are statistically insignificant due to error margins. The new benchmark complements the code.
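For context on what is being benchmarked: `v1.slice(origin, v2)` logically concatenates the two input vectors and extracts a lane-length window starting at `origin`, which is the shuffle `VPALIGNR` performs directly when `origin` is a compile-time constant. A scalar model of the semantics (an illustrative sketch, not the patch's code):

```java
import java.util.Arrays;

public class SliceSemantics {
    // Scalar model of v1.slice(origin, v2):
    // result[i] = concat(v1, v2)[origin + i] for 0 <= i < lane count.
    static int[] slice(int[] v1, int[] v2, int origin) {
        int n = v1.length;
        int[] r = new int[n];
        for (int i = 0; i < n; i++) {
            int j = origin + i;
            r[i] = (j < n) ? v1[j] : v2[j - n];
        }
        return r;
    }

    public static void main(String[] args) {
        // prints [1, 2, 3, 4]
        System.out.println(Arrays.toString(
            slice(new int[]{0, 1, 2, 3}, new int[]{4, 5, 6, 7}, 1)));
    }
}
```

When `origin` is a constant it can be folded into the instruction's immediate byte, which is what the constant-index fast path exploits; a variable `origin` has to take the fallback path.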
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2272272548 From chagedorn at openjdk.org Wed Aug 13 07:08:23 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Aug 2025 07:08:23 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v2] In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 03:19:22 GMT, Francisco Ferrari Bihurriet wrote: >> Hi, this pull request is a second take of 1383fec41756322bf2832c55633e46395b937b40, by updating the `CmpUNode` type as either `TypeInt::CC_LE` (case 1a) or `TypeInt::CC_LT` (case 1b) instead of updating the `BoolNode` type as `TypeInt::ONE`. >> >> With this approach a56cd371a2c497e4323756f8b8a08a0bba059bf2 becomes unnecessary. Additionally, having the right type in `CmpUNode` could potentially enable further optimizations. >> >> #### Testing >> >> In order to evaluate the changes, the following testing has been performed: >> >> * `jdk:tier1` (see [GitHub Actions run](https://github.com/franferrax/jdk/actions/runs/16789994433)) >> * [`TestBoolNodeGVN.java`](https://github.com/openjdk/jdk/blob/jdk-26+9/test/hotspot/jtreg/compiler/c2/gvn/TestBoolNodeGVN.java), created for [JDK-8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value](https://bugs.openjdk.org/browse/JDK-8327381) (1383fec41756322bf2832c55633e46395b937b40) >> * I also checked it breaks if I remove the `CmpUNode::Value_cmpu_and_mask` call >> * Private reproducer for [JDK-8349584: Improve compiler processing](https://bugs.openjdk.org/browse/JDK-8349584) (a56cd371a2c497e4323756f8b8a08a0bba059bf2) >> * A local slowdebug run of the `test/hotspot/jtreg/compiler/c2` category on _Fedora Linux x86_64_ >> * Same results as with `master` (f95af744b07a9ec87e2507b3d584cbcddc827bbd) > > Francisco Ferrari Bihurriet has updated the pull request incrementally with one additional commit since the last revision: > > Apply code review suggestions and add JBS 
to test Update looks good, thanks! I'll run some testing and report back again. > Could you already find some examples where this change gives us an improved IR? If so, you could also add it as an IR test. Just double-checking, were you able to find such a test which now improves the IR with the better type info and `CmpU` while we could not with the old code? Otherwise, you could also file a follow-up RFE. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26666#pullrequestreview-3114193238 From chagedorn at openjdk.org Wed Aug 13 07:08:24 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Aug 2025 07:08:24 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v2] In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 03:09:16 GMT, Francisco Ferrari Bihurriet wrote: >> src/hotspot/share/opto/phaseX.cpp line 2941: >> >>> 2939: // Bool >>> 2940: // >>> 2941: void PhaseCCP::push_bool_with_cmpu_and_mask(Unique_Node_List& worklist, const Node* use) const { >> >> Needed to double-check but I think it's fine to remove the notification code since we already have `push_cmpu()` in place which looks through the `AddI`: >> https://github.com/openjdk/jdk/blob/10762d408bba9ce0945100847a8674e7eb7fa75e/src/hotspot/share/opto/phaseX.cpp#L2911-L2926 >> >> So, whenever `m` or `1` changes, we will re-add the `CmpU` to the CCP worklist with `push_cmpu()`. The `x` does not matter for the application of `Value_cmpu_and_mask()`. > Hmm, I was oversimplifying the problem, my way of thinking it was the following one: > > m x m 1 > \ / \ / > AndI AddI grandparents > \ / > CmpU parent > | > Bool grandchild > > _"As we were updating a grandchild based on its grandparents, we needed an ad-hoc worklist push for the grandchild.
Since we now update the type of `CmpU` based on its parents, the canonical parent-to-child propagations should work, and we don't need any ad-hoc grandparents-to-grandchild worklist push anymore."_ > > But as you noted, non-immediate `CmpU` inputs such as `m` or `1` can change and should affect the `CmpU` type. Luckily, this was already the case for previous `CmpU` optimizations. > > --- > > For case **1a**, we don't need `PhaseCCP::push_cmpu` because `m` is also an immediate input of `CmpU`. > > > m x > \ / > AndI m > \ / > CmpU > | > Bool > > > --- > > I'm now realizing this was a very lucky situation. The `AndI` input isn't problematic even when `PhaseCCP::push_cmpu` doesn't handle the `use_op == Op_AndI` case, because: > > * `x` does not affect the application of `Value_cmpu_and_mask()` > * In case **1a**, `m` is a direct input of `CmpU` > * In case **1b**, the `AddI` input is handled in `PhaseCCP::push_cmpu` (`use_op == Op_AddI`) > > Please let me know if you think we should add a comment in the code. That's a good summary! Thanks for double-checking again. It's indeed only a problem for **1b**, and that's handled by `push_cmpu()`. It probably would not hurt to add a comment that `push_cmpu` handles this case, just to be sure.
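The arithmetic fact underlying case **1a** is worth stating on its own: masking can only clear bits, so `(x & m)` is never unsigned-greater than `m`, which is what justifies typing `CmpU(AndI(x, m), m)` as `CC_LE`. A standalone illustration (not the HotSpot code itself):

```java
public class MaskedCmpU {
    // Case 1a: for any x and mask m, (x & m) <=u m, since AND can only
    // clear bits and clearing bits never increases the unsigned value.
    static boolean unsignedLeAfterMask(int x, int m) {
        return Integer.compareUnsigned(x & m, m) <= 0;
    }
}
```

Case **1b** strengthens this to a strict `CC_LT` against `m + 1` via the `AddI` input, which is why `push_cmpu()` looking through `AddI` matters for re-notification.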
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26666#discussion_r2272269609 From xgong at openjdk.org Wed Aug 13 07:08:13 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 13 Aug 2025 07:08:13 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v2] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 07:02:34 GMT, Jatin Bhateja wrote: >> test/micro/org/openjdk/bench/jdk/incubator/vector/VectorSliceBenchmark.java line 36: >> >>> 34: @State(Scope.Thread) >>> 35: @Fork(jvmArgs = {"--add-modules=jdk.incubator.vector"}) >>> 36: public class VectorSliceBenchmark { >> >> I remember that there are micro benchmarks for slice/unslice under `test/micro/org/openjdk/bench/jdk/incubator/vector/operation` on panama-vector. Can we reuse those JMHs to check the benchmark improvement? > >> I remember that there are micro benchmarks for slice/unslice under `test/micro/org/openjdk/bench/jdk/incubator/vector/operation` on panama-vector. Can we reuse those JMHs to check the benchmark improvement? > > All those are the ones with a variable slice index; slice kernel performance of those benchmarks on AVX2 and AVX512 targets is at par with the baseline, and deviations are statistically insignificant due to error margins. > > The new benchmark complements the code. OK. Makes sense to me. Thanks!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2272278442 From duke at openjdk.org Wed Aug 13 07:09:15 2025 From: duke at openjdk.org (duke) Date: Wed, 13 Aug 2025 07:09:15 GMT Subject: RFR: 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" [v5] In-Reply-To: <70bF6nyeg21mKc4SxXn9QulJPjMikmxUUcG08smx7hk=.1815618d-e5a7-4d50-af63-7a93dfd01fe8@github.com> References: <70bF6nyeg21mKc4SxXn9QulJPjMikmxUUcG08smx7hk=.1815618d-e5a7-4d50-af63-7a93dfd01fe8@github.com> Message-ID: On Mon, 11 Aug 2025 00:49:45 GMT, Guanqiang Han wrote: >> I'm able to consistently reproduce the problem using the following command line and test program: >> >> java -Xcomp -XX:TieredStopAtLevel=1 -XX:C1MaxInlineSize=200 Test.java >> >> import java.util.Arrays; >> public class Test{ >> public static void main(String[] args) { >> System.out.println("begin"); >> byte[] arr1 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; >> byte[] arr2 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; >> System.out.println(Arrays.equals(arr1, arr2)); >> System.out.println("end"); >> } >> } >> >> From my analysis, the root cause appears to be a mismatch in operand handling between T_ADDRESS and T_LONG in LIR_Assembler::stack2reg, especially when the source is marked as double stack (e.g., T_LONG) and the destination as single CPU register (e.g., T_ADDRESS), leading to assertion failures like assert(is_single_stack()) (because T_LONG is double_size). >> >> In the test program above, the call chain is: Arrays.equals -> ArraysSupport.vectorizedMismatch -> LIRGenerator::do_vectorizedMismatch >> Within the do_vectorizedMismatch() method, a move instruction constructs an LIR_Op1. During LIR to machine code generation, LIR_Assembler::stack2reg was called. >> >> In this case, the src operand has type T_LONG and the dst operand has type T_ADDRESS. This combination triggers an assert in stack2reg, due to a mismatch between the stack slot type and register type handling.
>> Importantly, this path ( LIR_Assembler::stack2reg was called ) is only taken when src is forced onto the stack. To reliably trigger this condition, the test is run with the -Xcomp option to force compilation and increase register pressure. >> >> A reference to the relevant code paths is provided below: >> image1 >> image2 >> >> On 64-bit platforms, although T_ADDRESS is classified as single_size, it is in fact 64 bits wide, representing a single 64-bit general-purpose register, and it can hold a T_LONG value, which is also 64 bits. >> >> However, T_LONG is defined as double_size, requiring two local variable slots or a pair of registers in the JVM's abstract model. This mismatch stems from the fact that T_ADDRESS is platform-dependent: it's 32 bits on 32-bit platforms, and 64 bits on 64-bit platforms, yet its size class... > Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > - restrict compilation to the single method > - Merge remote-tracking branch 'upstream/master' into 8359235 > - change T_LONG to T_ADDRESS in some intrinsic functions > - Merge remote-tracking branch 'upstream/master' into 8359235 > - Increase sleep time to ensure the method gets compiled > - add regression test > - Merge remote-tracking branch 'upstream/master' into 8359235 > - 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" @hgqxjj Your change (at version 4e084ec4de32f217b72ebc073479925721f9efae) is now ready to be sponsored by a Committer.
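The size-class mismatch described above can be summarized in a small standalone sketch (illustrative constants, not the actual HotSpot `BasicType` metadata): on a 64-bit VM the two types have the same register width but different slot counts, which is exactly the combination the `stack2reg` asserts reject.

```java
public class SizeClass {
    // Slot count in the JVM's abstract model: T_LONG is always double_size.
    static int slots(String type) {
        return type.equals("T_LONG") ? 2 : 1;
    }

    // Physical width: T_ADDRESS is platform-dependent, T_LONG is not.
    static int bits(String type, boolean is64bitVM) {
        if (type.equals("T_LONG")) return 64;
        if (type.equals("T_ADDRESS")) return is64bitVM ? 64 : 32;
        throw new IllegalArgumentException(type);
    }
}
```

On a 64-bit VM the widths agree (64 bits each) yet the slot counts differ (2 vs. 1), so moving a `T_LONG`-typed stack value into a `T_ADDRESS`-typed register trips the single-stack assertion.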
------------- PR Comment: https://git.openjdk.org/jdk/pull/26462#issuecomment-3182441043 From ghan at openjdk.org Wed Aug 13 07:09:16 2025 From: ghan at openjdk.org (Guanqiang Han) Date: Wed, 13 Aug 2025 07:09:16 GMT Subject: RFR: 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" In-Reply-To: References: Message-ID: On Fri, 25 Jul 2025 14:45:03 GMT, Tobias Hartmann wrote: >> I'm able to consistently reproduce the problem using the following command line and test program ? >> >> java -Xcomp -XX:TieredStopAtLevel=1 -XX:C1MaxInlineSize=200 Test.java >> >> import java.util.Arrays; >> public class Test{ >> public static void main(String[] args) { >> System.out.println("begin"); >> byte[] arr1 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; >> byte[] arr2 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; >> System.out.println(Arrays.equals(arr1, arr2)); >> System.out.println("end"); >> } >> } >> >> From my analysis, the root cause appears to be a mismatch in operand handling between T_ADDRESS and T_LONG in LIR_Assembler::stack2reg, especially when the source is marked as double stack (e.g., T_LONG) and the destination as single CPU register (e.g., T_ADDRESS), leading to assertion failures like assert(is_single_stack())(because T_LONG is double_size). >> >> In the test program above , the call chain is: Arrays.equals ? ArraysSupport.vectorizedMismatch ? LIRGenerator::do_vectorizedMismatch >> Within the do_vectorizedMismatch() method, a move instruction constructs an LIR_Op1. During LIR to machine code generation, LIR_Assembler::stack2reg was called. >> >> In this case, the src operand has type T_LONG and the dst operand has type T_ADDRESS. This combination triggers an assert in stack2reg, due to a mismatch between the stack slot type and register type handling. >> >> Importantly, this path ( LIR_Assembler::stack2reg was called ) is only taken when src is forced onto the stack. 
To reliably trigger this condition, the test is run with the -Xcomp option to force compilation and increase register pressure. >> >> A reference to the relevant code paths is provided below: >> image1 >> image2 >> >> On 64-bit platforms, although T_ADDRESS is classified as single_size, it is in fact 64 bits wide, representing a single 64-bit general-purpose register, and it can hold a T_LONG value, which is also 64 bits. >> >> However, T_LONG is defined as double_size, requiring two local variable slots or a pair of registers in the JVM's abstract model. This mismatch stems from the fact that T_ADDRESS is platform-dependent: it's 32 bits on 32-bit platforms, and 64 bits on 64-bit platforms, yet its size class... > > +1 to what Dean suggested. I think other intrinsics are affected by this as well though, for example: > https://github.com/openjdk/jdk/blob/b1fa1ecc988fb07f191892a459625c2c8f2de3b5/src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp#L953-L962 > > Also, what about other platforms than x86? @TobiHartmann Thanks for the review! I've tested it several times locally and confirmed the issue is still reproducible with the test. I've already integrated this patch; could you please sponsor it? Thanks again! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26462#issuecomment-3182444392 From dlong at openjdk.org Wed Aug 13 07:27:07 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 13 Aug 2025 07:27:07 GMT Subject: RFR: 8278874: tighten VerifyStack constraints [v11] In-Reply-To: References: Message-ID: > The VerifyStack logic in Deoptimization::unpack_frames() attempts to check the expression stack size of the interpreter frame against what GenerateOopMap computes. To do this, it needs to know if the state at the current bci represents the "before" state, meaning the bytecode will be reexecuted, or the "after" state, meaning we will advance to the next bytecode. The old code didn't know how to determine exactly what state we were in, so it checked both.
This PR cleans that up, so we only have to compute the oopmap once. It also removes old SPARC support. Dean Long has updated the pull request incrementally with two additional commits since the last revision: - more fixes and cleanup/refactoring - Graal fix and more cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26121/files - new: https://git.openjdk.org/jdk/pull/26121/files/e04fc720..4dab21bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26121&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26121&range=09-10 Stats: 102 lines in 3 files changed: 49 ins; 36 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/26121.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26121/head:pull/26121 PR: https://git.openjdk.org/jdk/pull/26121 From dlong at openjdk.org Wed Aug 13 07:29:20 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 13 Aug 2025 07:29:20 GMT Subject: RFR: 8278874: tighten VerifyStack constraints [v10] In-Reply-To: References: Message-ID: <2PN-fohhq9t7gY7y374y47r4pz7xHbFJwdosL1wkDFA=.954d0300-1cf2-4b0e-abcf-40c5ce0ca953@github.com> On Wed, 6 Aug 2025 01:29:58 GMT, Dean Long wrote: >> The VerifyStack logic in Deoptimization::unpack_frames() attempts to check the expression stack size of the interpreter frame against what GenerateOopMap computes. To do this, it needs to know if the state at the current bci represents the "before" state, meaning the bytecode will be reexecuted, or the "after" state, meaning we will advance to the next bytecode. The old code didn't know how to determine exactly what state we were in, so it checked both. This PR cleans that up, so we only have to compute the oopmap once. It also removes old SPARC support. > > Dean Long has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 13 additional commits since the last revision: > > - Merge branch 'openjdk:master' into 8278874-verifystack > - Merge branch 'openjdk:master' into 8278874-verifystack > - more cleanup > - simplify is_top_frame > - readability suggestion > - reviewer suggestions > - Update src/hotspot/share/runtime/vframeArray.cpp > > Co-authored-by: Manuel Hässig > - Update src/hotspot/share/runtime/vframeArray.cpp > > Co-authored-by: Manuel Hässig > - better name for frame index > - Update src/hotspot/share/runtime/deoptimization.cpp > > Co-authored-by: Manuel Hässig > - ... and 3 more: https://git.openjdk.org/jdk/compare/2ead4528...e04fc720 Hopefully these last couple commits fix the remaining edge cases, but Graal testing is still running. I'm testing with VerifyStack hard-coded to true so that vm.flagless tests also run, which does cause some failures due to timeouts. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26121#issuecomment-3182509262 From fyang at openjdk.org Wed Aug 13 07:45:22 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 13 Aug 2025 07:45:22 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v24] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 10:24:29 GMT, Yuri Gaevsky wrote: >> The patch adds the possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > removed reservations for unused vector registers per reviewer's comment; added sanity assertion. Thanks for the update. Having another look. Some comments along the way. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2006: > 2004: const int num_8bit_elems_in_vec_reg = MaxVectorSize; > 2005: // Let's use T_INT as all hashCode calculations eventually deal with ints.
> 2006: const int ints_in_vec_reg = num_8bit_elems_in_vec_reg/sizeof(jint); Suggestion: `const int ints_in_vec_reg = MaxVectorSize / sizeof(jint);` We can remove `num_8bit_elems_in_vec_reg` then. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2014: > 2012: > 2013: switch (eltype) { > 2014: case T_BOOLEAN: BLOCK_COMMENT("arrays_hashcode_v(unsigned byte) {"); break; Please leave two spaces for each case. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2028: > 2026: > 2027: const VectorRegister v_sum = v2; > 2028: Seems no need to have this new line here. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2032: > 2030: const VectorRegister v_coeffs = v6; > 2031: const VectorRegister v_tmp = v8; > 2032: const VectorRegister v_zred = v_tmp; Seems we don't really need this alias `v_zred`. We can use `v_tmp` instead. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2042: > 2040: > 2041: andi(t1, cnt, MAX_VEC_MASK); > 2042: beqz(t1, SCALAR_TAIL); Suggestion: Use `t0` instead of `t1` to hold some short-lived values. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2044: > 2042: beqz(t1, SCALAR_TAIL); > 2043: > 2044: vsetvli(t0, x0, Assembler::e32, Assembler::m2); It will be safer to use `t1` instead of `t0` here to hold the number of elements processed for each round. `t0` as a scratch register gets clobbered by various assembler routines more frequently than `t1`. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2061: > 2059: andi(t1, cnt, MAX_VEC_MASK); > 2060: mulw(result, result, pow31_highest); > 2061: bne(t1, x0, VEC_LOOP); Suggestion: `bnez(t1, VEC_LOOP);` src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2110: > 2108: } > 2109: > 2110: void C2_MacroAssembler::arrays_hashcode_vec_elload(VectorRegister varr, Can you rename this as `C2_MacroAssembler::arrays_hashcode_elload_v`? I think that will be more consistent in naming with other places. And I personally prefer `vdst` to `varr` for the first param.
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.hpp line 102: > 100: void arrays_hashcode_v(Register ary, Register cnt, Register result, > 101: Register tmp1, Register tmp2, Register tmp3, > 102: BasicType eltype); Please put a new line here. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.hpp line 106: > 104: int arrays_hashcode_elsize(BasicType eltype); > 105: void arrays_hashcode_elload(Register dst, Address src, BasicType eltype); > 106: void arrays_hashcode_vec_elload(VectorRegister varr, VectorRegister vtmp, Register array, BasicType eltype); Similar here. Consider renaming to `arrays_hashcode_elload_v`. src/hotspot/cpu/riscv/riscv_v.ad line 4092: > 4090: match(Set result (VectorizedHashCode (Binary ary cnt) (Binary result basic_type))); > 4091: effect(USE_KILL ary, USE_KILL cnt, USE basic_type, > 4092: TEMP v2, TEMP v3, TEMP v4, TEMP v5, TEMP v6, TEMP v7, TEMP v8, TEMP v9, Please keep only one space at RHS of each `TEMP`. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6587: > 6585: assert(UseRVV, "sanity"); > 6586: const int num_8bit_elems_in_vec_reg = MaxVectorSize; > 6587: const int ints_in_vec_reg = num_8bit_elems_in_vec_reg/sizeof(jint); Suggestion: `const int ints_in_vec_reg = MaxVectorSize / sizeof(jint);` We can remove num_8bit_elems_in_vec_reg then. 
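[Editorial sketch] For context, the loop these review comments target computes the standard 31-polynomial hash. Below is a plain C++ sketch of the scalar loop and of the blocked form a vectorized version uses: process W elements per iteration and multiply the running hash by 31^W each round (the factor the reviewed code calls `pow31_highest`), then handle the remainder in a scalar tail. This is illustrative only, not the RVV code under review; unsigned arithmetic is used to get Java-style int wraparound without signed-overflow UB.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar reference: h = 31*h + a[i] over all elements.
static uint32_t hash_scalar(const int8_t* a, size_t n) {
  uint32_t h = 0;
  for (size_t i = 0; i < n; i++) h = 31u * h + (uint32_t)(int32_t)a[i];
  return h;
}

// Blocked form: each W-element block advances the hash by 31^W at once.
static uint32_t hash_blocked(const int8_t* a, size_t n, size_t W) {
  uint32_t pow31_W = 1;                       // 31^W (mod 2^32)
  for (size_t k = 0; k < W; k++) pow31_W *= 31u;
  uint32_t h = 0;
  size_t i = 0;
  for (; i + W <= n; i += W) {
    h *= pow31_W;                             // shift the whole block in
    uint32_t block = 0;                       // a[i..i+W-1] with coeffs 31^(W-1)..31^0
    for (size_t k = 0; k < W; k++) block = 31u * block + (uint32_t)(int32_t)a[i + k];
    h += block;
  }
  for (; i < n; i++) h = 31u * h + (uint32_t)(int32_t)a[i];  // scalar tail
  return h;
}
```

A vector implementation replaces the inner block loop with a per-lane multiply by precomputed coefficients and a reduction, but the arithmetic identity is the one shown here.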
------------- PR Review: https://git.openjdk.org/jdk/pull/17413#pullrequestreview-3113812897 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2272292124 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2272218374 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2272345113 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2272234458 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2272274301 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2272270365 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2272287930 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2271970183 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2272227887 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2272340129 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2272222013 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2272293705 From epeter at openjdk.org Wed Aug 13 07:48:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 13 Aug 2025 07:48:21 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6] In-Reply-To: <0rNuFLFwXcWfF0-nQQEd9fbIrziHos8PZJ93sDPFObo=.0587492e-267b-4681-8fb8-605cdc20f1c3@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <0rNuFLFwXcWfF0-nQQEd9fbIrziHos8PZJ93sDPFObo=.0587492e-267b-4681-8fb8-605cdc20f1c3@github.com> Message-ID: <1er7eUbDAxVtvHBrLGrLc7Mbxrd2L1nr3z0vB4zHerQ=.12e5641b-7870-483c-abac-51abfa42761f@github.com> On Tue, 12 Aug 2025 16:19:10 GMT, Manuel Hässig wrote: >> I don't think that `make_last` makes any assumptions about `iv_scale1 < iv_scale2`. >> But I could consider moving it earlier anyway. Do you think that is worth it? > > I would do it because the proof states that if `iv_scale2 < iv_scale1` we swap them.
It would keep it consistent. Also, you won't have to swap the spans. I moved it up now :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2272375818 From shade at openjdk.org Wed Aug 13 08:36:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 08:36:53 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v2] In-Reply-To: References: Message-ID: > When recording adapter entries, we record _offsets_, not the actual addresses: > > > entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > > > Every platform except ARM32 and Zero has all these entries set up, so offsets are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". > > This PR is the second least horrible (IMO) fix for this: relaxing the assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters.
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `runtime/cds` still works > - [x] Linux ARM32 server fastdebug, `java -version` now works Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Offset -1 means nullptr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26746/files - new: https://git.openjdk.org/jdk/pull/26746/files/5a994a75..9ad2cfb0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26746&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26746&range=00-01 Stats: 13 lines in 2 files changed: 8 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/26746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26746/head:pull/26746 PR: https://git.openjdk.org/jdk/pull/26746 From shade at openjdk.org Wed Aug 13 08:36:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 08:36:53 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: <_JTzV_2psYbRGZdla7NyTIg-qyxW-13CnqC33hTcJro=.637184a3-b37a-4891-a00d-082c70d05245@github.com> On Tue, 12 Aug 2025 14:06:35 GMT, Aleksey Shipilev wrote: > When recording adapter entries, we record _offsets_, not the actual addresses: > > > entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > > > Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". > > This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. 
For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. > > The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `runtime/cds` still works > - [x] Linux ARM32 server fastdebug, `java -version` now works Update: Passing `-1` as offset and cleanly reconstituting it as `nullptr` looks not that intrusive and feels less awkward. Done in new commit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3182746244 From shade at openjdk.org Wed Aug 13 08:41:32 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 08:41:32 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3] In-Reply-To: References: Message-ID: > When recording adapter entries, we record _offsets_, not the actual addresses: > > > entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > > > Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". > > This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. 
> > The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `runtime/cds` still works > - [x] Linux ARM32 server fastdebug, `java -version` now works Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Polish comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26746/files - new: https://git.openjdk.org/jdk/pull/26746/files/9ad2cfb0..7c33cc5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26746&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26746&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26746/head:pull/26746 PR: https://git.openjdk.org/jdk/pull/26746 From chagedorn at openjdk.org Wed Aug 13 08:43:17 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Aug 2025 08:43:17 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v5] In-Reply-To: <6gq4iIBw4RIqqPvmAf2MHnKrmYHwOdWdH1fz1bFaCGA=.57906956-460f-4a1d-9e3e-fbf91a7974e2@github.com> References: <6gq4iIBw4RIqqPvmAf2MHnKrmYHwOdWdH1fz1bFaCGA=.57906956-460f-4a1d-9e3e-fbf91a7974e2@github.com> Message-ID: <7ZSL2sR91qOFup-zauB0VKCoLYB9dHMn3GGwLmo-gEk=.e790e543-fb41-42cd-add2-1f5f4a141afb@github.com> On Fri, 8 Aug 2025 10:51:42 GMT, Manuel Hässig wrote: >> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. >> >> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`.
This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. >> >> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. >> >> Testing: >> - [x] Github Actions >> - [x] tier1, tier2 on all platforms >> - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug >> - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:
If we are able to also dump the compile task/method that is timing out, we might even be able to match on that when run with `CompileOnly` for a single method. But not sure if the latter is possible. What do you think? src/hotspot/os/linux/compilerThreadTimeout_linux.cpp line 43: > 41: switch (signo) { > 42: case TIMEOUT_SIGNAL: { > 43: assert(false, "compile task timed out"); Can you somehow also print the task which caused the timeout? Will just accessing `CompilerThread::current()->task()` work? src/hotspot/os/linux/compilerThreadTimeout_linux.hpp line 46: > 44: #endif // !PRODUCT > 45: public: > 46: CompilerThreadTimeoutLinux() NOT_PRODUCT(DEBUG_ONLY(: _timer(nullptr))) {}; Why do you need the `NOT_PRODUCT`? It only wraps `DEBUG_ONLY`. If that's not set, the `NOT_PRODUCT` wraps nothing. src/hotspot/os/linux/globals_linux.hpp line 94: > 92: develop(intx, CompileTaskTimeout, 0, \ > 93: "Set the timeout for compile tasks' CPU time in milliseconds."\ > 94: "0 = no timeout (default)") \ Suggestion: " 0 = no timeout (default)") \ src/hotspot/share/compiler/compilerThread.hpp line 52: > 50: CompilerThreadTimeoutGeneric() {}; > 51: void arm() { return; }; > 52: void disarm() { return; }; You can remove the `return`: Suggestion: void arm() {}; void disarm() {}; src/hotspot/share/compiler/compilerThread.hpp line 54: > 52: void disarm() { return; }; > 53: bool init_timeout() { return true; }; > 54: }; Should we also guard this with `ifndef LINUX` since it's only used for non-Linux? 
------------- PR Review: https://git.openjdk.org/jdk/pull/26023#pullrequestreview-3114494124 PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272483803 PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272509382 PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272512593 PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272517713 PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272516811 From chagedorn at openjdk.org Wed Aug 13 08:59:11 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Aug 2025 08:59:11 GMT Subject: RFR: 8362530: VM crash with -XX:+PrintTieredEvents when collecting AOT profiling In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 18:02:16 GMT, Igor Veresov wrote: > When printing tiered events we take the ttyLock and also now the trainingDataLock. While benign it's best to decouple these. The solution is to gather the output bits in a buffer and then print it. Do you also have a regression test for the crash that you could add or add the print flag to some existing test to verify your change? I have two more small comments but otherwise, it looks good to me, too! src/hotspot/share/compiler/compilationPolicy.cpp line 407: > 405: } > 406: > 407: void CompilationPolicy::print_counters_on(outputStream* st, const char* prefix, Method* m) { Suggestion: void CompilationPolicy::print_counters_on(outputStream* st, const char* prefix, Method* m) { src/hotspot/share/compiler/compilationPolicy.cpp line 552: > 550: print_event_on(&s, type, m, im, bci, level); > 551: ResourceMark rm; > 552: ttyLocker tty_lock; Do you really need the lock with only one `print()`? I thought it should be safe in that case. 
------------- PR Review: https://git.openjdk.org/jdk/pull/26750#pullrequestreview-3114605784 PR Review Comment: https://git.openjdk.org/jdk/pull/26750#discussion_r2272563957 PR Review Comment: https://git.openjdk.org/jdk/pull/26750#discussion_r2272571774 From adinn at openjdk.org Wed Aug 13 09:02:12 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 13 Aug 2025 09:02:12 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: <_JTzV_2psYbRGZdla7NyTIg-qyxW-13CnqC33hTcJro=.637184a3-b37a-4891-a00d-082c70d05245@github.com> References: <_JTzV_2psYbRGZdla7NyTIg-qyxW-13CnqC33hTcJro=.637184a3-b37a-4891-a00d-082c70d05245@github.com> Message-ID: On Wed, 13 Aug 2025 08:31:13 GMT, Aleksey Shipilev wrote: > Update: Passing -1 as offset and cleanly reconstituting it as nullptr looks not that intrusive and feels less awkward. Done in new commit. I was about to suggest that as one of two possible alternatives. These offsets are only needed for AOT save and restore. The problem of a selective implementation of the entries in a multi-entry blob mirrors a problem I have been thinking over where a multi-entry (Stubgen) stub implementation may not omit entries in some ports (likewise, may declare a SEGV protection range whose associated handler is not an entry in the stub). In both these cases we need to be able to pass nullptr in as the entry address and get nullptr back out when we retrieve the address. So, we need to translate the address to a sentinel offset value which we save into the archive and recognise when we retrieve it.
There are two easy choices, -1 being the obvious one. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3182845604 From adinn at openjdk.org Wed Aug 13 09:12:20 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 13 Aug 2025 09:12:20 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 08:41:32 GMT, Aleksey Shipilev wrote: >> When recording adapter entries, we record _offsets_, not the actual addresses: >> >> >> entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; >> >> >> Every platform except ARM32 and Zero has all these entries set up, so offsets are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". >> >> This PR is the second least horrible (IMO) fix for this: relaxing the assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. >> >> The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `runtime/cds` still works >> - [x] Linux ARM32 server fastdebug, `java -version` now works > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Polish comment Good! ------------- Marked as reviewed by adinn (Reviewer).
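[Editorial sketch] The sentinel scheme settled on in this thread, where a missing entry is stored as offset -1 and reconstituted as `nullptr` on the way back out, can be sketched as follows. The helper names are illustrative, not the sharedRuntime code.

```cpp
#include <cassert>
#include <cstdint>

// Encode an entry address as an offset from the i2c entry; a missing
// (nullptr) entry, as on ARM32/Zero where clinit barriers are
// unimplemented, is recorded as the sentinel -1.
static int32_t entry_to_offset(const char* entry, const char* i2c_base) {
  return (entry != nullptr) ? (int32_t)(entry - i2c_base) : -1;
}

// Reconstitute the address; the sentinel cleanly round-trips to nullptr,
// so no "garbage" offset ever reaches an assert or a caller.
static const char* offset_to_entry(int32_t offset, const char* i2c_base) {
  return (offset == -1) ? nullptr : i2c_base + offset;
}
```

Compared with relaxing the assert to tolerate wraparound, the sentinel keeps the round-trip property explicit: encode then decode always returns exactly what was put in.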
PR Review: https://git.openjdk.org/jdk/pull/26746#pullrequestreview-3114708002 From aoqi at openjdk.org Wed Aug 13 09:26:17 2025 From: aoqi at openjdk.org (Ao Qi) Date: Wed, 13 Aug 2025 09:26:17 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 08:41:32 GMT, Aleksey Shipilev wrote: >> When recording adapter entries, we record _offsets_, not the actual addresses: >> >> >> entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; >> >> >> Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". >> >> This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. >> >> The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `runtime/cds` still works >> - [x] Linux ARM32 server fastdebug, `java -version` now works > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Polish comment src/hotspot/share/runtime/sharedRuntime.cpp line 2850: > 2848: address i2c_entry = handler->get_i2c_entry(); > 2849: entry_offset[0] = 0; // i2c_entry offset > 2850: entry_offset[1] = (handler->get_c2i_entry() != nullptr) ? 
Should we handle the Zero case? Something like: entry_offset[1] = (handler->get_c2i_entry() != nullptr ZERO_ONLY(&& false)) ? (handler->get_c2i_entry() - i2c_entry) : -1; entry_offset[2] = (handler->get_c2i_unverified_entry() != nullptr ZERO_ONLY(&& false)) ? (handler->get_c2i_unverified_entry() - i2c_entry) : -1; With Zero, `handler->get_i2c_entry()`, `handler->get_c2i_entry()` and `handler->get_c2i_unverified_entry()` are same and not nullptr. The build of Zero can trigger the problem. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2272674799 From snatarajan at openjdk.org Wed Aug 13 09:35:08 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 13 Aug 2025 09:35:08 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v8] In-Reply-To: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: > **Issue** > Extreme values for BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version `results in assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. This is observed in a x86 machine. > > **Analysis** > On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above mentioned assert from failing for large values of BciProfileWidth. > > **Proposal** > Considering the fact that larger BciProfileWidth results in slower profiling, I have proposed a range between 0 to 5000 to restrict the value for BciProfileWidth for x86 machines. 
This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in ret profile. > > **Issue in AArch64** > Additionally running the command `java -XX:BciProfileWidth= 10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine.This is an issue of maximum offset for `ldr/str` in AArch64 which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on AArch64 machine. I had to modify 3 `ldr` and 1 `str` instruction (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line number 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for maximum of 5000 after which it crashes with`assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. Without this fix `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested to restrict the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. > > **Question to reviewers** > Do you think this is a reasonable fix ? For AArch64 do you suggest fixing using `form_address` ? If yes, do I fix it under this PR or create another one ? > > **Request to port maintainers** > @dafedafe suggested that we keep the upper bound of `BciProfileWidth` to 1000 pro... 
Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: additions for linux-riscv64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26139/files - new: https://git.openjdk.org/jdk/pull/26139/files/60da70c6..2f511dbf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26139&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26139&range=06-07 Stats: 7 lines in 1 file changed: 0 ins; 3 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26139/head:pull/26139 PR: https://git.openjdk.org/jdk/pull/26139 From dnsimon at openjdk.org Wed Aug 13 09:42:10 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 13 Aug 2025 09:42:10 GMT Subject: RFR: 8365218: [JVMCI] AArch64 CPU features are not computed correctly after 8364128 [v4] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 07:04:12 GMT, Yudi Zheng wrote: >> https://github.com/openjdk/jdk/pull/26515 changes the `VM_Version::CPU_` constant values on AArch64 and Graal now sees unsupported CPU features. This may result in SIGILL due to Graal emitting unsupported instructions, such as `CPU_SHA3`-based eor3 instructions in AArch64 SHA3 stubs. > > Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge master > - style > - style > - address comments > - [JVMCI] AArch64 CPU features are not computed correctly after 8364128 Still good. ------------- Marked as reviewed by dnsimon (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26727#pullrequestreview-3114873479 From snatarajan at openjdk.org Wed Aug 13 09:43:13 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 13 Aug 2025 09:43:13 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v6] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: On Tue, 12 Aug 2025 08:12:11 GMT, Fei Yang wrote: >> In the process of adding port maintainers to this PR, I mistakenly added (and removed) some of them as contributors. I will update the contributor list before closing the PR. Sorry for the inconvenience > > @sarannat : Hi, Thanks for the ping! > I just tried the newly-added test on linux-riscv64 and I think we still need some extra change for this platform. > Do you mind adding that in this PR? I see the test pass with this addon change when running with fastdebug build. > [riscv-addon-fix.diff.txt](https://github.com/user-attachments/files/21729910/riscv-addon-fix.diff.txt) Thank you @RealFYang. I have added the changes suggested by you. Could you review whether it looks good?
------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3183013341 From snatarajan at openjdk.org Wed Aug 13 09:48:18 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 13 Aug 2025 09:48:18 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v7] In-Reply-To: <3gxI6LHS5BbGD3Dra6pdCPsRosCAO6W_rhUAQK0qAlA=.b75b1b5c-7e8d-4fbf-9b0d-83fdc424881e@github.com> References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> <3gxI6LHS5BbGD3Dra6pdCPsRosCAO6W_rhUAQK0qAlA=.b75b1b5c-7e8d-4fbf-9b0d-83fdc424881e@github.com> Message-ID: On Tue, 12 Aug 2025 19:04:24 GMT, Boris Ulasevich wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> addressing review : adding vm.debug and moving a defn > > Marked as reviewed by bulasevich (Committer). Thank you @bulasevich ------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3183036976 From snatarajan at openjdk.org Wed Aug 13 09:48:19 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 13 Aug 2025 09:48:19 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v6] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> <1QbX5WHkEdjP-unAFJ1vYaoIc9bV8zz8dA-vKZCkYn8=.8e3704ae-9490-4471-9e5c-dae44004d46f@github.com> Message-ID: On Tue, 12 Aug 2025 15:16:40 GMT, Damon Fenacci wrote: >> Thank you for the review. >> I have now included `@requires vm.debug` > > Shouldn't we check that the vm doesn't crash with `BciProfileWidth=-1` and `BciProfileWidth=100000` (or another very high value)? @dafedafe : I am working on this and will upload the changes soon. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26139#discussion_r2272747902 From mablakatov at openjdk.org Wed Aug 13 09:49:19 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Wed, 13 Aug 2025 09:49:19 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v8] In-Reply-To: References: Message-ID: <072sgUJQa-oI9-uylhiPMzk2wLEr9e_8MZE1joM3fxs=.c0b4df04-57cb-43a4-b42b-340102013524@github.com> On Sat, 9 Aug 2025 07:29:26 GMT, Andrew Haley wrote: > Please try to organize things the same way as the Decode section of the ARM. Do you refer to *C4: A64 Instruction Set Encoding*? > Insert a new section called SVE Integer Misc - Unpredicated after SVE bitwise shift by immediate (predicated) and put this pattern there. I assume you might have misinterpreted **predicated** SVE bitwise shift for **unpredicated**. In *C4: A64 Instruction Set Encoding*, *C4.1.41 SVE Integer Misc - Unpredicated* follows *C4.1.40 SVE Bitwise Shift - Unpredicated*, which is not implemented by `src/hotspot/cpu/aarch64/assembler_aarch64.hpp` as far as I can tell. The suggested *SVE bitwise shift by immediate (predicated)* falls into *C4.1.34 SVE Bitwise Shift - Predicated*. If this change is to follow the ordering in *C4: A64 Instruction Set Encoding*, the implemented instruction class immediately preceding `sve_movprfx` (from *C4.1.41*) should be [SVE stack frame adjustment](https://github.com/openjdk/jdk/pull/23181/files/4593a5d717024df01769625993c2b769d8dde311#diff-203c5bbfa5307b5cc529c80acf90e764260db018ed658b949421f91190c56982L3686) which falls into *C4.1.38 SVE Stack Allocation*. The implemented instruction class immediately following it should be [SVE element count](https://github.com/openjdk/jdk/pull/23181/files/4593a5d717024df01769625993c2b769d8dde311#diff-203c5bbfa5307b5cc529c80acf90e764260db018ed658b949421f91190c56982L4067) (inconveniently named something else in the source file) which falls into *C4.1.42 SVE Element Count*.
The two instruction classes don't follow each other in the file, unfortunately, so it's one or the other. Currently it's the latter. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2272750216 From bkilambi at openjdk.org Wed Aug 13 10:16:24 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 13 Aug 2025 10:16:24 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v5] In-Reply-To: References: Message-ID: > After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - > `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - > > > public void vectorAddConstInputFloat16() { > for (int i = 0; i < LEN; ++i) { > output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); > } > } > > > > > > The current code in the JDK results in the generation of the sve_dup instruction for every 16-bit immediate, while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. > > This patch allows the generation of the sve_dup instruction for only those 16-bit values which are within the limits specified above; for values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcast to an SVE register ("replicateHF" mach node). > > Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` - pass on a 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk (tier 1-3) pass on the same machine.
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Add an extra space in one of the comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26589/files - new: https://git.openjdk.org/jdk/pull/26589/files/bcecc6e1..f8dc132b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26589&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26589&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26589/head:pull/26589 PR: https://git.openjdk.org/jdk/pull/26589 From bkilambi at openjdk.org Wed Aug 13 10:23:16 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 13 Aug 2025 10:23:16 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4] In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 09:50:21 GMT, Andrew Haley wrote: > > HI @theRealAph Thanks a lot for your comment. I feel it is a good idea to modify `loadConH` to move a constant instead of doing an `ldr` from the constant pool (it could probably get us some performance benefit as well). However, the scope of this ticket was to mainly fix the JTREG errors that >16B SVE machines were running into due to illegal immediates being passed to the `sve_dup` instruction. Would it be acceptable if I push this fix first and then create a follow up task to work on optimizing `loadConH`? I can create a new JBS ticket and assign it to myself and tag it here as well if that helps. Thank you! > > Well, yes, but I'm proposing a simpler and better fix to the problem. Sure, if you want to do this in two steps go ahead. Apologies, I thought I could change just the replicate backend nodes to be able to generate the `mov` to a scratch reg -> `dup` to replicate the value but missed the point that I can't still get rid of the `loadConH` node that loads the immediate from the constant pool. 
If we want to change `loadConH` to instead generate a `mov` of an immediate to a scratch register, then we might have to change the `dst` from being a `vRegF` to an `iRegI` - `instruct loadConH(vRegF dst, immH con) %{` - which I am not in favour of, as the scalar FP16 operations still expect the value to be in an FPR. I will work on this in a follow-up JBS ticket. Thanks. If this is acceptable, could I please get another round of review of the updated patch? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3183158689 From bkilambi at openjdk.org Wed Aug 13 10:23:18 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 13 Aug 2025 10:23:18 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 09:11:24 GMT, Aleksey Shipilev wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments and modified some comments > > test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 45: > >> 43: public class TestFloat16Replicate { >> 44: private static short[] input; >> 45: private static short[] output; > > This might give things even more chance to vectorize? Not sure, feel free to ignore. > > Suggestion: > > private static final short[] INPUT; > private static final short[] OUTPUT; I hope it's ok to not add these changes to the code. The loops are getting vectorized fine and the tests do pass on aarch64 and x86. I will consider this if there's any issue with auto vectorization in the future.
Thanks > test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 47: > >> 45: private static short[] output; >> 46: >> 47: // Choose FP16_IN_RANGE which is within the range of [-128 << 8, 127 << 8] and a multiple of 256 > > Suggestion: > > // Choose FP16_IN_RANGE which is within the range of [-128 << 8, 127 << 8] and a multiple of 256 Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2272848883 PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2272845319 From shade at openjdk.org Wed Aug 13 10:43:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 10:43:12 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v5] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 10:16:24 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. >> >> This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). 
>> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Add an extra space in one of the comments Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26589#pullrequestreview-3115162353 From shade at openjdk.org Wed Aug 13 10:51:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 10:51:12 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 09:23:16 GMT, Ao Qi wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Polish comment > > src/hotspot/share/runtime/sharedRuntime.cpp line 2850: > >> 2848: address i2c_entry = handler->get_i2c_entry(); >> 2849: entry_offset[0] = 0; // i2c_entry offset >> 2850: entry_offset[1] = (handler->get_c2i_entry() != nullptr) ? > > Should we handle the Zero case? Something like: > > entry_offset[1] = (handler->get_c2i_entry() != nullptr ZERO_ONLY(&& false)) ? > (handler->get_c2i_entry() - i2c_entry) : -1; > entry_offset[2] = (handler->get_c2i_unverified_entry() != nullptr ZERO_ONLY(&& false)) ? > (handler->get_c2i_unverified_entry() - i2c_entry) : -1; > > With Zero, `handler->get_i2c_entry()`, `handler->get_c2i_entry()` and `handler->get_c2i_unverified_entry()` are same and not nullptr. The build of Zero can trigger the problem. Ah, Zero is still broken, let me fix it... 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2272937921 From ghan at openjdk.org Wed Aug 13 10:56:21 2025 From: ghan at openjdk.org (Guanqiang Han) Date: Wed, 13 Aug 2025 10:56:21 GMT Subject: Integrated: 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" In-Reply-To: References: Message-ID: On Thu, 24 Jul 2025 15:10:37 GMT, Guanqiang Han wrote: > I'm able to consistently reproduce the problem using the following command line and test program: > > java -Xcomp -XX:TieredStopAtLevel=1 -XX:C1MaxInlineSize=200 Test.java > > import java.util.Arrays; > public class Test{ > public static void main(String[] args) { > System.out.println("begin"); > byte[] arr1 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; > byte[] arr2 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; > System.out.println(Arrays.equals(arr1, arr2)); > System.out.println("end"); > } > } > > From my analysis, the root cause appears to be a mismatch in operand handling between T_ADDRESS and T_LONG in LIR_Assembler::stack2reg, especially when the source is marked as double stack (e.g., T_LONG) and the destination as single CPU register (e.g., T_ADDRESS), leading to assertion failures like assert(is_single_stack()) (because T_LONG is double_size). > > In the test program above, the call chain is: Arrays.equals -> ArraysSupport.vectorizedMismatch -> LIRGenerator::do_vectorizedMismatch > Within the do_vectorizedMismatch() method, a move instruction constructs an LIR_Op1. During LIR to machine code generation, LIR_Assembler::stack2reg was called. > > In this case, the src operand has type T_LONG and the dst operand has type T_ADDRESS. This combination triggers an assert in stack2reg, due to a mismatch between the stack slot type and register type handling. > > Importantly, this path (LIR_Assembler::stack2reg being called) is only taken when src is forced onto the stack.
To reliably trigger this condition, the test is run with the -Xcomp option to force compilation and increase register pressure. > > A reference to the relevant code paths is provided below: > image1 > image2 > > On 64-bit platforms, although T_ADDRESS is classified as single_size, it is in fact 64 bits wide, representing a single 64-bit general-purpose register, and it can hold a T_LONG value, which is also 64 bits. > > However, T_LONG is defined as double_size, requiring two local variable slots or a pair of registers in the JVM's abstract model. This mismatch stems from the fact that T_ADDRESS is platform-dependent: it's 32 bits on 32-bit platforms, and 64 bits on 64-bit platforms, yet its size classification remains single_size regardless. > > This classification... This pull request has now been integrated. Changeset: f3b34d32 Author: Guanqiang Han Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/f3b34d32d6ea409f8c8f0382e8f01e746366f842 Stats: 84 lines in 6 files changed: 76 ins; 0 del; 8 mod 8359235: C1 compilation fails with "assert(is_single_stack() && !is_virtual()) failed: type check" Reviewed-by: thartmann, dlong ------------- PR: https://git.openjdk.org/jdk/pull/26462 From shade at openjdk.org Wed Aug 13 11:16:59 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 11:16:59 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v4] In-Reply-To: References: Message-ID: > When recording adapter entries, we record _offsets_, not the actual addresses: > > > entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > > > Every platform except ARM32 and Zero has all these entries set up, so offsets are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented.
So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". > > This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. > > The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `runtime/cds` still works > - [x] Linux ARM32 server fastdebug, `java -version` now works Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Handling Zero crash as well ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26746/files - new: https://git.openjdk.org/jdk/pull/26746/files/7c33cc5e..aa62b27f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26746&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26746&range=02-03 Stats: 7 lines in 2 files changed: 2 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26746/head:pull/26746 PR: https://git.openjdk.org/jdk/pull/26746 From shade at openjdk.org Wed Aug 13 11:17:00 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 11:17:00 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 10:48:31 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/runtime/sharedRuntime.cpp line 2850: >> >>> 2848: address i2c_entry = handler->get_i2c_entry(); >>> 2849: entry_offset[0] = 0; // i2c_entry 
offset >>> 2850: entry_offset[1] = (handler->get_c2i_entry() != nullptr) ? >> >> Should we handle the Zero case? Something like: >> >> entry_offset[1] = (handler->get_c2i_entry() != nullptr ZERO_ONLY(&& false)) ? >> (handler->get_c2i_entry() - i2c_entry) : -1; >> entry_offset[2] = (handler->get_c2i_unverified_entry() != nullptr ZERO_ONLY(&& false)) ? >> (handler->get_c2i_unverified_entry() - i2c_entry) : -1; >> >> With Zero, `handler->get_i2c_entry()`, `handler->get_c2i_entry()` and `handler->get_c2i_unverified_entry()` are same and not nullptr. The build of Zero can trigger the problem. > > Ah, Zero is still broken, let me fix it... See new commit. Zero build now passes, I am running bootcycle-images now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2273027343 From duke at openjdk.org Wed Aug 13 12:02:58 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 13 Aug 2025 12:02:58 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v25] In-Reply-To: References: Message-ID: <-GNxf920ytSK-hakIM-KWRJ_N1yRHSaC-5oEoYTdPJg=.f7ec1a4a-f8ff-404f-a25b-77d996f4f20d@github.com> > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: - addressed reviewer's comments/suggestions. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/44491863..aaf930be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=23-24 Stats: 55 lines in 4 files changed: 3 ins; 6 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From duke at openjdk.org Wed Aug 13 12:02:58 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 13 Aug 2025 12:02:58 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v22] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 06:05:15 GMT, Fei Yang wrote: >> Based on above experiments it looks reasonable to use `m2` grouping. > >> Based on above experiments it looks reasonable to use `m2` grouping. > > Thanks for the extra JMH numbers. Yes, I agree that `m2` is more reasonable here. > That means we won't need to reserve so many vector registers for `instruct varrays_hashcode` in src/hotspot/cpu/riscv/riscv_v.ad. > So can you free the unused vector registers? Will take a more closer look after that. Thanks a lot for your comments/suggestions, @RealFYang, fixed as suggested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3183537382 From snatarajan at openjdk.org Wed Aug 13 12:10:54 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 13 Aug 2025 12:10:54 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled [v4] In-Reply-To: References: Message-ID: > **Issue** > An error, `assert(data->is_ReceiverTypeData()) failed: bad profile data type`, is encountered during C2 compilation due to bad profile data. This occurs when the code is compiled with `TypeProfileCasts` option disabled. 
> > **Analysis** > The assertion failure occurs in `record_profiled_receiver_for_speculation`, which analyzes the profiling information in the method data to determine whether a null value has been observed in the `instanceof` operation. This information is encoded in the `BitData` during profiling. When the method identifies that a null has been seen, it proceeds to inspect the associated `ReceiverTypeData` to see if the type check is always performed against null. However, in this scenario, the incoming profiling data is of type `BitData` rather than `ReceiverTypeData`, leading to the assertion failure. > > The profiling information for null seen for the operations `aastore`, `instanceof`, and `checkcast` is recorded by the method `profile_null_seen` (in `src/hotspot/cpu/x86/templateTable_x86.cpp`). On investigating this method, it can be observed that the method data pointer is not updated for `VirtualCallData` (which is a subclass of `ReceiverTypeData`) when the `TypeProfileCasts` option is disabled. > > **Solution** > My proposal is to inspect the `ReceiverTypeData` in function `record_profiled_receiver_for_speculation` only if `TypeProfileCasts` is enabled (this is based on the fact that the relevant method data pointer is not updated when `TypeProfileCasts` is disabled). > > **Question to reviewers** > Do you think this is a reasonable fix? > > **Testing** > GitHub Actions > tier1 to tier3 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64.
Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: - formating code - add CompileThresholdScaling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26640/files - new: https://git.openjdk.org/jdk/pull/26640/files/31c645de..00d2e4ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26640&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26640&range=02-03 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26640.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26640/head:pull/26640 PR: https://git.openjdk.org/jdk/pull/26640 From snatarajan at openjdk.org Wed Aug 13 12:10:54 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 13 Aug 2025 12:10:54 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled [v2] In-Reply-To: References: <7BRWHyYaTAEbv7Yery2pRVrzCQfKB0sBFIh4M4xsCN8=.c0b2463a-4202-428d-bd3e-fa082cbbbf46@github.com> Message-ID: On Tue, 12 Aug 2025 16:28:35 GMT, Vladimir Kozlov wrote: >> Excellent, thank you! > > You can also use `-XX:CompileThresholdScaling=f` (specify `f` as 0.1, for example) flag to trigger compilation early to make sure 100 is enough. Thank you for the suggestion. The test works without this addition. However, I have added `-XX:CompileThresholdScaling=0.01` to be on the safe side.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26640#discussion_r2273197661 From aph at openjdk.org Wed Aug 13 12:21:20 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 13 Aug 2025 12:21:20 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v8] In-Reply-To: <072sgUJQa-oI9-uylhiPMzk2wLEr9e_8MZE1joM3fxs=.c0b4df04-57cb-43a4-b42b-340102013524@github.com> References: <072sgUJQa-oI9-uylhiPMzk2wLEr9e_8MZE1joM3fxs=.c0b4df04-57cb-43a4-b42b-340102013524@github.com> Message-ID: <3bziwZ7rfKLirGwnVKQrl-j6-ENu5tktVmcXwZSxmSM=.a7b1b655-7c34-4e4e-b8a7-01db60ead3ad@github.com> On Wed, 13 Aug 2025 09:46:05 GMT, Mikhail Ablakatov wrote: > I assume you might have misinterpreted **predicated** SVE bitwise shift for **unpredicated**. It's possible. The point is to make sure that any new instruction is in a section corresponding to its section in the Decoding tables. Please make your best guess as to where that should be, and we'll discuss it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2273264531 From fferrari at openjdk.org Wed Aug 13 12:24:16 2025 From: fferrari at openjdk.org (Francisco Ferrari Bihurriet) Date: Wed, 13 Aug 2025 12:24:16 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v2] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 07:05:51 GMT, Christian Hagedorn wrote: >> Francisco Ferrari Bihurriet has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply code review suggestions and add JBS to test > > Update looks good, thanks! I'll run some testing and report back again. > >> Could you already find some examples, where this change gives us an improved IR? If so, you could also add it as IR test. > > Just double-checking, were you able to find such a test which now improves the IR with the better type info and `CmpU` while we could not with the old code?
Otherwise, you could also file a follow-up RFE. @chhagedorn > > Could you already find some examples, where this change gives us an improved IR? If so, you could also add it as IR test. > > Just double-checking, were you able to find such a test which now improves the IR with the better type info and `CmpU` while we could not with the old code? Otherwise, you could also file a follow-up RFE. Sorry for not replying to that earlier, I'm working on it. We were explicitly matching the `BoolNode` tests, so let's explore the tests we were previously discarding. For **case 1a**, we were explicitly matching `BoolTest::le`, but now `CmpUNode` has `TypeInt::CC_LE` reflecting the fact that `m & x ≤u m` is always true, so:

| Test | Symbolic representation | Result | Improved IR |
|:------------------:|:-----------------------:|:--------:|:-------------------------:|
| `BoolTest::eq` | `m & x =u m` | unknown | no |
| `BoolTest::ne` | `m & x ≠u m` | unknown | no |
| **`BoolTest::le`** | **`m & x ≤u m`** | **true** | **no (old optimization)** |
| `BoolTest::ge` | `m & x ≥u m` | unknown | no |
| `BoolTest::lt` | `m & x <u m` | unknown | no |
| `BoolTest::gt` | `m & x >u m` | false | yes |

For **case 1b**, we were explicitly matching `BoolTest::lt`, but now `CmpUNode` has `TypeInt::CC_LT` reflecting the fact that `m & x <u m + 1` is always true, so:

| Test | Symbolic representation | Result | Improved IR |
|:------------------:|:-----------------------:|:--------:|:-------------------------:|
| `BoolTest::eq` | `m & x =u m + 1` | false | yes |
| `BoolTest::ne` | `m & x ≠u m + 1` | true | yes |
| `BoolTest::le` | `m & x ≤u m + 1` | true | yes |
| `BoolTest::ge` | `m & x ≥u m + 1` | false | yes |
| **`BoolTest::lt`** | **`m & x <u m + 1`** | **true** | **no (old optimization)** |
| `BoolTest::gt` | `m & x >u m + 1` | false | yes |

I will work on adding IR tests for these cases. Regarding real-world use cases, we need to rule out `BoolTest::lt`, as it didn't improve for _case 1a_ and was already optimized in old code for _case 1b_.
I've found some possible candidates but haven't fully analyzed them yet: * [Array construction of `new byte[FastAllocateSizeLimit & x]`](https://github.com/openjdk/jdk/blob/jdk-26+9/src/hotspot/share/opto/graphKit.cpp#L3820-L3821) with _case 1a_ * [Switch jump tables](https://github.com/openjdk/jdk/blob/jdk-26+9/src/hotspot/share/opto/parse2.cpp#L829-L830) with _case 1b_ * 8 more code searches ([A](https://github.com/openjdk/jdk/blob/jdk-26+9/src/hotspot/share/opto/graphKit.cpp#L3939-L3940), [B](https://github.com/openjdk/jdk/blob/jdk-26+9/src/hotspot/share/opto/library_call.cpp#L3449-L3450), [C](https://github.com/openjdk/jdk/blob/jdk-26+9/src/hotspot/share/opto/library_call.cpp#L3869-L3870), [D](https://github.com/openjdk/jdk/blob/jdk-26+9/src/hotspot/share/opto/parse2.cpp#L736-L737), [E](https://github.com/openjdk/jdk/blob/jdk-26+9/src/hotspot/share/opto/parse2.cpp#L829-L830), [F](https://github.com/openjdk/jdk/blob/jdk-26+9/src/hotspot/share/opto/subnode.cpp#L1650-L1651), [G](https://github.com/openjdk/jdk/blob/jdk-26+9/src/hotspot/share/opto/subnode.cpp#L1677-L1678), [H](https://github.com/openjdk/jdk/blob/jdk-26+9/src/hotspot/share/opto/subnode.cpp#L1680-L1681)) * 3 more possible indirect matches ([I](https://github.com/openjdk/jdk/blob/jdk-26+9/src/hotspot/share/opto/loopopts.cpp#L2944-L2947), [J](https://github.com/openjdk/jdk/blob/jdk-26+9/src/hotspot/share/opto/subnode.cpp#L1596-L1597), [K](https://github.com/openjdk/jdk/blob/jdk-26+9/src/hotspot/share/opto/subnode.cpp#L1601-L1602)) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26666#issuecomment-3183659818 From aph at openjdk.org Wed Aug 13 12:30:15 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 13 Aug 2025 12:30:15 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4] In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 09:50:21 GMT, Andrew Haley wrote: >> For `loadConH`, LLVM and GCC use >> >> mov wscratch, #const >> dup v0.4h, wscratch >> >> 
We should investigate that. >> >> As far as I can see, LLVM and GCC do this for all vector immediates that don't need more than 2 movz/movk instructions. > >> HI @theRealAph Thanks a lot for your comment. I feel it is a good idea to modify `loadConH` to move a constant instead of doing an `ldr` from the constant pool (it could probably get us some performance benefit as well). However, the scope of this ticket was to mainly fix the JTREG errors that >16B SVE machines were running into due to illegal immediates being passed to the `sve_dup` instruction. Would it be acceptable if I push this fix first and then create a follow up task to work on optimizing `loadConH`? I can create a new JBS ticket and assign it to myself and tag it here as well if that helps. Thank you! > > Well, yes, but I'm proposing a simpler and better fix to the problem. Sure, if you want to do this in two steps go ahead. > > > HI @theRealAph Thanks a lot for your comment. I feel it is a good idea to modify `loadConH` to move a constant instead of doing an `ldr` from the constant pool (it could probably get us some performance benefit as well). However, the scope of this ticket was to mainly fix the JTREG errors that >16B SVE machines were running into due to illegal immediates being passed to the `sve_dup` instruction. Would it be acceptable if I push this fix first and then create a follow up task to work on optimizing `loadConH`? I can create a new JBS ticket and assign it to myself and tag it here as well if that helps. Thank you! > > > > > > Well, yes, but I'm proposing a simpler and better fix to the problem. Sure, if you want to do this in two steps go ahead. > > Apologies, I thought I could change just the replicate backend nodes to be able to generate the `mov` to a scratch reg -> `dup` to replicate the value but missed the point that I can't still get rid of the `loadConH` node that loads the immediate from the constant pool. Why not? 
> If we want to change `loadConH` to instead generate a `mov` of an immediate to a scratch register, then we might have to change the `dst` from being a `vRegF` to an `iRegI`

I don't understand. Why not do something along these lines?

    // Replicate a 16-bit half precision float
    instruct replicateHF_imm8_gt128b(vReg dst, immHDupV con) %{
      predicate(Matcher::vector_length_in_bytes(n) > 16);
      match(Set dst (Replicate con));
      format %{ "replicateHF_imm8_gt128b $dst, $con\t# vector > 128 bits" %}
      ins_encode %{
        assert(UseSVE > 0, "must be sve");
        if (constant fits) {
          __ sve_dup($dst$$FloatRegister, __ H, (int)($con$$constant));
        } else {
          __ mov(rscratch1, (int)($con$$constant));
          __ sve_dup($dst$$FloatRegister, __ H, rscratch1);
        }
      %}
      ins_pipe(pipe_slow);
    %}
After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector can operate `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if remaining trip is less than `8 * 4 = 32`. To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And, both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop. The problem here is, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of main loop fails, the execution will jump over both the main loop and the vectorized drain loop. For example, in the above case, a loop still has `25` iterations after the pre-loop, we may run `3` rounds of the vectorized drain loop but it's impossible. It would be better if the minimum trip guard test of the main loop does not jump over the vectorized drain loop. This patch is to improve it by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow. The whole process is done by the function `insert_post_loop()`. We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to vectorized drain loop with mode `InsertVectorizedDrain`: 1. The fall-in control flow to the vectorized drain loop comes from a `RegionNode` merging exits from pre-loop and main-loop, implemented in `insert_post_loop()`. 2. All fall-in values to the vectorized drain loop come from (one or more) `Phi`s merging exit values from pre-loop and main-loop, implemented by `get_vectorized_drain_input()`. 3. 
All control uses of exits from old-loop now should use new `RegionNode`s that merge `RegionNode`s which merge exits from pre-loop and main-loop and exits from the new-loop (vectorized drain loop) equivalents, implemented by `fix_ctrl_uses_for_vectorized_drain()`. 4. All data uses of values from old-loop now should use new `Phi`s that merge two inputs: - `Phi`s, which in their turn merge values from pre-loop and main-loop, - and values from the new-loop (vectorized drain loop) equivalents. This is implemented by `handle_data_uses_for_vectorized_drain()`. We also add a new micro-benchmark to test the performance gain. Here are the performance results from different vector-length machines. **Average Time on 128-bit machine?:** ![128-addb](https://github.com/user-attachments/assets/05fdf7b4-e4cd-4372-a400-f1ac5b4be328) ![128-adds](https://github.com/user-attachments/assets/c5c65277-fb4d-4d54-b443-7e44901618ff) ![128-addi](https://github.com/user-attachments/assets/34c5c296-2f1d-42dc-a0b9-ffe3b14b3d29) ![128-addl](https://github.com/user-attachments/assets/0559d9c1-77d7-4120-9638-ee187a71e485) **Average Time on 256-bit machine?:** ![256-addb](https://github.com/user-attachments/assets/2f88dedd-9371-4a75-bf6a-a6901b8eb689) ![256-adds](https://github.com/user-attachments/assets/36ce000e-517f-4a1f-b3ba-5f6708709426) ![256-addi](https://github.com/user-attachments/assets/cfb9e7f9-ae9b-4797-97c2-8451733c938f) ![256-addl](https://github.com/user-attachments/assets/bba0f61a-b401-4ccd-bdb2-7b4daee1c197) **Average Time on 512-bit machine?:** ![512-addb](https://github.com/user-attachments/assets/aedd1f9f-dc53-4dc5-9ba4-b8bbcff1265c) ![512-adds](https://github.com/user-attachments/assets/ed4ad9f6-5349-4f69-ae98-b4f87ccb3a85) ![512-addi](https://github.com/user-attachments/assets/b418304c-641c-4b29-9c0f-0df5e53ddc11) ![512-addl](https://github.com/user-attachments/assets/8008cb7a-36eb-4913-b205-ccfe8444c58c) Tier 1- 3 passed on aarch64 and x86. 
------------- Commit messages: - Clean up comments for consistency and add spacing for readability - Fix some corner case failures and refined part of code - Merge branch 'master' into optimize-atomic-post - Refine ascii art, rename some variables and resolve conflicts - Merge branch 'master' into optimize-atomic-post - Add necessary ASCII art, refactor insert_post_loop() and rename - Merge branch 'master' into optimize-atomic-post - 8307084: C2: Vector atomic post loop is not executed for some small trip counts Changes: https://git.openjdk.org/jdk/pull/22629/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22629&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307084 Stats: 1542 lines in 8 files changed: 1358 ins; 59 del; 125 mod Patch: https://git.openjdk.org/jdk/pull/22629.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22629/head:pull/22629 PR: https://git.openjdk.org/jdk/pull/22629 From epeter at openjdk.org Wed Aug 13 12:38:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 13 Aug 2025 12:38:52 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Sat, 7 Dec 2024 09:16:29 GMT, Fei Gao wrote: > In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the > `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. 
For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has less than `8` iterations and then decide which way to go. > > Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is going to be further multiplied. After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector can operate `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if remaining trip is less than `8 * 4 = 32`. > > To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And, both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop. > > The problem here is, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of main loop fails, the execution will jump over both the main loop and the vectorized drain loop. For example, in the above case, a loop still has `25` iterations after the pre-loop, we may run `3` rounds of the vectorized drain loop but it's impossible. It would be better if the minimum trip guard test of the main loop does not jump over the vectorized drain loop. > > This patch is to improve it by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow. > > The whole process is done by the function `insert_post_loop()`. > > We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to vectorized drain loop with mode `InsertVectorizedDrain`: > > 1. 
The fall-in control flow to the vectorized drain loop comes from a `RegionNode` merging exits ... Hi @fg1417 ! Wow, the benchmark plots look amazing. I have some first questions. Mostly a request for some ascii art so that reading the code is easier. I'll have another look later! Thanks for the updates. I gave it a quick scan and proposed some changes. I can look at it again once you repond to these :) (we currently have lots of reviews, so I need to do a little round-robin here ? ) src/hotspot/share/opto/loopTransform.cpp line 1309: > 1307: // min-trip guard (main loop) > 1308: // / \ > 1309: // / IfTrue Personal preference (Nitpicky): I would like to see the IfTrue / IfFalse of the same if close together. You have the IfFalse go down to the RegionNode. src/hotspot/share/opto/loopTransform.cpp line 1322: > 1320: // RegionNode('merge_point') | / > 1321: // \ \ | / > 1322: // \ PhiNode('outn') I think the order of region inputs and Phi inputs is swapped (not consistent in the ASCII), right? src/hotspot/share/opto/loopTransform.cpp line 1343: > 1341: // min-trip guard (post loop) > 1342: // ... > 1343: A general point about naming: Is there a difference between `zero-trip` and `min-trip` guards? `back_ctrl, n, preheader_ctrl, zer_exit, cur_phi, outn, exit_point` don't tell me in themselves where they would be. I would prefer more explicit names, maybe they should say if they belong to `main` or `drain`? Suggestions (maybe incorrect, so feel free to improve): back_ctrl -> main_backedge_ctrl n -> main_incr merge_point -> main_merge_region outn -> main_merge_phi exit_point -> drain_merge_region new min-trip guard -> drain_zero_trip_guard preheader_ctrl -> drain_entry zer_exit -> drain_bypass cur_phi -> drain_phi Also: in the ASCII the ctlr loops go in one direction, and the data-flow in the other. I would do them both counter-clock-wise. It also seems to me that the pattern at the end looks basically symmetrical for the `main` and the `drain` loop, right? 
It would be nice if the naming and layout showed this (or highlights possible differences). src/hotspot/share/opto/loopTransform.cpp line 1628: > 1626: // has already informed us that more unrolling is about to happen to the main loop. > 1627: // The resultant post loop will serve as a vectorized drain loop. > 1628: void PhaseIdealLoop::insert_atomic_post_loop(IdealLoopTree *loop, Node_List &old_new) { A comment about naming: Which one do we think is the best? - vector post loop - atomic post loop - vectorized drain loop Honestly, I think the 3rd option "vectorized drain loop" would be the most descriptive, and I would propose that we try to only use that name. Maybe you have an even better idea. Suggestion: // Insert a copy of the atomic vectorized main loop as a post loop, policy_unroll // has already informed us that more unrolling is about to happen to the main loop. // The resultant post loop will serve as a vectorized drain loop. void PhaseIdealLoop::insert_atomic_post_loop(IdealLoopTree* loop, Node_List& old_new) { src/hotspot/share/opto/loopTransform.cpp line 1690: > 1688: > 1689: //------------------------------insert_atomic_post_loop_impl------------------------------- > 1690: // The main implementation of inserting atomic post loop after vector main loop. I would really appreciate some kind of ascii art here. It should show the pre, main, vectorized-post and post loop. And the relevant zero-trip guards, etc. src/hotspot/share/opto/loopTransform.cpp line 1716: > 1714: > 1715: // clone_loop() above changes the exit projection > 1716: main_exit = outer_main_end->proj_out(false); Looks like a lot of code duplication with `PhaseIdealLoop::insert_post_loop`. We maybe want to think how to refactor this. 
src/hotspot/share/opto/loopTransform.cpp line 1726: > 1724: Node* min_taken = main_head->skip_assertion_predicates_with_halt(); > 1725: IfNode* min_iff = min_taken->in(0)->as_If(); > 1726: assert(min_iff, "Minimun trip guard of main loop does exist."); Suggestion: assert(min_iff, "Minimum trip guard of main loop does exist."); src/hotspot/share/opto/loopTransform.cpp line 1735: > 1733: > 1734: //------------------------------insert_post_loop------------------------------- > 1735: // Insert a loop as the mode specified post the given loop passed. I don't understand this sentence - maybe I'm just tired ? src/hotspot/share/opto/loopTransform.cpp line 1736: > 1734: //------------------------------insert_post_loop------------------------------- > 1735: // Insert a loop as the mode specified post the given loop passed. > 1736: Suggestion: // We like to make the commenting continuous everywhere usually. src/hotspot/share/opto/loopTransform.cpp line 1802: > 1800: // / / > 1801: // / / > 1802: // after loop I love the symmetry ? src/hotspot/share/opto/loopopts.cpp line 2735: > 2733: Node* old = extra_data_nodes.at(i); > 2734: handle_data_uses_for_atomic_post_loop(old, old_new, loop, outer_loop, worklist, new_counter); > 2735: } Probably we will never add any new cases, but if so, it might be better to have the "default" case last. We might have a switch, or some if-elseif... src/hotspot/share/opto/loopopts.cpp line 2923: > 2921: IdealLoopTree* use_loop) { > 2922: // We need a Region to merge the exit from the cloned body(atomic post loop) > 2923: // and the merge point of exits from the vector main-loop and pre-loop. Again: ascii would be nice! src/hotspot/share/opto/predicates.hpp line 369: > 367: > 368: public: > 369: explicit CommonAssertionPredicate(IfTrueNode* success_proj) Can you explain this change? Also: you may have to change the description about the predicates at the top of this file. 
test/micro/org/openjdk/bench/vm/compiler/AtomicPostLoopPerf.java line 193: > 191: c[i] = a[i] + b[i]; > 192: } > 193: } I think these cases are covered by https://github.com/openjdk/jdk/pull/22070, don't you think? ------------- PR Review: https://git.openjdk.org/jdk/pull/22629#pullrequestreview-2491972453 PR Review: https://git.openjdk.org/jdk/pull/22629#pullrequestreview-2545911302 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1912781249 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1912777960 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1912793069 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1877900623 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1878181315 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1878177048 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1878186041 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1912795110 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1912796029 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1912797021 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1878201327 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1878205564 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1878209592 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1878214406 From bulasevich at openjdk.org Wed Aug 13 12:38:59 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 13 Aug 2025 12:38:59 GMT Subject: RFR: 8365071: ARM32: JFR intrinsic jvm_commit triggers C2 regalloc assert Message-ID: <6MHwDW0E9bOzpj5B3pzlNmOCRPtFtnrk55NmTTxbhLM=.f0026c26-2c80-4766-8984-da9f34a31c8d@github.com> On 32-bit ARM, the jvm_commit JFR intrinsic builder feeds null (RegP) into a TypeLong Phi, causing mixed long/pointer register sizing and 
triggering the C2 register allocator assert(_num_regs == reg || !_num_regs). The fix is trivial: use an appropriate ConL constant instead. This has no effect on 64-bit systems (the generated assembly is identical) but resolves a JFR issue on 32-bit systems. ------------- Commit messages: - 8365071: ARM32: JFR intrinsic jvm_commit triggers C2 regalloc assert Changes: https://git.openjdk.org/jdk/pull/26684/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26684&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365071 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26684.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26684/head:pull/26684 PR: https://git.openjdk.org/jdk/pull/26684 From fgao at openjdk.org Wed Aug 13 12:38:52 2025 From: fgao at openjdk.org (Fei Gao) Date: Wed, 13 Aug 2025 12:38:52 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 10 Dec 2024 14:36:23 GMT, Emanuel Peter wrote: >> In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the >> `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has less than `8` iterations and then decide which way to go. >> >> Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is going to be further multiplied. 
After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector can operate `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if remaining trip is less than `8 * 4 = 32`. >> >> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And, both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop. >> >> The problem here is, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of main loop fails, the execution will jump over both the main loop and the vectorized drain loop. For example, in the above case, a loop still has `25` iterations after the pre-loop, we may run `3` rounds of the vectorized drain loop but it's impossible. It would be better if the minimum trip guard test of the main loop does not jump over the vectorized drain loop. >> >> This patch is to improve it by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow. >> >> The whole process is done by the function `insert_post_loop()`. >> >> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to vectorized drain loop with mode `InsertVectorizedDrain`: >> >> 1. The fall-in control flow to the vectorized drain loop comes fr... > > Hi @fg1417 ! > > Wow, the benchmark plots look amazing. > > I have some first questions. Mostly a request for some ascii art so that reading the code is easier. I'll have another look later! Hi @eme64 , thanks for your review and comments! 
In the new commit, I added some ascii art to illustrate these new functions. Would you like to have a look? Thanks! ? > Thanks for the updates. I gave it a quick scan and proposed some changes. I can look at it again once you repond to these :) > (we currently have lots of reviews, so I need to do a little round-robin here ? ) Thanks for your review @eme64 ! I updated it with new commit to resolve these comments. I found a new test failure after rebasing to the lasted JDK (not sure if it's a duplicate with known fuzzer failures) and will fix in the next update. Thanks! > src/hotspot/share/opto/loopTransform.cpp line 1343: > >> 1341: // min-trip guard (post loop) >> 1342: // ... >> 1343: > > A general point about naming: > > Is there a difference between `zero-trip` and `min-trip` guards? > > `back_ctrl, n, preheader_ctrl, zer_exit, cur_phi, outn, exit_point` don't tell me in themselves where they would be. I would prefer more explicit names, maybe they should say if they belong to `main` or `drain`? > > Suggestions (maybe incorrect, so feel free to improve): > > back_ctrl -> main_backedge_ctrl > n -> main_incr > merge_point -> main_merge_region > outn -> main_merge_phi > exit_point -> drain_merge_region > new min-trip guard -> drain_zero_trip_guard > preheader_ctrl -> drain_entry > zer_exit -> drain_bypass > cur_phi -> drain_phi > > Also: in the ASCII the ctlr loops go in one direction, and the data-flow in the other. I would do them both counter-clock-wise. > > It also seems to me that the pattern at the end looks basically symmetrical for the `main` and the `drain` loop, right? It would be nice if the naming and layout showed this (or highlights possible differences). Thanks for your comments! I renamed all related variables with more meaningful ones except `exit_point`, which is shared by drain loop and post loop. But I marked it with `drain_merge_region` in the corresponding ASCII. 
> src/hotspot/share/opto/loopTransform.cpp line 1628: > >> 1626: // has already informed us that more unrolling is about to happen to the main loop. >> 1627: // The resultant post loop will serve as a vectorized drain loop. >> 1628: void PhaseIdealLoop::insert_atomic_post_loop(IdealLoopTree *loop, Node_List &old_new) { > > A comment about naming: > Which one do we think is the best? > - vector post loop > - atomic post loop > - vectorized drain loop > > Honestly, I think the 3rd option "vectorized drain loop" would be the most descriptive, and I would propose that we try to only use that name. Maybe you have an even better idea. > > Suggestion: > > // Insert a copy of the atomic vectorized main loop as a post loop, policy_unroll > // has already informed us that more unrolling is about to happen to the main loop. > // The resultant post loop will serve as a vectorized drain loop. > void PhaseIdealLoop::insert_atomic_post_loop(IdealLoopTree* loop, Node_List& old_new) { Yeah, `vectorized drain loop` does make sense to me. Honestly, my colleague asked me why it's called as "atomic post loop". It's not easy to explain the relationship with common `atomic`? > src/hotspot/share/opto/loopTransform.cpp line 1716: > >> 1714: >> 1715: // clone_loop() above changes the exit projection >> 1716: main_exit = outer_main_end->proj_out(false); > > Looks like a lot of code duplication with `PhaseIdealLoop::insert_post_loop`. We maybe want to think how to refactor this. Done. Thanks! > src/hotspot/share/opto/loopTransform.cpp line 1735: > >> 1733: >> 1734: //------------------------------insert_post_loop------------------------------- >> 1735: // Insert a loop as the mode specified post the given loop passed. > > I don't understand this sentence - maybe I'm just tired ? Updated. 
> src/hotspot/share/opto/loopopts.cpp line 2735: > >> 2733: Node* old = extra_data_nodes.at(i); >> 2734: handle_data_uses_for_atomic_post_loop(old, old_new, loop, outer_loop, worklist, new_counter); >> 2735: } > > Probably we will never add any new cases, but if so, it might be better to have the "default" case last. We might have a switch, or some if-elseif... Done. > test/micro/org/openjdk/bench/vm/compiler/AtomicPostLoopPerf.java line 193: > >> 191: c[i] = a[i] + b[i]; >> 192: } >> 193: } > > I think these cases are covered by https://github.com/openjdk/jdk/pull/22070, don't you think? I use large arrays to warm up these cases, see https://github.com/openjdk/jdk/blob/ca605406dd0119d162878194c942849a10f27c87/test/micro/org/openjdk/bench/vm/compiler/AtomicPostLoopPerf.java#L162-L167, as explained here https://bugs.openjdk.org/browse/JDK-8307084?focusedId=14729524&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14729524. Maybe that's the difference, what do you think? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-2585892169 PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-2690978060 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1975621483 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1878325446 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1912521464 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1975622140 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1912521774 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1878376973 From qamai at openjdk.org Wed Aug 13 12:38:52 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 13 Aug 2025 12:38:52 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Sat, 7 Dec 2024 09:16:29 GMT, Fei Gao wrote: > In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the > `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has less than `8` iterations and then decide which way to go. > > Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is going to be further multiplied. 
After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector can operate `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if remaining trip is less than `8 * 4 = 32`. > > To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And, both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop. > > The problem here is, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of main loop fails, the execution will jump over both the main loop and the vectorized drain loop. For example, in the above case, a loop still has `25` iterations after the pre-loop, we may run `3` rounds of the vectorized drain loop but it's impossible. It would be better if the minimum trip guard test of the main loop does not jump over the vectorized drain loop. > > This patch is to improve it by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow. > > The whole process is done by the function `insert_post_loop()`. > > We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to vectorized drain loop with mode `InsertVectorizedDrain`: > > 1. The fall-in control flow to the vectorized drain loop comes from a `RegionNode` merging exits ... 
Noob question: is it going to be easier if we create the loop structure like this instead:

```
if (trip_cnt >= drain_inc) {
    if (trip_cnt >= main_inc) {
        main_loop;
    }
    drain_loop;
}
scalar_loop;
```

I imagine it would be more straightforward because we go from this:

```
scalar_loop
```

into

```
if (trip_cnt >= vector_inc) {
    vector_loop;
}
scalar_loop;
```

And we will unroll the vector loop in the same manner. An additional benefit is that it makes loops with very few iterations more efficient, which in proportion would be more significant compared to reducing a branch from a huge main loop.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-2587089968

From fgao at openjdk.org  Wed Aug 13 12:38:52 2025
From: fgao at openjdk.org (Fei Gao)
Date: Wed, 13 Aug 2025 12:38:52 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts
In-Reply-To: 
References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com>
Message-ID: 

On Mon, 13 Jan 2025 13:20:59 GMT, Quan Anh Mai wrote:

> Noob question: is it going to be easier if we create the loop structure like this instead:
> 
> ```
> if (trip_cnt >= drain_inc) {
>     if (trip_cnt >= main_inc) {
>         main_loop;
>     }
>     drain_loop;
> }
> scalar_loop;
> ```
> 
> I imagine it would be more straightforward because we go from this:
> 
> ```
> scalar_loop
> ```
> 
> into
> 
> ```
> if (trip_cnt >= vector_inc) {
>     vector_loop;
> }
> scalar_loop;
> ```
> 
> And we will unroll the vector loop in the same manner. An additional benefit is that it makes loops with very few iterations more efficient, which in proportion would be more significant compared to reducing a branch from a huge main loop.

Hi @merykitty , thanks for your comments.
Let's add some lines to make your proposed structure more complete:

```
pre_loop;
if (trip_cnt >= drain_inc) {
  if (trip_cnt >= main_inc) {
    main_loop;
    if (trip_cnt < drain_inc) {
      branch to scalar_post_loop;
    }
  }
  drain_loop;
}
scalar_post_loop;
```

When we're considering how to implement this, we check that:
1. all fall-in values to `main_loop` only come from fall-out values of `pre_loop`;
2. all fall-in values to `drain_loop` may come from fall-out values of `pre_loop` or `main_loop`;
3. all fall-in values to `scalar_post_loop` may come from fall-out values of `pre_loop`, `main_loop` or `drain_loop`.

The loop structure proposed by this pull request is:

```
pre_loop
if (trip_cnt >= main_inc) {
  main_loop
}
if (trip_cnt >= drain_inc) {
  drain_loop
}
scalar_post_loop
```

Both structures have the same data flows as listed above, and their control flows are quite similar. I'm afraid all the data-flow and control-flow problems that this pull request fixes up would also need to be fixed in your proposed structure.

From the side of C2 loop structure transformation, we go from:

```
main_loop;
(after loop)
```

to

```
main_loop;
scalar_post_loop;
(after loop)
```

to

```
pre_loop;
if (trip_cnt >= main_inc) {
  main_loop;
}
scalar_post_loop;
(after loop)
```

When we're inserting a new loop, the code after the new loop always takes fall-in values from both the new loop and the old loop. For example, when we're inserting the `scalar_post_loop`:
1. all fall-in values of `scalar_post_loop` come from `main_loop` only;
2. fall-in values of the code `after loop` come from `main_loop` or `scalar_post_loop`.

Also for `pre_loop`:
1. all fall-in values of `main_loop` come from `pre_loop` only;
2. fall-in values of `scalar_post_loop` come from `main_loop` or `pre_loop`.

We have to insert in the order above, which is dictated by the reused function `clone_loop()`.
That's also why we get the existing structure:

```
pre_loop;
if (trip_cnt >= main_inc) {
  main_loop;
  if (trip_cnt >= drain_inc) {
    drain_loop;
  }
}
scalar_post_loop;
(after loop)
```

Because when we're inserting `drain_loop`, in the existing code we go from:

```
pre_loop exit   main_loop exit
         \        /
       scalar_post_loop
```

to

```
pre_loop exit   main_loop exit   drain_loop exit
      \               \          /
       \             merge_point
        \            /
       scalar_post_loop
```

and **all fall-in values of `drain_loop` come from `main_loop` only**. But now we need:

```
pre_loop exit   main_loop exit   drain_loop exit
        \        /              /
       merge_point             /
               \              /
            scalar_post_loop
```

and **all fall-in values of `drain_loop` come from `main_loop` or `pre_loop`**.

Well, if we want to reuse the existing logic of C2 loop structure transformation, to make things easier we would have to insert loops based on `main_loop` in this **impossible** order: `scalar_post_loop -> drain_loop -> pre_loop`. So I guess it wouldn't be easier to implement the loop structure you proposed. It may even be a little more complex, because it needs another zero-trip guard before `main_loop`. I agree it might make loops with very few iterations more efficient. We can consider that as a separate improvement.

All of the above is based on my limited understanding. What do you think? Thanks!

------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-2590067844

From epeter at openjdk.org Wed Aug 13 12:38:53 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 13 Aug 2025 12:38:53 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts
In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com>
Message-ID: 

On Tue, 10 Dec 2024 14:17:23 GMT, Emanuel Peter wrote:

>> In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops.
Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has less than `8` iterations and then decide which way to go. >> >> Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is going to be further multiplied. After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector can operate `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if remaining trip is less than `8 * 4 = 32`. >> >> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And, both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop. >> >> The problem here is, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of main loop fails, the execution will jump over both the main loop and the vectorized drain loop. For example, in the above case, a loop still has `25` iterations after the pre-loop, we may run `3` rounds of the vectorized drain loop but it's impossible. It would be better if the minimum trip guard test of the main loop does not jump over the vectorized drain loop. >> >> This patch is to improve it by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow. 
>> >> The whole process is done by the function `insert_post_loop()`. >> >> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to vectorized drain loop with mode `InsertVectorizedDrain`: >> >> 1. The fall-in control flow to the vectorized drain loop comes fr... > > src/hotspot/share/opto/loopTransform.cpp line 1690: > >> 1688: >> 1689: //------------------------------insert_atomic_post_loop_impl------------------------------- >> 1690: // The main implementation of inserting atomic post loop after vector main loop. > > I would really appreciate some kind of ascii art here. It should show the pre, main, vectorized-post and post loop. > And the relevant zero-trip guards, etc. And maybe you can also draw how the rewiring happens, i.e. where you cut/glue the graph back together. > src/hotspot/share/opto/predicates.hpp line 369: > >> 367: >> 368: public: >> 369: explicit CommonAssertionPredicate(IfTrueNode* success_proj) > > Can you explain this change? Also: you may have to change the description about the predicates at the top of this file. @chhagedorn might have some reservations about this, but I'll let him comment on his own. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1878185472 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1878266793 From epeter at openjdk.org Wed Aug 13 12:38:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 13 Aug 2025 12:38:53 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 10 Dec 2024 14:19:46 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopTransform.cpp line 1690: >> >>> 1688: >>> 1689: //------------------------------insert_atomic_post_loop_impl------------------------------- >>> 1690: // The main implementation of inserting atomic post loop after vector main loop. >> >> I would really appreciate some kind of ascii art here. It should show the pre, main, vectorized-post and post loop. >> And the relevant zero-trip guards, etc. > > And maybe you can also draw how the rewiring happens, i.e. where you cut/glue the graph back together. I think that will make reviewing much easier :) Like maybe trace the trip phi, and some other imaginary/memory data phi. The rewiring here looks tricky, and I'll have to spend some time looking at it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1878197648 From chagedorn at openjdk.org Wed Aug 13 12:38:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Aug 2025 12:38:53 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 10 Dec 2024 15:00:52 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/predicates.hpp line 369: >> >>> 367: >>> 368: public: >>> 369: explicit CommonAssertionPredicate(IfTrueNode* success_proj) >> >> Can you explain this change? Also: you may have to change the description about the predicates at the top of this file. > > @chhagedorn might have some reservations about this, but I'll let him comment on his own. The Assertion Predicate code is changing at the moment. I've been waiting with some patches until the fork. There is more coming now. How urgent is the work of this patch? Otherwise, it might be easier to wait for the changes within the next weeks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1878296999 From fgao at openjdk.org Wed Aug 13 12:38:53 2025 From: fgao at openjdk.org (Fei Gao) Date: Wed, 13 Aug 2025 12:38:53 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 10 Dec 2024 15:18:14 GMT, Christian Hagedorn wrote: >> @chhagedorn might have some reservations about this, but I'll let him comment on his own. > > The Assertion Predicate code is changing at the moment. I've been waiting with some patches until the fork. There is more coming now. How urgent is the work of this patch? Otherwise, it might be easier to wait for the changes within the next weeks. 
> Can you explain this change?

Hi @eme64 , the core change involving `Predicate` lies in the functions `void CreateAssertionPredicatesVisitor::visit(const TemplateAssertionPredicate& template_assertion_predicate)` and `void CreateAssertionPredicatesVisitor::visit(const InitializedAssertionPredicate& initialized_assertion_predicate)` in `src/hotspot/share/opto/predicates.cpp`, at lines `895` and `905`. We need `rewire_loop_data_dependencies()` when inserting the vector drain loop, both for `TemplateAssertionPredicate` and `InitializedAssertionPredicate`. All the code introducing a superclass here is just to make that change cleaner.

> How urgent is the work of this patch? Otherwise, it might be easier to wait for the changes within the next weeks.

Thanks @chhagedorn . The patch is still WIP. Looking forward to your refactoring.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r1878354192

From bulasevich at openjdk.org Wed Aug 13 12:41:12 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Wed, 13 Aug 2025 12:41:12 GMT
Subject: RFR: 8365071: ARM32: JFR intrinsic jvm_commit triggers C2 regalloc assert
In-Reply-To: <6MHwDW0E9bOzpj5B3pzlNmOCRPtFtnrk55NmTTxbhLM=.f0026c26-2c80-4766-8984-da9f34a31c8d@github.com>
References: <6MHwDW0E9bOzpj5B3pzlNmOCRPtFtnrk55NmTTxbhLM=.f0026c26-2c80-4766-8984-da9f34a31c8d@github.com>
Message-ID: 

On Fri, 8 Aug 2025 01:54:46 GMT, Boris Ulasevich wrote:

> On 32-bit ARM, the jvm_commit JFR intrinsic builder feeds null (RegP) into a TypeLong Phi, causing mixed long/pointer register sizing and triggering the C2 register allocator assert(_num_regs == reg || !_num_regs).
> 
> The fix is trivial: use an appropriate ConL constant instead. This has no effect on 64-bit systems (the generated assembly is identical) but resolves a JFR issue on 32-bit systems.
@mgronlun You might be interested in this

------------- PR Comment: https://git.openjdk.org/jdk/pull/26684#issuecomment-3183746220

From bkilambi at openjdk.org Wed Aug 13 12:55:15 2025
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Wed, 13 Aug 2025 12:55:15 GMT
Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4]
In-Reply-To: References: Message-ID: 

On Wed, 13 Aug 2025 12:27:14 GMT, Andrew Haley wrote:

> `Why not do something along these lines?`

I tried exactly that, and it does generate a `mov` and a `dup` for illegal immediates, which is why I initially said I would put up a patch soon. But I realised later that the `loadConH` node is also being generated somewhere above (most likely because the value it loads is required for the scalar `AddHF` nodes). This isn't ideal, as we wanted to get rid of the load from the constant pool in the first place, if I got you right?

> I don't understand.

Apologies for not being clear. Another approach I considered was to directly modify `loadConH` itself. `loadConH` is defined as:

```
instruct loadConH(vRegF dst, immH con) %{
  match(Set dst con);
  format %{ "ldrs $dst, [$constantaddress]\t# load from constant table: half float=$con\n\t" %}
  ins_encode %{
    __ ldrs(as_FloatRegister($dst$$reg), $constantaddress($con));
  %}
  ins_pipe(fp_load_constant_s);
%}
```

The destination register is an FPR. If we wanted to modify this to generate a move to a scratch register instead (something similar to loadConI), we would have to change the destination register to `iregI`. That could probably be acceptable for autovectorization, as we are replicating the value in a vector register anyway, but the scalar `AddHF` operation (the iterations that get peeled, or the ones in the pre/post loop which are not autovectorized) would expect the value to be available in an FPR instead (the `h` register variant). So we might have to introduce a move from the GPR to an FPR.
That's the reason why I felt I needed more time to investigate this. Please let me know your thoughts. Thanks!

------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3183788498

From fyang at openjdk.org Wed Aug 13 12:57:20 2025
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 13 Aug 2025 12:57:20 GMT
Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v8]
In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com>
Message-ID: 

On Wed, 13 Aug 2025 09:35:08 GMT, Saranya Natarajan wrote:

>> **Issue**
>> Extreme values for the BciProfileWidth flag, such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version`, result in the assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. This is observed on an x86 machine.
>>
>> **Analysis**
>> On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above-mentioned assert from failing for large values of BciProfileWidth.
>>
>> **Proposal**
>> Considering the fact that a larger BciProfileWidth results in slower profiling, I have proposed a range from 0 to 5000 to restrict the value of BciProfileWidth for x86 machines. This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in the ret profile.
>> >> **Issue in AArch64** >> Additionally running the command `java -XX:BciProfileWidth= 10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine.This is an issue of maximum offset for `ldr/str` in AArch64 which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on AArch64 machine. I had to modify 3 `ldr` and 1 `str` instruction (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line number 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for maximum of 5000 after which it crashes with`assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. Without this fix `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested to restrict the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. >> >> **Question to reviewers** >> Do you think this is a reasonable fix ? For AArch64 do you suggest fixing using `form_address` ? If yes, do I fix it under this PR or create another one ? >> >> **Request to port maintainers** >> @dafedafe suggested that we keep the upper boun... > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > additions for linux-riscv64 LGTM. Thanks for the update! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26139#pullrequestreview-3115891159 From aph at openjdk.org Wed Aug 13 13:04:15 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 13 Aug 2025 13:04:15 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 12:50:09 GMT, Bhavana Kilambi wrote: > The destination register is an FPR. 
> If we would want to modify this to generate a move to a scratch register instead (something similar to loadConI) then we would have to change the destination register to `iregI`

This is the part I don't understand. Why would you have to change the destination register to `iregI`? I wouldn't.

```
instruct loadConH(vRegF dst, immH con) %{
  match(Set dst con);
  format %{ "something" %}
  ins_encode %{
    __ movw(rscratch1, $con$$constant);
    __ fmovs($dst$$reg, rscratch1);
  %}
%}
```

------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3183831528

From adinn at openjdk.org Wed Aug 13 13:06:12 2025
From: adinn at openjdk.org (Andrew Dinn)
Date: Wed, 13 Aug 2025 13:06:12 GMT
Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3]
In-Reply-To: References: Message-ID: 

On Wed, 13 Aug 2025 11:13:48 GMT, Aleksey Shipilev wrote:

>> Ah, Zero is still broken, let me fix it...
>
> See new commit. Zero build now passes, I am running bootcycle-images now.

I don't believe this fix is correct and I'm not clear that it is even needed. Is something actually breaking with the zero code before the latest commit?

The CPU-specific shared runtime code is called to fill in entry addresses for an adapter handler. That does not imply that any such handler is backed by an AdapterBlob. In particular on Zero there cannot be a corresponding AdapterBlob because we have no runtime compiler capable of generating one.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2273402345

From shade at openjdk.org Wed Aug 13 13:15:13 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 13 Aug 2025 13:15:13 GMT
Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3]
In-Reply-To: References: Message-ID: 

On Wed, 13 Aug 2025 13:03:22 GMT, Andrew Dinn wrote:

>> See new commit. Zero build now passes, I am running bootcycle-images now.
>
> I don't believe this fix is correct and I'm not clear that it is even needed.
> Is something actually breaking with the zero code before the latest commit?
>
> The CPU-specific shared runtime code is called to fill in entry addresses for an AdapterHandlerEntry. That does not imply that any such handler is backed by an AdapterBlob. In particular on Zero there cannot be a corresponding AdapterBlob because we have no runtime compiler capable of generating one. There is one other case where an AdapterHandlerEntry has no corresponding AdapterBlob -- the abstract method handler's AdapterHandlerEntry is assembled using several disparate methods that do not belong to a single generated blob.
>
> On CPUs where a blob is created, the offsets of its secondary entries must be stored in the blob in order to allow the blob to be correctly saved to and restored from the AOT cache. At restore time we populate the entry array using the offsets and then update a newly created AdapterHandlerEntry using the blob start address and this array. That never happens on Zero. We never have to translate entries in a Zero AdapterHandlerEntry to offsets and we never have to translate stored offsets to entry addresses.
>
> So, I don't think there is any need to change the assignment of entry addresses in the Zero implementation of SharedRuntime::generate_i2c2i_adapters(). Unless there is something that has actually broken. Indeed, I would argue that the latest patch is not just needless but damaging, as it actually removes a sanity check. The original code sets the first 3 entries to a dummy stub that catches an invalid use of the AdapterHandlerEntry. If we only set the first entry then we have less protection against invalid calls.

Yes. In Zero entries point to the same (fake, error-throwing) stub, mostly for diagnostics. Which _also_ means their offsets are all zero, which trips the assert.

I think Zero can just skip setting the entries, so that we ride on the current code that treats unset entries well.
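The offset encoding being debated can be modeled in a few lines (plain Java, not HotSpot code; the check shape, addresses and sizes are illustrative assumptions): secondary entries are stored as offsets from the i2c entry, so when every entry aliases the same stub and the buffer holds no generated code — the Zero situation — a sanity check of the form `0 < offset < code_size` fails on both sides:

```java
public class EntryOffsetModel {
    // Hypothetical check in the spirit of the asserts discussed above:
    // every non-i2c entry must be a strictly positive offset from the
    // i2c entry, and must land inside the generated code.
    static boolean offsetsLookSane(long i2c, long c2i, long c2iUnverified,
                                   long c2iNoClinit, long codeSize) {
        long[] offsets = { c2i - i2c, c2iUnverified - i2c, c2iNoClinit - i2c };
        for (long off : offsets) {
            if (off <= 0 || off >= codeSize) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Normal CPU: distinct entries inside a blob of generated code.
        System.out.println(offsetsLookSane(0x1000, 0x1040, 0x1080, 0x10c0, 0x200));
        // Zero-like case: all entries alias one external stub, zero code size.
        System.out.println(offsetsLookSane(0x1000, 0x1000, 0x1000, 0x1000, 0));
    }
}
```

This is only a model of why all-zero offsets cannot pass such a check, not a description of the actual HotSpot assert text.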
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2273429175 From shade at openjdk.org Wed Aug 13 13:19:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 13:19:14 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 13:12:25 GMT, Aleksey Shipilev wrote: >> I don't believe this fix is correct and I'm not clear that it is even needed. Is something actually breaking with the zero code before the latest commit? >> >> The CPU-specific shared runtime code is called to fill in entry addresses for an AdapterHandlerEntry. That does not imply that any such handler is backed by an AdapterBlob. In particular on Zero there cannot be a corresponding AdapterBlob because we have no runtime compiler capable of generating one. But this situation is not just specific to Zero. On all other architectures there is one other case where an AdapterHandlerEntry has no corresponding AdapterBlob -- the abstract method handler's AdapterHandlerEntry is assembled usng several disparate methods that do not belong to a single generated blob. >> >> On CPUs where a blob is created the offsets of its secondary entries must be stored in the blob in order to allow the blob to be correctly saved to and restored from the AOT cache. The offsets are saved when the blob is created and saved when the blob is written to the AOT cache. The associated entries are written into the corresponding AdapterHandlerEntry. At restore time we populate the entry array using the offsets foudn in the restored blob and then update a newly created AdapterHandlerEntry using the blob start address and this array. That never happens on Zero. We never have to translate entries in a Zero AdapterHandlerEntry to offsets and we never have to translate stored offsets to entry addresses. 
>> >> So, I don't think there is any need to change the assignment of entry addresses in the Zero implementation of SharedRuntime::generate_i2c2i_adapters(). Unless there is something that has actually broken. Indeed, I would argue that the latest patch is not just needless but damaging as it actually removes a sanity check. The original code sets the first 3 entries to a dummy stub that catches an invalid use of the AdapterHandlerEntry. If we ony set the first entry then we have less protection against invalid calls. > >> Is something actually breaking with the zero code before the latest commit? > > Yes. In Zero entries point to the same (fake, error-throwing) stub, mostly for diagnostics. Which _also_ means their offsets are all zero, which trips the assert. This readily fails even the simple `make images` on Zero. > > I think Zero can just skip setting the entries, so that we ride on the current code that treats unset entries well. The alternative is to modify the asserts to accept zero offsets for non-i2c entries, if you think that is better? It caters for Zero, but relaxes the condition for non-Zero code. So it is like choosing where you want to open the assert gap: for Zero, or for everything else? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2273443434 From shade at openjdk.org Wed Aug 13 13:31:18 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 13:31:18 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 13:17:02 GMT, Aleksey Shipilev wrote: >>> Is something actually breaking with the zero code before the latest commit? >> >> Yes. In Zero entries point to the same (fake, error-throwing) stub, mostly for diagnostics. Which _also_ means their offsets are all zero, which trips the assert. This readily fails even the simple `make images` on Zero. 
>> >> I think Zero can just skip setting the entries, so that we ride on the current code that treats unset entries well. > > The alternative is to modify the asserts to accept zero offsets for non-i2c entries, if you think that is better? It caters for Zero, but relaxes the condition for non-Zero code. So it is like choosing where you want to open the assert gap: for Zero, or for everything else? Oh no, it is even worse: some (all?) Zero adapters have no instructions at all, so not only we fail one side of this assert, but also the other side, since `cb->insts()->size()` is `0`. This is what is awkward about Zero code: the VM expects entry points to be _in the stub_, which implies _something_ was generated in the stub. But Zero skips any code generation, and just YOLO-es the external address as "entry point". So offset computations stop making any sense at all! In other words, Zero tries to do some diagnostics, but it does so in a way that is not compatible with the rest of VM, at least on assert side. So I think it is fine for Zero to stop pretending there are valid entry points for c2i entries at least. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2273462250 From shade at openjdk.org Wed Aug 13 13:43:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 13:43:14 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 13:23:19 GMT, Aleksey Shipilev wrote: >> The alternative is to modify the asserts to accept zero offsets for non-i2c entries, if you think that is better? It caters for Zero, but relaxes the condition for non-Zero code. So it is like choosing where you want to open the assert gap: for Zero, or for everything else? > > Oh no, it is even worse: some (all?) 
Zero adapters have no instructions at all, so not only do we fail one side of this assert, but also the other side, since `cb->insts()->size()` is `0`. This is what is awkward about Zero code: the VM expects entry points to be _in the stub_, which implies _something_ was generated in the stub. But Zero skips any code generation, and just YOLO-es the external address as "entry point". So offset computations stop making any sense at all! In other words, Zero tries to do some diagnostics, but it does so in a way that is not compatible with the rest of the VM, at least on the assert side.

So I think it is fine for Zero to stop pretending there are valid entry points for c2i entries at least.

If you are curious, this is where Zero catches fire during build:

```
report_vm_error(...)
AdapterBlob::AdapterBlob()
AdapterBlob::create()
AdapterHandlerLibrary::generate_adapter_code()
AdapterHandlerLibrary::create_adapter()
AdapterHandlerLibrary::initialize()
SharedRuntime::init_adapter_library()
init_globals()
```

So we enter that path routinely, without any AOT features enabled...

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2273507389

From adinn at openjdk.org Wed Aug 13 13:49:12 2025
From: adinn at openjdk.org (Andrew Dinn)
Date: Wed, 13 Aug 2025 13:49:12 GMT
Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3]
In-Reply-To: References: Message-ID: 

On Wed, 13 Aug 2025 13:40:16 GMT, Aleksey Shipilev wrote:

>> Oh no, it is even worse: some (all?)
>> Zero adapters have no instructions at all, so offset computations stop making any sense at all! In other words, Zero tries to do some diagnostics, but it does so in a way that is not compatible with the rest of the VM, at least on the assert side.
>>
>> So I think it is fine for Zero to stop pretending there are valid entry points for c2i entries at least.
>
> If you are curious, this is where Zero catches fire during build:
>
> ```
> report_vm_error(...)
> AdapterBlob::AdapterBlob()
> AdapterBlob::create()
> AdapterHandlerLibrary::generate_adapter_code()
> AdapterHandlerLibrary::create_adapter()
> AdapterHandlerLibrary::initialize()
> SharedRuntime::init_adapter_library()
> init_globals()
> ```
>
> So we enter that path routinely, without any AOT features enabled...

Ah, yikes, so we are actually creating an AdapterBlob here? Oh dear, it seems we go on to do so at line 2849 in sharedRuntime.cpp.

```
int entry_offset[AdapterBlob::ENTRY_COUNT];
assert(AdapterBlob::ENTRY_COUNT == 4, "sanity");
address i2c_entry = handler->get_i2c_entry();
entry_offset[0] = 0; // i2c_entry offset
entry_offset[1] = handler->get_c2i_entry() - i2c_entry;
entry_offset[2] = handler->get_c2i_unverified_entry() - i2c_entry;
entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry;
adapter_blob = AdapterBlob::create(&buffer, entry_offset);  // <== oops!
```

Ok, so we need to avoid doing that! A fortiori on Zero, because the result is broken even modulo the problem this patch is trying to fix. A few lines later we do this:

```
handler->relocate(adapter_blob->content_begin());
```

What that does is compute a delta from the handler's first entry to the blob's start address and then apply that delta to all four addresses. So, that's not going to work. We need to fix this in two places.

1. We should bypass the blob create if we get back a buffer with length 0, i.e. if no code was generated -- that will fix Zero.
2. We should tweak AdapterHandlerEntry::relocate() so that it only applies the delta when the corresponding entry address != nullptr -- that will fix arm32, i.e.
will ensure that any attempt to use the invalid entry will be using a 0 address.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2273519245

From dfenacci at openjdk.org Wed Aug 13 14:17:23 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Wed, 13 Aug 2025 14:17:23 GMT
Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove
Message-ID: 

# Issue

While compiling `java.util.zip.ZipFile` in C2 this assert is triggered:
https://github.com/openjdk/jdk/blob/a2e86ff3c56209a14c6e9730781eecd12c81d170/src/hotspot/share/opto/memnode.cpp#L4235

# Cause

While compiling the constructor of java.util.zip.ZipFile$CleanableResource the following happens:
* we insert a trailing `MemBarStoreStore` in the constructor
  before_folding
* during IGVN we completely fold the memory subtree of the `MemBarStoreStore` node. The node still has a control output attached.
  after_folding
* later during the same IGVN run the `MemBarStoreStore` node is handled and we try to remove it (because the `Allocate` node of the `MemBar` is not escaping the thread)
  https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4301-L4302
* the assert
  https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4235
  triggers because the barrier has only 1 (control) output and is a `MemBarStoreStore` (not `Initialize`) barrier

The issue happens only when `UseStoreStoreForCtor` is set (the default as well), which makes C2 use `MemBarStoreStore` instead of `MemBarRelease` at the end of constructors. `MemBarStoreStore` nodes are processed separately by EA, and this happens after the IGVN pass that folds the memory subtree. `MemBarRelease` nodes, on the other hand, are handled during the same IGVN pass, before the memory subtree gets removed, when the barrier has still got 2 outputs (assert skipped).
# Fix Adapting the assert to accept that `MemBarStoreStore` can also have `!= 2` outputs (when `+UseStoreStoreForCtor` is used) seems to be an OK solution, as this is a perfectly plausible situation. # Testing Unfortunately reproducing the issue with a simple regression test has proven very hard. The test seems to rely on a very peculiar profiling and IGVN worklist sequence. JBS replay compilation passes. Running JCK's `api/java_util` 100 times triggers the assert a couple of times on average before the fix, none after. Tier 1-3+ tests passed. ------------- Commit messages: - JDK-8360031: update assert message - Merge branch 'master' into JDK-8360031 - JDK-8360031: remove unnecessary include - JDK-8360031: remove UseNewCode - JDK-8360031: compilation asserts in MemBarNode::remove Changes: https://git.openjdk.org/jdk/pull/26556/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26556&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360031 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26556.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26556/head:pull/26556 PR: https://git.openjdk.org/jdk/pull/26556 From dfenacci at openjdk.org Wed Aug 13 14:17:23 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 13 Aug 2025 14:17:23 GMT Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 15:08:29 GMT, Damon Fenacci wrote: > # Issue > While compiling `java.util.zip.ZipFile` in C2 this assert is triggered > https://github.com/openjdk/jdk/blob/a2e86ff3c56209a14c6e9730781eecd12c81d170/src/hotspot/share/opto/memnode.cpp#L4235 > > # Cause > While compiling the constructor of java.util.zip.ZipFile$CleanableResource the following happens: > * we insert a trailing `MemBarStoreStore` in the constructor > before_folding > > * during IGVN we completely fold the memory subtree of the `MemBarStoreStore` node.
The node still has a control output attached. > after_folding > > * later during the same IGVN run the `MemBarStoreStore` node is handled and we try to remove it (because the `Allocate` node of the `MemBar` is not escaping the thread) https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4301-L4302 > * the assert https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4235 > triggers because the barrier has only 1 (control) output and is a `MemBarStoreStore` (not `Initialize`) barrier > > The issue happens only when `UseStoreStoreForCtor` is set (the default), which makes C2 use `MemBarStoreStore` instead of `MemBarRelease` at the end of constructors. `MemBarStoreStore` barriers are processed separately by EA, and this happens after the IGVN pass that folds the memory subtree. `MemBarRelease` barriers, on the other hand, are handled during the same IGVN pass, before the memory subtree gets removed and while the barrier still has 2 outputs (so the assert is skipped). > > # Fix > Adapting the assert to accept that `MemBarStoreStore` can also have `!= 2` outputs (when `+UseStoreStoreForCtor` is used) seems to be an OK solution, as this is a perfectly plausible situation. > > # Testing > Unfortunately reproducing the issue with a simple regression test has proven very hard. The test seems to rely on a very peculiar profiling and IGVN worklist sequence. JBS replay compilation passes. Running JCK's `api/java_util` 100 times triggers the assert a couple of times on average before the fix, none after. > Tier 1-3+ tests passed. @shipilev you might want to have a look. Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/26556#issuecomment-3184108691 From adinn at openjdk.org Wed Aug 13 14:20:12 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 13 Aug 2025 14:20:12 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 13:43:52 GMT, Andrew Dinn wrote: >> If you are curious, this is where Zero catches fire during build: >> >> >> report_vm_error(...) >> AdapterBlob::AdapterBlob() >> AdapterBlob::create() >> AdapterHandlerLibrary::generate_adapter_code() >> AdapterHandlerLibrary::create_adapter() >> AdapterHandlerLibrary::initialize() >> SharedRuntime::init_adapter_library() >> init_globals() >> >> >> So we enter that path routinely, without any AOT features enabled... > > Ah, yikes, so we are actually creating an AdapterBlob here? > > Oh dear, it seems we go on to do so at line 2849 in sharedRuntime.cpp. > > int entry_offset[AdapterBlob::ENTRY_COUNT]; > assert(AdapterBlob::ENTRY_COUNT == 4, "sanity"); > address i2c_entry = handler->get_i2c_entry(); > entry_offset[0] = 0; // i2c_entry offset > entry_offset[1] = handler->get_c2i_entry() - i2c_entry; > entry_offset[2] = handler->get_c2i_unverified_entry() - i2c_entry; > entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > > adapter_blob = AdapterBlob::create(&buffer, entry_offset); <== oops! > > Ok, so we need to avoid doing that! > > A fortiori, on Zero, because the result is broken even modulo the problem this patch is trying to fix. A few lines later we do this: > > handler->relocate(adapter_blob->content_begin()); > > What that does is compute a delta from the handler's first entry to the blob's start address and then apply that delta to all four addresses. So, that's not going to work. > > We need to fix this in two places. > > 1. we should bypass the blob create if we get back a buffer with length 0 i.e.
if no code was generated -- that will fix Zero > 2. We should tweak AdapterHandlerEntry::relocate() so that it only applies the delta when the corresponding entry address != nullptr -- that will fix arm32 i.e. will ensure that any attempt to use the invalid entry will be using a 0 address. Correction - we just need to avoid creating the blob and checking the offsets when SharedRuntime::generate_i2c2i_adapters returns an empty code buffer. The nullptr checks are already in place in relocate. Aleksey, are you ok to do that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2273621359 From bkilambi at openjdk.org Wed Aug 13 14:32:13 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 13 Aug 2025 14:32:13 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 13:01:57 GMT, Andrew Haley wrote: > > The destination register is an FPR. If we would want to modify this to generate a move to a scratch register instead (something similar to loadConI) then we would have to change the destination register to `iregI` > > This is the part I don't understand. Why would you have to change the destination register to `iregI`? I wouldn't. > > ``` > instruct loadConH(vRegF dst, immH con) %{ > match(Set dst con); > format %{ > "something" > %} > ins_encode %{ > __ movw(rscratch1, $con$$constant); > __ fmovs($dst$$reg, rscratch1); > %} > ``` Thanks. Yes I could do that (as I mentioned earlier in my comment), but I was trying to avoid the extra move (the `fmov`). Just that I wasn't sure if this version would be faster than an `ldr`. But it just occurred to me that I could compare the latencies. `ldr` on V1 has a latency of 4 cyc and `mov` + `fmov` is 1 + 2 = 3 cyc. So it probably makes sense to go with two moves. Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3184163893 From mhaessig at openjdk.org Wed Aug 13 14:45:02 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 13 Aug 2025 14:45:02 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags Message-ID: This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality. Testing: - [ ] Github Actions - [ ] tier1,tier2 plus some internal testing on Oracle supported platforms ------------- Commit messages: - Convert test to cross product scenarios - Add TestFramework::addCrossProductScenarios Changes: https://git.openjdk.org/jdk/pull/26762/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365262 Stats: 146 lines in 4 files changed: 134 ins; 8 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26762.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26762/head:pull/26762 PR: https://git.openjdk.org/jdk/pull/26762 From shade at openjdk.org Wed Aug 13 14:53:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 14:53:10 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 14:17:52 GMT, Andrew Dinn wrote: >> Ah, yikes, so we are actually creating an AdapterBlob here? >> >> Oh dear, it seems we go on to do so at line 2849 in sharedRuntime.cpp.
>> >> int entry_offset[AdapterBlob::ENTRY_COUNT]; >> assert(AdapterBlob::ENTRY_COUNT == 4, "sanity"); >> address i2c_entry = handler->get_i2c_entry(); >> entry_offset[0] = 0; // i2c_entry offset >> entry_offset[1] = handler->get_c2i_entry() - i2c_entry; >> entry_offset[2] = handler->get_c2i_unverified_entry() - i2c_entry; >> entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; >> >> adapter_blob = AdapterBlob::create(&buffer, entry_offset); <== oops! >> >> Ok, so we need to avoid doing that! >> >> A fortiori, on Zero, because the result is broken even modulo the problem this patch is trying to fix. A few lines later we do this: >> >> handler->relocate(adapter_blob->content_begin()); >> >> What that does is compute a delta from the handler's first entry to the blob's start address and then apply that delta to all four addresses. So, that's not going to work. >> >> We need to fix this in two places. >> >> 1. we should bypass the blob create if we get back a buffer with length 0 i.e. if no code was generated -- that will fix Zero >> 2. We should tweak AdapterHandlerEntry::relocate() so that it only applies the delta when the corresponding entry address != nullptr -- that will fix arm32 i.e. will ensure that any attempt to use the invalid entry will be using a 0 address. > > Correction - we just need to avoid creating the blob and checking the offsets when SharedRuntime::generate_i2c2i_adapters returns an empty code buffer. The nullptr checks are already in place in relocate. > > Aleksey, are you ok to do that? So you want to `return false` or `return true` from `AdapterHandlerLibrary::generate_adapter_code` when, say, `buffer.insts()->size() == 0` right after `SharedRuntime::generate_i2c2i_adapters`?
This does not seem to work, since `SharedRuntime` really wants to see some initial adapters generated, even in Zero case: https://github.com/openjdk/jdk/blob/001aaa1e49f2692061cad44d68c9e81a27ea3b98/src/hotspot/share/runtime/sharedRuntime.cpp#L2605-L2609 I honestly do not want to break some _other_ assumption that the VM has by not generating some of the adapters for Zero, even if fake ones. I am running out of time to spend on this, maybe you can play around with Zero yourself? `--with-jvm-variants=zero` would give you a build on x86_64 or AArch64 easily. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2273715973 From epeter at openjdk.org Wed Aug 13 15:06:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 13 Aug 2025 15:06:21 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6] In-Reply-To: <0rNuFLFwXcWfF0-nQQEd9fbIrziHos8PZJ93sDPFObo=.0587492e-267b-4681-8fb8-605cdc20f1c3@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <0rNuFLFwXcWfF0-nQQEd9fbIrziHos8PZJ93sDPFObo=.0587492e-267b-4681-8fb8-605cdc20f1c3@github.com> Message-ID: <72xM8drprc1sgKUY0NqxLtbRvxBQ0TdF_ByDaPWrGWw=.9db9c520-826a-4c64-b918-87a41f805c57@github.com> On Tue, 12 Aug 2025 16:19:10 GMT, Manuel Hässig wrote: >> I don't think that `make_last` makes any assumptions about `iv_scale1 < iv_scale2`. >> But I could consider moving it earlier anyway. Do you think that is worth it? > > I would do it because the proof states that if `iv_scale2 < iv_scale1` we swap them. It would keep it consistent. Also, you won't have to swap the spans. @mhaessig Does it look ok to you now?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2273757094 From adinn at openjdk.org Wed Aug 13 15:44:11 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 13 Aug 2025 15:44:11 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 14:50:04 GMT, Aleksey Shipilev wrote: >> Correction - we just need to avoid creating the blob and checking the offsets when SharedRuntime::generate_i2c2i_adapters returns an empty code buffer. The nullptr checks are already in place in relocate. >> >> Aleksey, are you ok to do that? > > So you want to `return false` or `return true` from `AdapterHandlerLibrary::generate_adapter_code` when, say, `buffer.insts()->size() == 0` right after `SharedRuntime::generate_i2c2i_adapters`? This does not seem to work, since `SharedRuntime` really wants to see some initial adapters generated, even in Zero case: https://github.com/openjdk/jdk/blob/001aaa1e49f2692061cad44d68c9e81a27ea3b98/src/hotspot/share/runtime/sharedRuntime.cpp#L2605-L2609 > > I honestly do not want to break some _other_ assumption that VM has by not generating some of the adapters for Zero, even if fake ones. > > I am running out of time to spend on this, maybe you can play around with Zero yourself? `--with-jvm-variants=zero` would give you a build on x86_64 or AArch64 easily. Yeah, sure I'll take this over. It's perverse that we are creating blobs on Zero that we don't need just to satisfy some asserts. I don't believe there can be any real consequences of not generating them because they contain nothing. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2273864587 From mhaessig at openjdk.org Wed Aug 13 15:52:21 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 13 Aug 2025 15:52:21 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v8] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <_xzX54JluvxKjADy6VAq8oY3lkRNsV3bYY35A4cJQpo=.3b345a86-2dea-48c2-99bd-7b63fc79af8e@github.com> Message-ID: On Tue, 12 Aug 2025 17:02:11 GMT, Emanuel Peter wrote: >> And I think there is already some filtering in `canonicalize_raw_summands`: >> >> // Keep summands with non-zero scale. >> if (!scaleI.is_zero() && !scaleL.is_NaN()) { >> _raw_summands.at_put(pos_put++, MemPointerRawSummand(variable, scaleI, scaleL, int_group)); >> } > > Ah, but the real work gets done here, in `MemPointer::make`: > > if (raw_summands.length() <= RAW_SUMMANDS_SIZE && > summands.length() <= SUMMANDS_SIZE && > has_no_NaN_in_con_and_summands(con, summands)) { > return MemPointer(pointer, raw_summands, summands, con, size NOT_PRODUCT(COMMA trace)); > } else { > return MemPointer::make_trivial(pointer, size NOT_PRODUCT(COMMA trace)); > } Makes sense, thank you for the explanation :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2273882298 From mhaessig at openjdk.org Wed Aug 13 15:52:22 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 13 Aug 2025 15:52:22 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6] In-Reply-To: <72xM8drprc1sgKUY0NqxLtbRvxBQ0TdF_ByDaPWrGWw=.9db9c520-826a-4c64-b918-87a41f805c57@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <0rNuFLFwXcWfF0-nQQEd9fbIrziHos8PZJ93sDPFObo=.0587492e-267b-4681-8fb8-605cdc20f1c3@github.com> 
<72xM8drprc1sgKUY0NqxLtbRvxBQ0TdF_ByDaPWrGWw=.9db9c520-826a-4c64-b918-87a41f805c57@github.com> Message-ID: On Wed, 13 Aug 2025 15:04:00 GMT, Emanuel Peter wrote: >> I would do it because the proof states that if `iv_scale2 < iv_scale1` we swap them. It would keep it consistent. Also, you won't have to swap the spans. > > @mhaessig Does it look ok to you now? I think you forgot to push :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2273876847 From shade at openjdk.org Wed Aug 13 16:01:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 16:01:14 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 15:42:00 GMT, Andrew Dinn wrote: >> So you want to `return false` or `return true` from `AdapterHandlerLibrary::generate_adapter_code` when, say, `buffer.insts()->size() == 0` right after `SharedRuntime::generate_i2c2i_adapters`? This does not seem to work, since `SharedRuntime` really wants to see some initial adapters generated, even in Zero case: https://github.com/openjdk/jdk/blob/001aaa1e49f2692061cad44d68c9e81a27ea3b98/src/hotspot/share/runtime/sharedRuntime.cpp#L2605-L2609 >> >> I honestly do not want to break some _other_ assumption that VM has by not generating some of the adapters for Zero, even if fake ones. >> >> I am running out of time to spend on this, maybe you can play around with Zero yourself? `--with-jvm-variants=zero` would give you a build on x86_64 or AArch64 easily. > > Yeah, sure I'll take this over. It's perverse that we are creating blobs on Zero that we don't need just to satisfy some asserts. I don't believe there can be any real consequences of not generating them because they contain nothing. Yeah, I agree. Give me the patch, I'll mix it into this PR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2273898623 From shade at openjdk.org Wed Aug 13 16:01:15 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 16:01:15 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3] In-Reply-To: References: Message-ID: <30P8RvSXOZfiuX_cjGBlj6Wv4kJWUt83Gi6dtisu-vs=.0d2470c2-80d7-43ea-8b13-b051112ad1d7@github.com> On Wed, 13 Aug 2025 15:55:32 GMT, Aleksey Shipilev wrote: >> Yeah, sure I'll take this over. It's perverse that we are creating blobs on Zero that we don't need just to satisfy some asserts. I don't believe there can be any real consequences of not generating them because they contain nothing. > > Yeah, I agree. Give me the patch, I'll mix it into this PR. Or maybe we do this: let's push this one, and then deal with Zero and adapters specifically? The current patch unbreaks ARM32 and Zero builds, and we don't need to wait for more advanced patch to appear, IMO. My Linux x86_64 zero fastdebug `make bootcycle-images` has just completed cleanly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2273901345 From iveresov at openjdk.org Wed Aug 13 16:02:32 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 13 Aug 2025 16:02:32 GMT Subject: RFR: 8362530: VM crash with -XX:+PrintTieredEvents when collecting AOT profiling [v2] In-Reply-To: References: Message-ID: > When printing tiered events we take the ttyLock and also now the trainingDataLock. While benign it's best to decouple these. The solution is to gather the output bits in a buffer and then print it. 
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/compiler/compilationPolicy.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26750/files - new: https://git.openjdk.org/jdk/pull/26750/files/ab06c1bc..0aeea89e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26750&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26750&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26750/head:pull/26750 PR: https://git.openjdk.org/jdk/pull/26750 From adinn at openjdk.org Wed Aug 13 16:19:15 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 13 Aug 2025 16:19:15 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v4] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 11:16:59 GMT, Aleksey Shipilev wrote: >> When recording adapter entries, we record _offsets_, not the actual addresses: >> >> >> entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; >> >> >> Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". >> >> This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. 
>> >> The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `runtime/cds` still works >> - [x] Linux ARM32 server fastdebug, `java -version` now works >> - [x] Linux x86_64 zero fastdebug, `make bootcycle-images` now works > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Handling Zero crash as well This fixes the builds. Will sort out the zero blob creation separately. ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26746#pullrequestreview-3116722068 From adinn at openjdk.org Wed Aug 13 16:19:15 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 13 Aug 2025 16:19:15 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v3] In-Reply-To: <30P8RvSXOZfiuX_cjGBlj6Wv4kJWUt83Gi6dtisu-vs=.0d2470c2-80d7-43ea-8b13-b051112ad1d7@github.com> References: <30P8RvSXOZfiuX_cjGBlj6Wv4kJWUt83Gi6dtisu-vs=.0d2470c2-80d7-43ea-8b13-b051112ad1d7@github.com> Message-ID: <0YbnQU9aUo_0o9ps6Qva2YaaKDw5OQup2b4USTMEkWQ=.b6d34436-bdfa-456d-bae0-5258fe67c032@github.com> On Wed, 13 Aug 2025 15:56:39 GMT, Aleksey Shipilev wrote: >> Yeah, I agree. Give me the patch, I'll mix it into this PR. > > Or maybe we do this: let's push this one, and then deal with Zero and adapters specifically? The current patch unbreaks ARM32 and Zero builds, and we don't need to wait for more advanced patch to appear, IMO. My Linux x86_64 zero fastdebug `make bootcycle-images` has just completed cleanly. Yeah, good idea. I'll file a follow-up to stop Zero generating adapter blobs that we don't actually need. Meanwhile I'll re-tag this one. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26746#discussion_r2273951221 From iveresov at openjdk.org Wed Aug 13 16:29:12 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 13 Aug 2025 16:29:12 GMT Subject: RFR: 8362530: VM crash with -XX:+PrintTieredEvents when collecting AOT profiling [v2] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 08:53:36 GMT, Christian Hagedorn wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/compiler/compilationPolicy.cpp >> >> Co-authored-by: Christian Hagedorn > > src/hotspot/share/compiler/compilationPolicy.cpp line 552: > >> 550: print_event_on(&s, type, m, im, bci, level); >> 551: ResourceMark rm; >> 552: ttyLocker tty_lock; > > Do you really need the lock with only one `print()`? I thought it should be safe in that case. Yeah, you're right, I probably don't need it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26750#discussion_r2273975993 From iveresov at openjdk.org Wed Aug 13 16:48:29 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 13 Aug 2025 16:48:29 GMT Subject: RFR: 8362530: VM crash with -XX:+PrintTieredEvents when collecting AOT profiling [v3] In-Reply-To: References: Message-ID: <0II089426D6YVp-sTvTd0D3NJYqq44tTzhEC2pFXoVo=.6b25382c-fab3-41c8-8e2d-c092ed62b0b9@github.com> > When printing tiered events we take the ttyLock and also now the trainingDataLock. While benign it's best to decouple these. The solution is to gather the output bits in a buffer and then print it. 
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Address Christian's comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26750/files - new: https://git.openjdk.org/jdk/pull/26750/files/0aeea89e..2c8957cd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26750&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26750&range=01-02 Stats: 11 lines in 2 files changed: 10 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26750/head:pull/26750 PR: https://git.openjdk.org/jdk/pull/26750 From iveresov at openjdk.org Wed Aug 13 16:51:11 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 13 Aug 2025 16:51:11 GMT Subject: RFR: 8362530: VM crash with -XX:+PrintTieredEvents when collecting AOT profiling [v3] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 08:56:47 GMT, Christian Hagedorn wrote: > Do you also have a regression test for the crash that you could add or add the print flag to some existing test to verify your change? Done. Added a test case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26750#issuecomment-3184681190 From duke at openjdk.org Wed Aug 13 17:11:11 2025 From: duke at openjdk.org (duke) Date: Wed, 13 Aug 2025 17:11:11 GMT Subject: RFR: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX [v2] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 00:54:53 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to address the failure caused by x86 forward jump offset exceeding imm8 displacement when running the HotSpot jtreg test `test/hotspot/jtreg/compiler/c2/TestLWLockingCodeGen.java` using Intel APX (on SDE emulator). 
>> >> This bug triggers an assertion failure in methodHandles_x86.cpp because the assembler emits a short forward jump (imm8 displacement) whose target is more than 127 bytes away, exceeding the allowed range. This appears to be caused by larger stub code size when APX instruction encoding is enabled. >> >> The fix for this issue is to replace the `jccb` instruction with `jcc` in methodHandles_x86.cpp. > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > change jccb to jcc in line 157 @vamsi-parasa Your change (at version ea8643c2986366b4f1c4e06d05399434cdd607a9) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26731#issuecomment-3184756272 From sparasa at openjdk.org Wed Aug 13 17:56:19 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 13 Aug 2025 17:56:19 GMT Subject: Integrated: 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 17:38:28 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to address the failure caused by x86 forward jump offset exceeding imm8 displacement when running the HotSpot jtreg test `test/hotspot/jtreg/compiler/c2/TestLWLockingCodeGen.java` using Intel APX (on SDE emulator). > > This bug triggers an assertion failure in methodHandles_x86.cpp because the assembler emits a short forward jump (imm8 displacement) whose target is more than 127 bytes away, exceeding the allowed range. This appears to be caused by larger stub code size when APX instruction encoding is enabled. > > The fix for this issue is to replace the `jccb` instruction with `jcc` in methodHandles_x86.cpp. This pull request has now been integrated.
Changeset: 38a26141 Author: Srinivas Vamsi Parasa Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/38a261415dc29aae01c9b878d94cb302c60a3983 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8365265: x86 short forward jump exceeds 8-bit offset in methodHandles_x86.cpp when using Intel APX Reviewed-by: shade, jbhateja, aph ------------- PR: https://git.openjdk.org/jdk/pull/26731 From shade at openjdk.org Wed Aug 13 17:58:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 17:58:14 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 16:18:49 GMT, Vladimir Kozlov wrote: >> When recording adapter entries, we record _offsets_, not the actual addresses: >> >> >> entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; >> >> >> Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". >> >> This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. >> >> The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `runtime/cds` still works >> - [x] Linux ARM32 server fastdebug, `java -version` now works >> - [x] Linux x86_64 zero fastdebug, `make bootcycle-images` now works > > An other, more complex, solution would be to check `handler->get_c2i_*_entry()` for `nullptr` in `generate_adapter_code()` where we set offsets and set offset to 0. Then we can relax assert to `entry_offset[i] >= 0`. We can also remove `entry_offset[0] == 0` check before loop too. and start loop with `i = 0`. > > But it is more complicated. If you agree with this version, @vnkozlov, I'll integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3184901694 From kvn at openjdk.org Wed Aug 13 18:39:17 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 13 Aug 2025 18:39:17 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v4] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 11:16:59 GMT, Aleksey Shipilev wrote: >> When recording adapter entries, we record _offsets_, not the actual addresses: >> >> >> entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; >> >> >> Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". >> >> This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. 
>> >> The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `runtime/cds` still works >> - [x] Linux ARM32 server fastdebug, `java -version` now works >> - [x] Linux x86_64 zero fastdebug, `make bootcycle-images` now works > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Handling Zero crash as well Agree. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26746#pullrequestreview-3117240290 From bulasevich at openjdk.org Wed Aug 13 18:42:14 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 13 Aug 2025 18:42:14 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v4] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 11:16:59 GMT, Aleksey Shipilev wrote: >> When recording adapter entries, we record _offsets_, not the actual addresses: >> >> >> entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; >> >> >> Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". >> >> This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. 
>> >> The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `runtime/cds` still works >> - [x] Linux ARM32 server fastdebug, `java -version` now works >> - [x] Linux x86_64 zero fastdebug, `make bootcycle-images` now works > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Handling Zero crash as well Marked as reviewed by bulasevich (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26746#pullrequestreview-3117248287 From kxu at openjdk.org Wed Aug 13 19:38:12 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 13 Aug 2025 19:38:12 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v2] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 12:22:00 GMT, Francisco Ferrari Bihurriet wrote: >> Update looks good, thanks! I'll run some testing and report back again. >> >>> Could you already find some examples, where this change gives us an improved IR? If so, you could also add it as IR test. >> >> Just double-checking, were you able to find such a test which now improves the IR with the better type info and `CmpU` while we could not with the old code? Otherwise, you could also file a follow-up RFE. > > @chhagedorn >> > Could you already find some examples, where this change gives us an improved IR? If so, you could also add it as IR test. >> >> Just double-checking, were you able to find such a test which now improves the IR with the better type info and `CmpU` while we could not with the old code? Otherwise, you could also file a follow-up RFE. > > Sorry for not replying that, I'm working on it. > > We were explicitly matching the `BoolNode` tests, so let's explore the tests we were previously discarding. 
>
> For **case 1a**, we were explicitly matching `BoolTest::le`, but now `CmpUNode` has `TypeInt::CC_LE` reflecting the fact that `m & x <=u m` is always true, so:
>
> | Test | Symbolic representation | Result | Improved IR |
> |:------------------:|:-----------------------:|:--------:|:-------------------------:|
> | `BoolTest::eq` | `m & x =u m` | unknown | no |
> | `BoolTest::ne` | `m & x !=u m` | unknown | no |
> | **`BoolTest::le`** | **`m & x <=u m`** | **true** | **no (old optimization)** |
> | `BoolTest::ge` | `m & x >=u m` | unknown | no |
> | `BoolTest::lt` | `m & x <u m` | unknown | no |
> | `BoolTest::gt` | `m & x >u m` | false | yes |
>
> For **case 1b**, we were explicitly matching `BoolTest::lt`, but now `CmpUNode` has `TypeInt::CC_LT` reflecting the fact that `m & x <u m + 1` is always true when `m != -1`, so:
>
> | Test | Symbolic representation | Result if `m != -1` | Improved IR |
> |:------------------:|:-----------------------:|:-------------------:|:-------------------------:|
> | `BoolTest::eq` | `m & x =u m + 1` | false | yes |
> | `BoolTest::ne` | `m & x !=u m + 1` | true | yes |
> | `BoolTest::le` | `m & x <=u m + 1` | true | yes |
> | `BoolTest::ge` | `m & x >=u m + 1` | false | yes |
> | **`BoolTest::lt`** | **`m & x <u m + 1`** | **true** | **no (old optimization)** |
> | `BoolTest::gt` | `m & x >u m + 1` | false | yes |
>
> I will work on adding IR tests for these cases.
>
> Regarding real-world use cases, we need to rule out `BoolTest::lt`, as it didn't improve for _case 1a_ and was alread...

Thank you @franferrax for catching and addressing the inconsistent state. I neglected that in my original PR. I think it would be beneficial to include [your tables](https://github.com/openjdk/jdk/pull/26666#issuecomment-3183659818) of the two cases in the comments too. Thank you for the hard work.
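[The two identities behind these tables are easy to sanity-check outside the compiler. The following standalone program — illustration only, not part of the patch — randomly verifies that `(m & x) <=u m` always holds (case 1a), and that `(m & x) <u m + 1` holds whenever `m != -1` (case 1b).]

```java
// Standalone check of the unsigned-compare identities behind cases 1a/1b.
public class CmpUMaskIdentities {
    public static void main(String[] args) {
        java.util.Random rnd = new java.util.Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int m = rnd.nextInt();
            int x = rnd.nextInt();
            int masked = m & x;
            // case 1a: m & x <=u m always holds, since masking can only
            // clear bits, which cannot increase an unsigned value
            if (Integer.compareUnsigned(masked, m) > 0) {
                throw new AssertionError("case 1a violated: m=" + m + " x=" + x);
            }
            // case 1b: m & x <u m + 1 holds unless m == -1, where the
            // unsigned m + 1 wraps around to 0
            if (m != -1 && Integer.compareUnsigned(masked, m + 1) >= 0) {
                throw new AssertionError("case 1b violated: m=" + m + " x=" + x);
            }
        }
        System.out.println("identities hold");
    }
}
```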
------------- PR Comment: https://git.openjdk.org/jdk/pull/26666#issuecomment-3185308088 From dlong at openjdk.org Wed Aug 13 19:42:31 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 13 Aug 2025 19:42:31 GMT Subject: RFR: 8278874: tighten VerifyStack constraints [v12] In-Reply-To: References: Message-ID: > The VerifyStack logic in Deoptimization::unpack_frames() attempts to check the expression stack size of the interpreter frame against what GenerateOopMap computes. To do this, it needs to know if the state at the current bci represents the "before" state, meaning the bytecode will be reexecuted, or the "after" state, meaning we will advance to the next bytecode. The old code didn't know how to determine exactly what state we were in, so it checked both. This PR cleans that up, so we only have to compute the oopmap once. It also removes old SPARC support. Dean Long has updated the pull request incrementally with one additional commit since the last revision: cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26121/files - new: https://git.openjdk.org/jdk/pull/26121/files/4dab21bd..ffab3f1c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26121&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26121&range=10-11 Stats: 8 lines in 1 file changed: 0 ins; 8 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26121.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26121/head:pull/26121 PR: https://git.openjdk.org/jdk/pull/26121 From shade at openjdk.org Wed Aug 13 20:52:18 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 20:52:18 GMT Subject: RFR: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 [v4] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 11:16:59 GMT, Aleksey Shipilev wrote: >> When recording adapter entries, we record _offsets_, not the actual addresses: >> >> >> entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; >> >> >> Every 
platform except ARM32 and Zero have all these entries set up, so offset are always sane. But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". >> >> This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. >> >> The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `runtime/cds` still works >> - [x] Linux ARM32 server fastdebug, `java -version` now works >> - [x] Linux x86_64 zero fastdebug, `make bootcycle-images` now works > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Handling Zero crash as well Thank you, here goes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26746#issuecomment-3185733909 From shade at openjdk.org Wed Aug 13 20:52:19 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Aug 2025 20:52:19 GMT Subject: Integrated: 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 14:06:35 GMT, Aleksey Shipilev wrote: > When recording adapter entries, we record _offsets_, not the actual addresses: > > > entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > > > Every platform except ARM32 and Zero have all these entries set up, so offset are always sane. 
But those two platforms set up `nullptr` as `c2i_no_clinit_check_entry()`, because clinit barriers are unimplemented. So the new assert added in [JDK-8364269](https://bugs.openjdk.org/browse/JDK-8364269) fails encountering effectively `nullptr - i2c_entry` "garbage". > > This PR is the second least horrible (IMO) fix for this: relaxing assert by checking that "out of range" values are actually wrapping around back to `0`/`nullptr`. Had to do it in unsigned ints to avoid UB. For the affected platforms, we do not actually access this problematic/garbage entry offset, since we are always checking if clinit barriers are enabled. So the assert is the only place where it matters. > > The least horrible solution would be storing the actual `address`-es instead of `int` offsets. But that likely has footprint implications. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `runtime/cds` still works > - [x] Linux ARM32 server fastdebug, `java -version` now works > - [x] Linux x86_64 zero fastdebug, `make bootcycle-images` now works This pull request has now been integrated. Changeset: 9c266ae8 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/9c266ae83c047025d778da41e413701ac3b50b03 Stats: 20 lines in 3 files changed: 11 ins; 1 del; 8 mod 8365229: ARM32: c2i_no_clinit_check_entry assert failed after JDK-8364269 Reviewed-by: kvn, adinn, bulasevich, phh ------------- PR: https://git.openjdk.org/jdk/pull/26746 From dnsimon at openjdk.org Wed Aug 13 21:39:21 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 13 Aug 2025 21:39:21 GMT Subject: RFR: 8365468: EagerJVMCI should only apply to the CompilerBroker JVMCI runtime Message-ID: The primary goal of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447) was to have initialization of the Graal JIT occur in the same phase as the rest of VM startup such that initialization problems are detected and reported prior to executing any user code. 
This change caused a performance regression for Truffle when it is used in a JDK that includes both jargraal and libgraal. The problem is that Truffle needs jarjvmci but does not need jargraal when libgraal is available. Initializing jargraal in that configuration delays initialization of Truffle (not just Truffle compilation). Additionally, the jargraal instance created will never be used, wasting memory. The solution in this PR is to make EagerJVMCI only apply when initializing jarjvmci on a CompileBroker thread. ------------- Commit messages: - only apply EagerJVMCI on a CompileBroker thread Changes: https://git.openjdk.org/jdk/pull/26768/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26768&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365468 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26768.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26768/head:pull/26768 PR: https://git.openjdk.org/jdk/pull/26768 From dnsimon at openjdk.org Wed Aug 13 21:48:51 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 13 Aug 2025 21:48:51 GMT Subject: RFR: 8365468: EagerJVMCI should only apply to the CompilerBroker JVMCI runtime [v2] In-Reply-To: References: Message-ID: > The primary goal of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447) was to have initialization of the Graal JIT occur in the same phase as the rest of VM startup such that initialization problems are detected and reported prior to executing any user code. > > This change caused a performance regression for Truffle when it is used in a JDK that includes both jargraal and libgraal. The problem is that Truffle needs jarjvmci but does not need jargraal when libgraal is available. Initializing jargraal in that configuration delays initialization of Truffle (not just Truffle compilation). Additionally, the jargraal instance created will never be used, wasting memory. 
> > The solution in this PR is to make EagerJVMCI only apply when initializing jarjvmci on a CompileBroker thread. Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: only apply EagerJVMCI on a CompileBroker thread ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26768/files - new: https://git.openjdk.org/jdk/pull/26768/files/9e858561..a14d8e8f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26768&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26768&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26768.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26768/head:pull/26768 PR: https://git.openjdk.org/jdk/pull/26768 From fferrari at openjdk.org Thu Aug 14 06:08:22 2025 From: fferrari at openjdk.org (Francisco Ferrari Bihurriet) Date: Thu, 14 Aug 2025 06:08:22 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v3] In-Reply-To: References: Message-ID: > Hi, this pull request is a second take of 1383fec41756322bf2832c55633e46395b937b40, by updating the `CmpUNode` type as either `TypeInt::CC_LE` (case 1a) or `TypeInt::CC_LT` (case 1b) instead of updating the `BoolNode` type as `TypeInt::ONE`. > > With this approach a56cd371a2c497e4323756f8b8a08a0bba059bf2 becomes unnecessary. Additionally, having the right type in `CmpUNode` could potentially enable further optimizations. 
> > #### Testing > > In order to evaluate the changes, the following testing has been performed: > > * `jdk:tier1` (see [GitHub Actions run](https://github.com/franferrax/jdk/actions/runs/16789994433)) > * [`TestBoolNodeGVN.java`](https://github.com/openjdk/jdk/blob/jdk-26+9/test/hotspot/jtreg/compiler/c2/gvn/TestBoolNodeGVN.java), created for [JDK-8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value](https://bugs.openjdk.org/browse/JDK-8327381) (1383fec41756322bf2832c55633e46395b937b40) > * I also checked it breaks if I remove the `CmpUNode::Value_cmpu_and_mask` call > * Private reproducer for [JDK-8349584: Improve compiler processing](https://bugs.openjdk.org/browse/JDK-8349584) (a56cd371a2c497e4323756f8b8a08a0bba059bf2) > * A local slowdebug run of the `test/hotspot/jtreg/compiler/c2` category on _Fedora Linux x86_64_ > * Same results as with `master` (f95af744b07a9ec87e2507b3d584cbcddc827bbd) Francisco Ferrari Bihurriet has updated the pull request incrementally with three additional commits since the last revision: - Improve the IR test to add the new covered cases I also checked the test is now failing in the master branch (at f95af744b07a9ec87e2507b3d584cbcddc827bbd). - Remove IR test inverted asserts According to my IGV observations, these inversions aren't necessarily effective. Also, I assume it is safe to remove them because if I apply this change to the master branch, the test still passes (tested at f95af744b07a9ec87e2507b3d584cbcddc827bbd). - Add requested comments from the reviews Add a comment with the BoolTest::cc2logical inferences tables, as suggested by @tabjy. Also, add a comment explaining how PhaseCCP::push_cmpu is handling grandparent updates in the case 1b, as agreed with @chhagedorn. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/26666/files - new: https://git.openjdk.org/jdk/pull/26666/files/27ed1a31..e6b1cb89 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26666&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26666&range=01-02 Stats: 279 lines in 2 files changed: 261 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/26666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26666/head:pull/26666 PR: https://git.openjdk.org/jdk/pull/26666 From fferrari at openjdk.org Thu Aug 14 06:08:22 2025 From: fferrari at openjdk.org (Francisco Ferrari Bihurriet) Date: Thu, 14 Aug 2025 06:08:22 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v2] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 12:22:00 GMT, Francisco Ferrari Bihurriet wrote: >> Update looks good, thanks! I'll run some testing and report back again. >> >>> Could you already find some examples, where this change gives us an improved IR? If so, you could also add it as IR test. >> >> Just double-checking, were you able to find such a test which now improves the IR with the better type info and `CmpU` while we could not with the old code? Otherwise, you could also file a follow-up RFE. > > @chhagedorn >> > Could you already find some examples, where this change gives us an improved IR? If so, you could also add it as IR test. >> >> Just double-checking, were you able to find such a test which now improves the IR with the better type info and `CmpU` while we could not with the old code? Otherwise, you could also file a follow-up RFE. > > Sorry for not replying that, I'm working on it. > > We were explicitly matching the `BoolNode` tests, so let's explore the tests we were previously discarding. 
>
> For **case 1a**, we were explicitly matching `BoolTest::le`, but now `CmpUNode` has `TypeInt::CC_LE` reflecting the fact that `m & x <=u m` is always true, so:
>
> | Test | Symbolic representation | Result | Improved IR |
> |:------------------:|:-----------------------:|:--------:|:-------------------------:|
> | `BoolTest::eq` | `m & x =u m` | unknown | no |
> | `BoolTest::ne` | `m & x !=u m` | unknown | no |
> | **`BoolTest::le`** | **`m & x <=u m`** | **true** | **no (old optimization)** |
> | `BoolTest::ge` | `m & x >=u m` | unknown | no |
> | `BoolTest::lt` | `m & x <u m` | unknown | no |
> | `BoolTest::gt` | `m & x >u m` | false | yes |
>
> For **case 1b**, we were explicitly matching `BoolTest::lt`, but now `CmpUNode` has `TypeInt::CC_LT` reflecting the fact that `m & x <u m + 1` is always true when `m != -1`, so:
>
> | Test | Symbolic representation | Result if `m != -1` | Improved IR |
> |:------------------:|:-----------------------:|:-------------------:|:-------------------------:|
> | `BoolTest::eq` | `m & x =u m + 1` | false | yes |
> | `BoolTest::ne` | `m & x !=u m + 1` | true | yes |
> | `BoolTest::le` | `m & x <=u m + 1` | true | yes |
> | `BoolTest::ge` | `m & x >=u m + 1` | false | yes |
> | **`BoolTest::lt`** | **`m & x <u m + 1`** | **true** | **no (old optimization)** |
> | `BoolTest::gt` | `m & x >u m + 1` | false | yes |
>
> I will work on adding IR tests for these cases.
>
> Regarding real-world use cases, we need to rule out `BoolTest::lt`, as it didn't improve for _case 1a_ and was alread...

> Thank you @franferrax for catching and addressing the inconsistent state. I neglected that in my original PR. I think it would be beneficial to include [your tables](https://github.com/openjdk/jdk/pull/26666#issuecomment-3183659818) of the two cases in the comments too. Thank you for the hard work.

@tabjy thanks for the suggestion, I added the tables in 32a7940b8eefe56c3aa603be65da7b32981d2ab7.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26666#issuecomment-3187074888 From fferrari at openjdk.org Thu Aug 14 06:08:23 2025 From: fferrari at openjdk.org (Francisco Ferrari Bihurriet) Date: Thu, 14 Aug 2025 06:08:23 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v3] In-Reply-To: References: Message-ID: <2ykz9gFJNGkJEcZbhfA5UnrKZhyBMaqQQvZ5UwOQX8E=.724abc9d-748d-47c1-870d-1e94a9c44096@github.com> On Wed, 13 Aug 2025 07:01:13 GMT, Christian Hagedorn wrote: >> Hmm, I was oversimplifying the problem, my way of thinking it was the following one: >> >> >> m x m 1 >> \ / \ / >> AndI AddI grandparents >> \ / >> CmpU parent >> | >> Bool grandchild >> >> >> _"As we were updating a grandchild based on its grandparents, we needed an ad-hoc worklist push for the grandchild. Since we now update the type of `CmpU` based on its parents, the canonical parent-to-child propagations should work, and we don't need any ad-hoc grandparents-to-grandchild worklist push anymore."_ >> >> But as you noted, non-immediate `CmpU` inputs such as `m` or `1` can change and should affect the `CmpU` type. Luckily, this already was the case for previous `CmpU` optimizations. >> >> --- >> >> For case **1a**, we don't need `PhaseCCP::push_cmpu` because `m` is also an immediate input of `CmpU`. >> >> >> m x >> \ / >> AndI m >> \ / >> CmpU >> | >> Bool >> >> >> --- >> >> I'm now realizing this was a very lucky situation. The `AndI` input isn't problematic even when `PhaseCCP::push_cmpu` doesn't handle the `use_op == Op_AndI` case, because: >> >> * `x` does not affect the application of `Value_cmpu_and_mask()` >> * In case **1a**, `m` is a direct input of `CmpU` >> * In case **1b**, the `AddI` input is handled in `PhaseCCP::push_cmpu` (`use_op == Op_AddI`) >> >> Please let me know if you think we should add a comment in the code. > > That's a good summary! Thanks for double-checking again. 
It's indeed only for **1b** a probably that's handled by `push_cmpu()`. It probably would not hurt to add a comment that `push_cmpu` handles this case, just to be sure. Great, I added the comments in 32a7940b8eefe56c3aa603be65da7b32981d2ab7. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26666#discussion_r2275569858 From fferrari at openjdk.org Thu Aug 14 06:10:13 2025 From: fferrari at openjdk.org (Francisco Ferrari Bihurriet) Date: Thu, 14 Aug 2025 06:10:13 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v2] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 07:05:51 GMT, Christian Hagedorn wrote: >> Francisco Ferrari Bihurriet has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply code review suggestions and add JBS to test > > Update looks good, thanks! I'll run some testing and report back again. > >> Could you already find some examples, where this change gives us an improved IR? If so, you could also add it as IR test. > > Just double-checking, were you able to find such a test which now improves the IR with the better type info and `CmpU` while we could not with the old code? Otherwise, you could also file a follow-up RFE. Hi @chhagedorn, I added the new tests in e6b1cb897d9c75b34744c7d24f72abcec9986b0b. One problem I'm facing is that I'm unable to generate `Bool` nodes with arbitrary `BoolTest` values. Even if I try the assert inversions I removed in 10e1e3f4f796d05dcd5c56bc2365d5d564d93952, C2 has preference for `BoolTest::ne`, `BoolTest::le` and `BoolTest::lt`. Instead of using `BoolTest::eq`, `BoolTest::gt` or `BoolTest::ge`, it swaps what is put in `IfTrue` and `IfFalse`. Even if `javac` generates an `ifeq` and an `ifne` with the same inputs, instead of a single `CmpU` with two `Bool`s (`BoolTest::eq` and `BoolTest::ne`), I get a single `Bool` (`BoolTest::ne`) with two `If` (one of them swapping `IfTrue` with `IfFalse`). 
I guess this is some sort of canonicalization to enable further optimizations. Do you know a way to influence the `Bool`'s `BoolTest` value? Or @rwestrel do you? This means the following 8 cases are not really testing what they claim, but repeating other cases with `IfTrue` and `IfFalse` swapped: * `testCase1aOptimizeAsFalseForGT[xm|mx]` (they should use `BoolTest::gt`, but use `BoolTest::le`) * `testCase1bOptimizeAsFalseForEQ[xm|mx]` (they should use `BoolTest::eq`, but use `BoolTest::ne`) * `testCase1bOptimizeAsFalseForGE[xm|mx]` (they should use `BoolTest::ge`, but use `BoolTest::lt`) * `testCase1bOptimizeAsFalseForGT[xm|mx]` (they should use `BoolTest::gt`, but use `BoolTest::le`) Even if we don't find a way to influence the `BoolTest`, the cases are still valid and can be kept (just in case the described behaviour changes). ------------- PR Comment: https://git.openjdk.org/jdk/pull/26666#issuecomment-3187081955 From chagedorn at openjdk.org Thu Aug 14 06:15:18 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 14 Aug 2025 06:15:18 GMT Subject: RFR: 8362530: VM crash with -XX:+PrintTieredEvents when collecting AOT profiling [v3] In-Reply-To: <0II089426D6YVp-sTvTd0D3NJYqq44tTzhEC2pFXoVo=.6b25382c-fab3-41c8-8e2d-c092ed62b0b9@github.com> References: <0II089426D6YVp-sTvTd0D3NJYqq44tTzhEC2pFXoVo=.6b25382c-fab3-41c8-8e2d-c092ed62b0b9@github.com> Message-ID: On Wed, 13 Aug 2025 16:48:29 GMT, Igor Veresov wrote: >> When printing tiered events we take the ttyLock and also now the trainingDataLock. While benign it's best to decouple these. The solution is to gather the output bits in a buffer and then print it. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Address Christian's comments Thanks Igor for the update and adding a test, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26750#pullrequestreview-3119154405 From yzheng at openjdk.org Thu Aug 14 07:40:25 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 14 Aug 2025 07:40:25 GMT Subject: RFR: 8365218: [JVMCI] AArch64 CPU features are not computed correctly after 8364128 [v4] In-Reply-To: References: Message-ID: <6ROAi0cU9btXwVfPLA8qrBFsLG_FSDnu8XuxCPR_S1k=.fbc3af8d-dd59-4b2e-8ab9-9f1b4422955d@github.com> On Wed, 13 Aug 2025 07:04:12 GMT, Yudi Zheng wrote: >> https://github.com/openjdk/jdk/pull/26515 changes the `VM_Version::CPU_` constant values on AArch64 and Graal now sees unsupported CPU features. This may result in SIGILL due to Graal emitting unsupported instructions, such as `CPU_SHA3`-based eor3 instructions in AArch64 SHA3 stubs. > > Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge master > - style > - style > - address comments > - [JVMCI] AArch64 CPU features are not computed correctly after 8364128 Thanks for the review! Passed tier1-3, most of tier9, failures seem unrelated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26727#issuecomment-3187300572 From yzheng at openjdk.org Thu Aug 14 07:43:16 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 14 Aug 2025 07:43:16 GMT Subject: Integrated: 8365218: [JVMCI] AArch64 CPU features are not computed correctly after 8364128 In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 13:59:55 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/pull/26515 changes the `VM_Version::CPU_` constant values on AArch64 and Graal now sees unsupported CPU features. This may result in SIGILL due to Graal emitting unsupported instructions, such as `CPU_SHA3`-based eor3 instructions in AArch64 SHA3 stubs. This pull request has now been integrated. 
Changeset: e3201628 Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/e320162815d529bc65cd058b34ec39d60d032ce7 Stats: 74 lines in 4 files changed: 9 ins; 54 del; 11 mod 8365218: [JVMCI] AArch64 CPU features are not computed correctly after 8364128 Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/26727 From mchevalier at openjdk.org Thu Aug 14 08:02:12 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 14 Aug 2025 08:02:12 GMT Subject: RFR: 8360561: PhaseIdealLoop::create_new_if_for_predicate hits "must be a uct if pattern" assert In-Reply-To: References: Message-ID: On Tue, 5 Aug 2025 08:55:30 GMT, Manuel H?ssig wrote: >> No, only supplying `Xcomp` to the parent process (the one running the `main`) disables IR verification. You can supply whatever flag to the child process and the IR verification still applies. You can see this in all Valhalla tests. > > Good to know. Thank you for clearing that up for me. Indeed. I use it here to prevent profiling from removing an actually impossible path with a trap, because bad things happen in a dead path. It's not the first time I use `Xcomp` for that, and there are other ways (like setting a maximum on the number of traps per method, or disabling the warmup (and so profiling) in IR framework execution). That was discussed in some other PR without strong opinions or consensus on what would be the preferred way. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26504#discussion_r2275800369 From mchevalier at openjdk.org Thu Aug 14 08:07:15 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 14 Aug 2025 08:07:15 GMT Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph In-Reply-To: References: Message-ID: <7d3EZ8TMb0MSMJkkjFws3aVHHUf_EoTktuzQsmQThdI=.fddc789a-0ac3-4a74-a15f-0a7d559cb593@github.com> On Tue, 29 Jul 2025 16:59:20 GMT, Vladimir Kozlov wrote: >> Some crashes are consequences of earlier misshaped ideal graphs, which could be detected earlier, closer to the source, before the possibly many transformations that lead to the crash. >> >> Let's verify that the ideal graph is well-shaped earlier then! I propose here such a feature. This runs after IGVN, because at this point, the graph, should be cleaned up for any weirdness happening earlier or during IGVN. >> >> This feature is enabled with the develop flag `VerifyIdealStructuralInvariants`. Open to renaming. No problem with me! This feature is only available in debug builds, and most of the code is even not compiled in product, since it uses some debug-only functions, such as `Node::dump` or `Node::Name`. >> >> For now, only local checks are implemented: they are checks that only look at a node and its neighborhood, wherever it happens in the graph. Typically: under a `If` node, we have a `IfTrue` and a `IfFalse`. To ease development, each check is implemented in its own class, independently of the others. Nevertheless, one needs to do always the same kind of things: checking there is an output of such type, checking there is N inputs, that the k-th input has such type... To ease writing such checks, in a readable way, and in a less error-prone way than pile of copy-pasted code that manually traverse the graph, I propose a set of compositional helpers to write patterns that can be matched against the ideal graph. Since these patterns are... 
patterns, so not related to a specific graph, they can be allocated once and forever. When used, one provides the node (called center) around which one wants to check if the pattern holds.
>>
>> On top of making the description of patterns easier, these helpers allow nice printing in case of error, by showing the path from the center to the violating node. For instance (made up for the purpose of showing the formatting), a violation with a path climbing only inputs:
>>
>>     1 failure for node
>>     211 OuterStripMinedLoopEnd === 215 39 [[ 212 198 ]] P=0,948966, C=23799,000000
>>     At node
>>     209 CountedLoopEnd === 182 208 [[ 210 197 ]] [lt] P=0,948966, C=23799,000000 !orig=[196] !jvms: StringLatin1::equals @ bci:12 (line 100)
>>     From path:
>>     [center] 211 OuterStripMinedLoopEnd === 215 39 [[ 212 198 ]] P=0,948966, C=23799,000000
>>     <-(0)- 215 SafePoint === 210 1 7 1 1 216 37 54 185 [[ 211 ]] SafePoint !orig=186 !jvms: StringLatin1::equals @ bci:29 (line 100)
>>     <-(0)- 210 IfFalse === 209 [[ 21...
>
> I am fine with the `VerifyIdealGraph` flag. The main concern is that we have tons of `Verify*` flags but I don't think we use them in CI testing. So we are forgetting about them, they will break, and a few years later we will be removing them like we did with `VerifyOpto`.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26362#issuecomment-3187382299 From bkilambi at openjdk.org Thu Aug 14 09:09:32 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 14 Aug 2025 09:09:32 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v6] In-Reply-To: References: Message-ID: > After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - > `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - > > > public void vectorAddConstInputFloat16() { > for (int i = 0; i < LEN; ++i) { > output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); > } > } > > > > > > The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. > > This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). > > Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. 
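The encodability rule described above (8-bit immediates in [-128, 127], or 16-bit immediates that are a multiple of 256 within [-127 << 8, 128 << 8]) can be sketched as a small standalone check. The helper name is hypothetical, not HotSpot's actual code:

```java
public class SveDupImmCheck {
    // Sketch of the sve_dup immediate rule described above: an immediate is
    // encodable if it fits in a signed byte, or if it is a multiple of 256
    // within [-127 << 8, 128 << 8] (the shifted 8-bit form for 16-bit lanes).
    static boolean isEncodableHalf(int imm) {
        if (imm >= -128 && imm <= 127) {
            return true;
        }
        return imm % 256 == 0 && imm >= (-127 << 8) && imm <= (128 << 8);
    }

    public static void main(String[] args) {
        System.out.println(isEncodableHalf(100));  // encodable: fits in 8 bits
        System.out.println(isEncodableHalf(512));  // encodable: multiple of 256
        System.out.println(isEncodableHalf(1000)); // not encodable: constant pool path
    }
}
```

Immediates failing this check are the ones that take the `loadConH` + `replicateHF` path instead of a direct `sve_dup`.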
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Modify loadConH to use a mov and fmov instead ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26589/files - new: https://git.openjdk.org/jdk/pull/26589/files/f8dc132b..3a12ca00 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26589&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26589&range=04-05 Stats: 14 lines in 1 file changed: 6 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/26589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26589/head:pull/26589 PR: https://git.openjdk.org/jdk/pull/26589 From shade at openjdk.org Thu Aug 14 09:44:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 14 Aug 2025 09:44:14 GMT Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 15:08:29 GMT, Damon Fenacci wrote: > # Issue > While compiling `java.util.zip.ZipFile` in C2 this assert is triggered > https://github.com/openjdk/jdk/blob/a2e86ff3c56209a14c6e9730781eecd12c81d170/src/hotspot/share/opto/memnode.cpp#L4235 > > # Cause > While compiling the constructor of java.util.zip.ZipFile$CleanableResource the following happens: > * we insert a trailing `MemBarStoreStore` in the constructor > before_folding > > * during IGVN we completely fold the memory subtree of the `MemBarStoreStore` node. The node still has a control output attached. 
> after_folding
> 
> * later during the same IGVN run the `MemBarStoreStore` node is handled and we try to remove it (because the `Allocate` node of the `MemBar` is not escaping the thread) https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4301-L4302
> * the assert https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4235
> triggers because the barrier has only 1 (control) output and is a `MemBarStoreStore` (not `Initialize`) barrier
> 
> The issue happens only when `UseStoreStoreForCtor` is set (the default), which makes C2 use `MemBarStoreStore` instead of `MemBarRelease` at the end of constructors. `MemBarStoreStore` barriers are processed separately by EA and this happens after the IGVN pass that folds the memory subtree. `MemBarRelease` barriers on the other hand are handled during the same IGVN pass before the memory subtree gets removed and it's still got 2 outputs (assert skipped).
> 
> # Fix
> Adapting the assert to accept that `MemBarStoreStore` can also have `!= 2` outputs (when `+UseStoreStoreForCtor` is used) seems to be an OK solution as this seems like a perfectly plausible situation.
> 
> # Testing
> Unfortunately reproducing the issue with a simple regression test has proven very hard. The test seems to rely on very peculiar profiling and IGVN worklist sequence. JBS replay compilation passes. Running JCK's `api/java_util` 100 times triggers the assert a couple of times on average before the fix, none after.
> Tier 1-3+ tests passed.

This looks reasonable to me. So it looks to be an overly zealous assert rather than a compiler bug? Someone more savvy with C2 code needs to look and confirm.

Oh, maybe pull from the recent master to get GHA fixes, and other fixes?

-------------

Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26556#pullrequestreview-3119874496
PR Comment: https://git.openjdk.org/jdk/pull/26556#issuecomment-3187809417

From epeter at openjdk.org Thu Aug 14 09:52:06 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 14 Aug 2025 09:52:06 GMT
Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v9]
In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com>
References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com>
Message-ID: 

> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs.
> 
> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016:
> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate.
> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization.
> 
> --------------------------
> 
> **Where to start reviewing**
> 
> - `src/hotspot/share/opto/mempointer.hpp`:
>   - Read the class comment for `MemPointerRawSummand`.
>   - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks.
> 
> - `src/hotspot/share/opto/vectorization.cpp`:
>   - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works.
> 
> - `src/hotspot/share/opto/vtransform.hpp`:
>   - Understand the difference between weak and strong edges.
> 
> If you need to see some examples, then look at the tests:
> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases.
IR rules that check for vectors and in some cases if we used multiversioning.
> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases.
> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases.
> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments).
> --------------------------
> 
> **Details**
> 
> Most fundamentally:
> - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s.
> - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`.
> - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)`
> - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)`
> - For aliasing analysis (adjacency and overlap), the "regu...
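The speculative aliasing check guarding the vectorized `fast_loop` boils down to a disjointness test on two memory regions. A minimal standalone sketch (hypothetical helper, not C2's actual code):

```java
public class AliasingCheck {
    // Two regions [p1, p1 + size1) and [p2, p2 + size2) do not alias iff one
    // ends at or before the start of the other. This is the shape of the
    // runtime check: if it passes we run the vectorized fast_loop, otherwise
    // we trap (predicate flavor) or fall back to the slow_loop (multiversioning).
    static boolean noOverlap(long p1, long size1, long p2, long size2) {
        return p1 + size1 <= p2 || p2 + size2 <= p1;
    }

    public static void main(String[] args) {
        System.out.println(noOverlap(0, 16, 16, 16)); // true: adjacent regions
        System.out.println(noOverlap(0, 16, 8, 16));  // false: overlapping regions
    }
}
```

The real check must additionally reason about the pointer expressions symbolically over the whole iv range, which is what the linearity proofs in `vectorization.cpp` establish.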
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  moved swapping up, suggested by Manuel

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24278/files
  - new: https://git.openjdk.org/jdk/pull/24278/files/4a240226..21ea9b2b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=07-08

Stats: 13 lines in 1 file changed: 4 ins; 7 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/24278.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278

PR: https://git.openjdk.org/jdk/pull/24278

From epeter at openjdk.org Thu Aug 14 09:52:07 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 14 Aug 2025 09:52:07 GMT
Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6]
In-Reply-To: 
References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com>
 <0rNuFLFwXcWfF0-nQQEd9fbIrziHos8PZJ93sDPFObo=.0587492e-267b-4681-8fb8-605cdc20f1c3@github.com>
 <72xM8drprc1sgKUY0NqxLtbRvxBQ0TdF_ByDaPWrGWw=.9db9c520-826a-4c64-b918-87a41f805c57@github.com>
Message-ID: 

On Wed, 13 Aug 2025 15:46:57 GMT, Manuel Hässig wrote:

>> @mhaessig Does it look ok to you now?
> 
> I think you forgot to push :)

@mhaessig I did push, but the push failed. Now just pushed it successfully.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2276138593 From bkilambi at openjdk.org Thu Aug 14 10:27:14 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 14 Aug 2025 10:27:14 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v6] In-Reply-To: References: Message-ID: On Thu, 14 Aug 2025 09:09:32 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. >> >> This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). >> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. 
> > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Modify loadConH to use a mov and fmov instead

Tested the latest patch on Graviton3 and both the JTREG tests pass - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java`

The code now generates the following for `loadConH` (regs `s11`, `z10` taken as examples) -

    mov rscratch1, #imm
    fmov s11, rscratch1

This loaded value might be used by any scalar iterations following the `fmov`. For the vectorized loop, if the dup is legal (`replicateHF_imm8_gt128b` mach node) -

    dup z10.h, #imm

and for illegal immediates (`replicateHF` mach node) -

    dup z10.h, h11

@theRealAph could I please ask for another round of review? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3187936230

From mchevalier at openjdk.org Thu Aug 14 10:43:08 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 14 Aug 2025 10:43:08 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph [v3]
In-Reply-To: 
References: 
Message-ID: 

> Some crashes are consequences of earlier misshaped ideal graphs, which could be detected earlier, closer to the source, before the possibly many transformations that lead to the crash.
> 
> Let's verify that the ideal graph is well-shaped earlier then! I propose here such a feature. This runs after IGVN, because at this point the graph should be cleaned up for any weirdness happening earlier or during IGVN.
> 
> This feature is enabled with the develop flag `VerifyIdealStructuralInvariants`. Open to renaming. No problem with me! This feature is only available in debug builds, and most of the code is not even compiled in product, since it uses some debug-only functions, such as `Node::dump` or `Node::Name`.
> 
> For now, only local checks are implemented: they are checks that only look at a node and its neighborhood, wherever it happens in the graph. Typically: under an `If` node, we have an `IfTrue` and an `IfFalse`. To ease development, each check is implemented in its own class, independently of the others. Nevertheless, one always needs to do the same kind of things: checking there is an output of such type, checking there are N inputs, that the k-th input has such type... To ease writing such checks, in a readable way, and in a less error-prone way than a pile of copy-pasted code that manually traverses the graph, I propose a set of compositional helpers to write patterns that can be matched against the ideal graph. Since these patterns are... patterns, so not related to a specific graph, they can be allocated once and forever. When used, one provides the node (called center) around which one wants to check if the pattern holds.
> 
> On top of making the description of patterns easier, these helpers allow nice printing in case of error, by showing the path from the center to the violating node. For instance (made up for the purpose of showing the formatting), a violation with a path climbing only inputs:
> 
> 1 failure for node
> 211 OuterStripMinedLoopEnd === 215 39 [[ 212 198 ]] P=0,948966, C=23799,000000
> At node
> 209 CountedLoopEnd === 182 208 [[ 210 197 ]] [lt] P=0,948966, C=23799,000000 !orig=[196] !jvms: StringLatin1::equals @ bci:12 (line 100)
> From path:
> [center] 211 OuterStripMinedLoopEnd === 215 39 [[ 212 198 ]] P=0,948966, C=23799,000000
> <-(0)- 215 SafePoint === 210 1 7 1 1 216 37 54 185 [[ 211 ]] SafePoint !orig=186 !jvms: StringLatin1::equals @ bci:29 (line 100)
> <-(0)- 210 IfFalse === 209 [[ 215 216 ]] #0 !orig=198 !jvms: StringL...
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:

  Benoît's comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26362/files
  - new: https://git.openjdk.org/jdk/pull/26362/files/9117fde8..700310e1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26362&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26362&range=01-02

Stats: 21 lines in 2 files changed: 4 ins; 8 del; 9 mod
Patch: https://git.openjdk.org/jdk/pull/26362.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26362/head:pull/26362

PR: https://git.openjdk.org/jdk/pull/26362

From dfenacci at openjdk.org Thu Aug 14 10:54:08 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Thu, 14 Aug 2025 10:54:08 GMT
Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v2]
In-Reply-To: 
References: 
Message-ID: <5CGrcWjFZ7Zqj_Tm0LO6Tqg9cUA-xxvcaa2J-yWW8BE=.af4dea7c-e39d-491d-b924-c89fa82e757a@github.com>

> # Issue
> While compiling `java.util.zip.ZipFile` in C2 this assert is triggered
> https://github.com/openjdk/jdk/blob/a2e86ff3c56209a14c6e9730781eecd12c81d170/src/hotspot/share/opto/memnode.cpp#L4235
> 
> # Cause
> While compiling the constructor of java.util.zip.ZipFile$CleanableResource the following happens:
> * we insert a trailing `MemBarStoreStore` in the constructor
> before_folding
> 
> * during IGVN we completely fold the memory subtree of the `MemBarStoreStore` node. The node still has a control output attached.
> after_folding
> 
> * later during the same IGVN run the `MemBarStoreStore` node is handled and we try to remove it (because the `Allocate` node of the `MemBar` is not escaping the thread) https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4301-L4302
> * the assert https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4235
> triggers because the barrier has only 1 (control) output and is a `MemBarStoreStore` (not `Initialize`) barrier
> 
> The issue happens only when `UseStoreStoreForCtor` is set (the default), which makes C2 use `MemBarStoreStore` instead of `MemBarRelease` at the end of constructors. `MemBarStoreStore` barriers are processed separately by EA and this happens after the IGVN pass that folds the memory subtree. `MemBarRelease` barriers on the other hand are handled during the same IGVN pass before the memory subtree gets removed and it's still got 2 outputs (assert skipped).
> 
> # Fix
> Adapting the assert to accept that `MemBarStoreStore` can also have `!= 2` outputs (when `+UseStoreStoreForCtor` is used) seems to be an OK solution as this seems like a perfectly plausible situation.
> 
> # Testing
> Unfortunately reproducing the issue with a simple regression test has proven very hard. The test seems to rely on very peculiar profiling and IGVN worklist sequence. JBS replay compilation passes. Running JCK's `api/java_util` 100 times triggers the assert a couple of times on average before the fix, none after.
> Tier 1-3+ tests passed.

Damon Fenacci has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains six additional commits since the last revision: - Merge branch 'master' into JDK-8360031 - JDK-8360031: update assert message - Merge branch 'master' into JDK-8360031 - JDK-8360031: remove unnecessary include - JDK-8360031: remove UseNewCode - JDK-8360031: compilation asserts in MemBarNode::remove ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26556/files - new: https://git.openjdk.org/jdk/pull/26556/files/ac003e6d..f7bc08c9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26556&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26556&range=00-01 Stats: 27226 lines in 633 files changed: 14666 ins; 10419 del; 2141 mod Patch: https://git.openjdk.org/jdk/pull/26556.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26556/head:pull/26556 PR: https://git.openjdk.org/jdk/pull/26556 From chagedorn at openjdk.org Thu Aug 14 11:19:14 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 14 Aug 2025 11:19:14 GMT Subject: RFR: 8360561: PhaseIdealLoop::create_new_if_for_predicate hits "must be a uct if pattern" assert In-Reply-To: References: Message-ID: On Thu, 14 Aug 2025 07:59:11 GMT, Marc Chevalier wrote: >> Good to know. Thank you for clearing that up for me. > > Indeed. I use it here to prevent profiling from removing an actually impossible path with a trap, because bad things happen in a dead path. It's not the first time I use `Xcomp` for that, and there are other ways (like setting a maximum on the number of traps per method, or disabling the warmup (and so profiling) in IR framework execution). That was discussed in some other PR without strong opinions or consensus on what would be the preferred way. Ideally you use `@Warmup(0)` without `-Xcomp` + `CompileOnly` to not stress the test VM unnecessarily. But depending on your use case/profiling requirements, it might not be enough, so `-Xcomp` + `CompileOnly` seems like a good option. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26504#discussion_r2276333491

From mhaessig at openjdk.org Thu Aug 14 11:20:21 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 14 Aug 2025 11:20:21 GMT
Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v9]
In-Reply-To: 
References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com>
 
Message-ID: 

On Thu, 14 Aug 2025 09:52:06 GMT, Emanuel Peter wrote:

>> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs.
>> 
>> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016:
>> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate.
>> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization.
>> 
>> --------------------------
>> 
>> **Where to start reviewing**
>> 
>> - `src/hotspot/share/opto/mempointer.hpp`:
>>   - Read the class comment for `MemPointerRawSummand`.
>>   - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks.
>> 
>> - `src/hotspot/share/opto/vectorization.cpp`:
>>   - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works.
>> 
>> - `src/hotspot/share/opto/vtransform.hpp`:
>>   - Understand the difference between weak and strong edges.
>> 
>> If you need to see some examples, then look at the tests:
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning.
>> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments).
>> --------------------------
>> 
>> **Details**
>> 
>> Most fundamentally:
>> - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s.
>> - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`.
>> - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)`
>> - With "regular" summands, this gets simplified to `p = base + 4L + Conv...
> 
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   moved swapping up, suggested by Manuel

Thank you for addressing my feedback! This looks good to me now.

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/24278#pullrequestreview-3120183372

From dfenacci at openjdk.org Thu Aug 14 11:40:15 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Thu, 14 Aug 2025 11:40:15 GMT
Subject: RFR: 8358756: [s390x] Test StartupOutput.java crash due to CodeCache size [v2]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 17 Jun 2025 05:40:14 GMT, Amit Kumar wrote:

>> There isn't enough initial code cache present to let the interpreter mode run freely. So before we even reach the compiler phase and try to bail out, in case there isn't enough space left for the stub compilation, the JVM crashes.
The idea is to increase the initial code cache size and make it enough to at least run in interpreter mode.
> 
> Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'master' into testfix
>  - take the platform change out of loop
>  - fix

LGTM if you confirm that on S390 it sometimes finishes with "CodeCache is full...", sometimes not, but it never crashes. Thanks @offamitkumar.

-------------

Marked as reviewed by dfenacci (Committer).

PR Review: https://git.openjdk.org/jdk/pull/25741#pullrequestreview-3120246743

From mhaessig at openjdk.org Thu Aug 14 12:01:04 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 14 Aug 2025 12:01:04 GMT
Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v6]
In-Reply-To: 
References: 
Message-ID: 

> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations.
> 
> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out.
> 
> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers.
> 
> Testing:
> - [x] Github Actions
> - [x] tier1, tier2 on all platforms
> - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug
> - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail)

Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:

  Apply suggestions from Christian
  
  Co-authored-by: Christian Hagedorn

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26023/files
  - new: https://git.openjdk.org/jdk/pull/26023/files/8bb5eb7a..3689fc71

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=04-05

Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/26023.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26023/head:pull/26023

PR: https://git.openjdk.org/jdk/pull/26023

From mhaessig at openjdk.org Thu Aug 14 12:01:08 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 14 Aug 2025 12:01:08 GMT
Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v5]
In-Reply-To: <7ZSL2sR91qOFup-zauB0VKCoLYB9dHMn3GGwLmo-gEk=.e790e543-fb41-42cd-add2-1f5f4a141afb@github.com>
References: <6gq4iIBw4RIqqPvmAf2MHnKrmYHwOdWdH1fz1bFaCGA=.57906956-460f-4a1d-9e3e-fbf91a7974e2@github.com>
 <7ZSL2sR91qOFup-zauB0VKCoLYB9dHMn3GGwLmo-gEk=.e790e543-fb41-42cd-add2-1f5f4a141afb@github.com>
Message-ID: 

On Wed, 13 Aug 2025 08:31:11 GMT, Christian Hagedorn wrote:

>> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 11 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8308094-timeout >> - Rename _timer >> - remove _timeout_armed >> - ASSERT >> - Merge branch 'master' into JDK-8308094-timeout >> - No acquire release semantics >> - Factor Linux specific timeout functionality out of share/ >> - Move timeout disarm above if >> - Merge branch 'master' into JDK-8308094-timeout >> - Fix SIGALRM test >> - ... and 1 more: https://git.openjdk.org/jdk/compare/098f25d4...8bb5eb7a > > src/hotspot/os/linux/compilerThreadTimeout_linux.hpp line 46: > >> 44: #endif // !PRODUCT >> 45: public: >> 46: CompilerThreadTimeoutLinux() NOT_PRODUCT(DEBUG_ONLY(: _timer(nullptr))) {}; > > Why do you need the `NOT_PRODUCT`? It only wraps `DEBUG_ONLY`. If that's not set, the `NOT_PRODUCT` wraps nothing. The initialization list should only be generated if `!PRODUCT && ASSERT`, so it does not appear in `optimized`. This is one way of expressing this conjunction in macros. > src/hotspot/share/compiler/compilerThread.hpp line 54: > >> 52: void disarm() { return; }; >> 53: bool init_timeout() { return true; }; >> 54: }; > > Should we also guard this with `ifndef LINUX` since it's only used for non-Linux? Sounds reasonable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2276419674 PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2276420718 From chagedorn at openjdk.org Thu Aug 14 12:37:25 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 14 Aug 2025 12:37:25 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v9] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Thu, 14 Aug 2025 09:52:06 GMT, Emanuel Peter wrote: >> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. 
>> 
>> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016:
>> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate.
>> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization.
>> 
>> --------------------------
>> 
>> **Where to start reviewing**
>> 
>> - `src/hotspot/share/opto/mempointer.hpp`:
>>   - Read the class comment for `MemPointerRawSummand`.
>>   - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks.
>> 
>> - `src/hotspot/share/opto/vectorization.cpp`:
>>   - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works.
>> 
>> - `src/hotspot/share/opto/vtransform.hpp`:
>>   - Understand the difference between weak and strong edges.
>> 
>> If you need to see some examples, then look at the tests:
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning.
>> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments).
>> --------------------------
>> 
>> **Details**
>> 
>> Most fundamentally:
>> - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s.
>> - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`.
>> - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)`
>> - With "regular" summands, this gets simplified to `p = base + 4L + Conv...
> 
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   moved swapping up, suggested by Manuel

src/hotspot/share/opto/c2_globals.hpp line 361:

> 359: product(bool, LoopMultiversioningOptimizeSlowLoop, true, DIAGNOSTIC, \
> 360: "When using loop multiversioning, and a speculative runtime" \
> 361: "check is added, resume optimization for the stalled slow_loop") \

Suggestion:

          "When using loop multiversioning, and a speculative runtime"  \
          " check is added, resume optimization for the stalled slow_loop") \

src/hotspot/share/opto/predicates.hpp line 51:

> 49: * - Loop Parse Predicate: The Parse Predicate added for Loop Predicates.
> 50: * - Profiled Loop Parse Predicate: The Parse Predicate added for Profiled Loop Predicates.
> 51: * - AutoVectorization Predicate: The Parse Predicate added for AutoVectorization runtime checks.

Drive-by comment: Can you also add a small section below under "Runtime Predicate" to summarize what an AutoVectorization runtime check is and in what flavors they come? Then we have everything together for future quick reference.

src/hotspot/share/opto/predicates.hpp line 54:

> 52: * - Loop Limit Check Parse Predicate: The Parse Predicate added for a Loop Limit Check Predicate.
> 53: * - Runtime Predicate: This term is used to refer to a Hoisted Check Predicate (either a Loop Predicate or a Profiled
> 54: * Loop Predicate) or a Loop Limit Check Predicate.
These predicates will be checked at runtime while You should then also update this text, maybe there is more that needs to be updated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2276504360 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2276498769 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2276502729 From galder at openjdk.org Thu Aug 14 12:45:16 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Thu, 14 Aug 2025 12:45:16 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 14:38:01 GMT, Manuel Hässig wrote: > This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of the combination of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus some internal testing on Oracle supported platforms Thanks @mhaessig. Nice API improvement! I'm a bit unsure about the way it's tested though. test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 49: > 47: "-XX:TLABRefillWasteFraction=64")); > 48: t1.start(); > 49: Asserts.fail("Should have thrown exception"); Hmmm, why do the tests fail? I'm wondering if a simpler way to test the functionality is possible that doesn't require having to figure out failure modes? Maybe some kind of positive test that counts number of test scenarios run? ------------- Changes requested by galder (Author).
PR Review: https://git.openjdk.org/jdk/pull/26762#pullrequestreview-3120445726 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2276521299 From mhaessig at openjdk.org Thu Aug 14 13:24:23 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 14 Aug 2025 13:24:23 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags In-Reply-To: References: Message-ID: <6ryfwQT1qJheMxZdAhtyV3sjWaVeM663RyMB1wNCmck=.66f9bead-d58f-4d26-ba73-484e7d9cecc8@github.com> On Thu, 14 Aug 2025 12:41:46 GMT, Galder Zamarreño wrote: >> This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of the combination of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality. >> >> Testing: >> - [x] Github Actions >> - [x] tier1,tier2 plus some internal testing on Oracle supported platforms > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 49: > >> 47: "-XX:TLABRefillWasteFraction=64")); >> 48: t1.start(); >> 49: Asserts.fail("Should have thrown exception"); > > Hmmm, why do the tests fail? I'm wondering if a simpler way to test the functionality is possible that doesn't require having to figure out failure modes? Maybe some kind of positive test that counts number of test scenarios run? Except in the first run, all scenarios fail. That is the only way we currently have to count the scenarios we are executing.
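For reference, the flag cross product that `addCrossProductScenarios` is meant to make ergonomic is a plain Cartesian product over sets of flag alternatives. The helper below is a hypothetical sketch, not the IR framework's implementation; the flag values are taken from the test quoted above plus one assumed alternative per set.

```java
import java.util.ArrayList;
import java.util.List;

public class FlagCross {
    // Expand n sets of flag alternatives into one flag list per combination.
    static List<List<String>> crossProduct(List<List<String>> flagSets) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>());            // start with one empty combination
        for (List<String> options : flagSets) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : result) {
                for (String opt : options) {
                    List<String> combo = new ArrayList<>(prefix);
                    combo.add(opt);
                    next.add(combo);
                }
            }
            result = next;                        // multiply by this set's size
        }
        return result;                            // one scenario per combination
    }

    public static void main(String[] args) {
        List<List<String>> sets = List.of(
            List.of("-XX:TLABRefillWasteFraction=64", "-XX:TLABRefillWasteFraction=128"),
            List.of("-XX:+TieredCompilation", "-XX:-TieredCompilation"));
        System.out.println(FlagCross.crossProduct(sets).size()); // 4
    }
}
```

Counting the resulting scenarios (2 x 2 = 4 here) is the kind of positive check the review above asks about.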
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2276622808 From fyang at openjdk.org Thu Aug 14 13:26:29 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 14 Aug 2025 13:26:29 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v25] In-Reply-To: <-GNxf920ytSK-hakIM-KWRJ_N1yRHSaC-5oEoYTdPJg=.f7ec1a4a-f8ff-404f-a25b-77d996f4f20d@github.com> References: <-GNxf920ytSK-hakIM-KWRJ_N1yRHSaC-5oEoYTdPJg=.f7ec1a4a-f8ff-404f-a25b-77d996f4f20d@github.com> Message-ID: On Wed, 13 Aug 2025 12:02:58 GMT, Yuri Gaevsky wrote: >> The patch adds the possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > - addressed reviewer's comments/suggestions. Thanks for the quick update. Some minor comments remain. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1920: > 1918: BasicType eltype) > 1919: { > 1920: assert(!UseRVV, "sanity"); Although not directly related, can you fix the indentation issue of switch-case in this function, `C2_MacroAssembler::arrays_hashcode_elsize` and `C2_MacroAssembler::arrays_hashcode_elload`? We need to add two spaces on the left of each case. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1998: > 1996: { > 1997: assert(UseRVV, "sanity"); > 1998: assert(MaxVectorSize >= 16, "sanity"); `MaxVectorSize >= 16` condition has already been ensured on JVM startup in `VM_Version::c2_initialize()`. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2005: > 2003: // size when check UseRVV (i.e. MaxVectorSize == VM_Version::_initial_vector_length). > 2004: // Let's use T_INT as all hashCode calculations eventually deal with ints. > 2005: const int ints_in_vec_reg = MaxVectorSize/sizeof(jint); Please leave a space around the `/` operator.
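For context on what the vectorizedHashCode intrinsic under review computes: Java's array hash is the polynomial h = v[0]*31^(n-1) + ... + v[n-1], which is why the implementation pre-computes powers of 31 so that many terms can be folded per vector step. A scalar sketch of the target value (not the intrinsic itself):

```java
public class PolyHash {
    // h = v[0]*31^(n-1) + v[1]*31^(n-2) + ... + v[n-1], evaluated Horner-style.
    static int polyHash(char[] v) {
        int h = 0;
        for (char c : v) {
            h = 31 * h + c;
        }
        return h;
    }

    public static void main(String[] args) {
        // Matches String.hashCode, which uses the same polynomial.
        System.out.println(PolyHash.polyHash(new char[]{'a', 'b', 'c'})); // 96354
    }
}
```

The vector version evaluates several terms of this polynomial in parallel, multiplying each lane by the appropriate pre-computed power of 31 before summing.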
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2010: > 2008: const int elsize_bytes = arrays_hashcode_elsize(eltype); > 2009: const int elsize_shift = exact_log2(elsize_bytes); > 2010: const int MAX_VEC_MASK = ~(ints_in_vec_reg*lmul - 1); Please leave a space around the `*` operator. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2024: > 2022: const Register pow31_highest = tmp1; > 2023: const Register ary_end = tmp2; > 2024: const Register consumed = tmp3; Suggestion: const Register pow31_highest = tmp1; const Register ary_end = tmp2; const Register consumed = tmp3; src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2056: > 2054: shadd(ary, consumed, ary, t0, elsize_shift); > 2055: subw(cnt, cnt, consumed); > 2056: andi(t1, cnt, MAX_VEC_MASK); Can you move `subw + andi` to immediately before `bnez`? src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2111: > 2109: Register src, > 2110: BasicType eltype) { > 2111: assert((T_INT == eltype) || (vdst != vtmp), "should be"); Or simply: `assert_different_registers(vdst, vtmp).` ? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6586: > 6584: address generate_arrays_hashcode_powers_of_31() { > 6585: assert(UseRVV, "sanity"); > 6586: const int ints_in_vec_reg = MaxVectorSize/sizeof(jint); Please leave a space around the `/` operator. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6591: > 6589: StubCodeMark mark(this, "StubRoutines", "arrays_hashcode_powers_of_31"); > 6590: address start = __ pc(); > 6591: for (int i = ints_in_vec_reg*lmul; i >= 0; i--) { Please leave a space around the `*` operator. 
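The `MAX_VEC_MASK = ~(ints_in_vec_reg * lmul - 1)` expression quoted above rounds an element count down to a whole number of vector iterations, which only works because the per-iteration stride is a power of two. A sketch with assumed values (4 ints per register times an lmul of 4, i.e. 16 ints per iteration):

```java
public class VecMask {
    // cnt & ~(stride - 1) is the largest multiple of stride <= cnt,
    // valid only for power-of-two stride.
    static int roundDownToStride(int cnt, int stride) {
        return cnt & ~(stride - 1);
    }

    public static void main(String[] args) {
        int stride = 4 * 4; // assumed: ints_in_vec_reg = 4, lmul = 4
        System.out.println(VecMask.roundDownToStride(100, stride)); // 96
        System.out.println(VecMask.roundDownToStride(15, stride));  // 0
    }
}
```

This mirrors the `andi(t1, cnt, MAX_VEC_MASK)` step: the masked count is consumed by the vector loop and the remaining `cnt & (stride - 1)` elements fall to the scalar tail.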
------------- PR Review: https://git.openjdk.org/jdk/pull/17413#pullrequestreview-3115969220 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2275637206 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2275645194 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2276613026 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2276615983 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2276622884 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2276591563 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2273430390 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2276613462 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2276614023 From mablakatov at openjdk.org Thu Aug 14 14:01:13 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Thu, 14 Aug 2025 14:01:13 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v11] In-Reply-To: References: Message-ID: > Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used. > > Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is still used directly. > > The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks.
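The micro-benchmarks mentioned above (`MULLanes` in the results that follow) time a multiply reduction across vector lanes. Its scalar meaning is simply the product of all elements; a sketch, not the Vector API benchmark itself:

```java
public class MulReduce {
    // The value any vectorized MUL reduction must produce: the product of
    // all elements, regardless of how the lanes are combined internally.
    static long mulReduce(long[] a) {
        long acc = 1L;
        for (long v : a) {
            acc *= v;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(MulReduce.mulReduce(new long[]{1, 2, 3, 4})); // 24
    }
}
```

The SVE specialization in the patch computes the same product by repeatedly multiplying vector halves until the partial products fit in a 128-bit register.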
> 
> Benchmark results:
> 
> Neoverse-V1 (SVE 256-bit)
> 
> Benchmark (size) Mode master PR Units
> ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms
> ShortMaxVector.MULLanes 1024 thrpt 3388.183 7144.301 ops/ms
> IntMaxVector.MULLanes 1024 thrpt 3010.974 4911.485 ops/ms
> LongMaxVector.MULLanes 1024 thrpt 1539.137 2562.835 ops/ms
> FloatMaxVector.MULLanes 1024 thrpt 1355.551 4158.128 ops/ms
> DoubleMaxVector.MULLanes 1024 thrpt 1715.854 3284.189 ops/ms
> 
> Fujitsu A64FX (SVE 512-bit):
> 
> Benchmark (size) Mode master PR Units
> ByteMaxVector.MULLanes 1024 thrpt 1091.692 2887.798 ops/ms
> ShortMaxVector.MULLanes 1024 thrpt 597.008 1863.338 ops/ms
> IntMaxVector.MULLanes 1024 thrpt 510.642 1348.651 ops/ms
> LongMaxVector.MULLanes 1024 thrpt 468.878 878.620 ops/ms
> FloatMaxVector.MULLanes 1024 thrpt 376.284 2237.564 ops/ms
> DoubleMaxVector.MULLanes 1024 thrpt 431.343 1646.792 ops/ms
Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: cleanup: start the SVE Integer Misc - Unpredicated section ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23181/files - new: https://git.openjdk.org/jdk/pull/23181/files/91cbacc0..4aed1f65 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23181&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23181&range=09-10 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23181.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23181/head:pull/23181 PR: https://git.openjdk.org/jdk/pull/23181 From mablakatov at openjdk.org Thu Aug 14 14:01:16 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Thu, 14 Aug 2025 14:01:16 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v8] In-Reply-To: <3bziwZ7rfKLirGwnVKQrl-j6-ENu5tktVmcXwZSxmSM=.a7b1b655-7c34-4e4e-b8a7-01db60ead3ad@github.com> References:
<072sgUJQa-oI9-uylhiPMzk2wLEr9e_8MZE1joM3fxs=.c0b4df04-57cb-43a4-b42b-340102013524@github.com> <3bziwZ7rfKLirGwnVKQrl-j6-ENu5tktVmcXwZSxmSM=.a7b1b655-7c34-4e4e-b8a7-01db60ead3ad@github.com> Message-ID: <7di_2aN_3kdJy22MpEnzo6J4l0a_JWTzUD8NO4QPp4c=.6a1167c7-aa34-4bdf-9c9f-14a337a2fe7f@github.com> On Wed, 13 Aug 2025 12:18:54 GMT, Andrew Haley wrote: >>> Please try to organize things the same way as the Decode section of the ARM. >> >> Do you refer to *C4: A64 Instruction Set Encoding*? >> >>> Insert a new section called SVE Integer Misc - Unpredicated after SVE bitwise shift by immediate (predicated) and put this pattern there. >> >> I assume you might have misinterpreted **predicated** SVE bitwise shift for **unpredicated**. >> >> In the *C4: A64 Instruction Set Encoding*, *C4.1.41 SVE Integer Misc - Unpredicated* follows *C4.1.40 SVE Bitwise Shift - Unpredicated* which is not implemented by `src/hotspot/cpu/aarch64/assembler_aarch64.hpp` as far as I can tell. Suggested *SVE bitwise shift by immediate (predicated)* falls into *C4.1.34 SVE Bitwise Shift - Predicated*. If this change is to follow the ordering in *C4: A64 Instruction Set Encoding*, the closest preceding implemented instruction class for `sve_movprfx` (from *C4.1.41*) should be [SVE stack frame adjustment](https://github.com/openjdk/jdk/pull/23181/files/4593a5d717024df01769625993c2b769d8dde311#diff-203c5bbfa5307b5cc529c80acf90e764260db018ed658b949421f91190c56982L3686) which falls into *C4.1.38 SVE Stack Allocation*. The next following implemented instruction class should be [SVE element count](https://github.com/openjdk/jdk/pull/23181/files/4593a5d717024df01769625993c2b769d8dde311#diff-203c5bbfa5307b5cc529c80acf90e764260db018ed658b949421f91190c56982L4067) (inconveniently named something else in the source file) which falls into *C4.1.42 SVE Element Count*. The two instruction classes don't follow each other in the file, unfortunately, so it's one or the other. Currently it's the latter.
> >> I assume you might have misinterpreted **predicated** SVE bitwise shift for **unpredicated**. > > It's possible. The point is to make sure that any new instruction is in a section corresponding to its section in the Decoding tables. Please make your best guess as to where that should be, and we'll discuss it. To (at least partially) conform to the ordering in *C4: A64 Instruction Set Encoding*, it should be placed either right after [SVE stack frame adjustment](https://github.com/openjdk/jdk/pull/23181/files/4593a5d717024df01769625993c2b769d8dde311#diff-203c5bbfa5307b5cc529c80acf90e764260db018ed658b949421f91190c56982L3686) or right before [SVE element count](https://github.com/openjdk/jdk/pull/23181/files/4593a5d717024df01769625993c2b769d8dde311#diff-203c5bbfa5307b5cc529c80acf90e764260db018ed658b949421f91190c56982L4067) as described above. The patch does the latter. I've started the section, please check https://github.com/openjdk/jdk/pull/23181/commits/4aed1f65e5c392c18b62d5a79b75dc3ae2cff5f6 and resolve the thread if you find it suitable.
This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. > > Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. > > Testing: > - [x] Github Actions > - [x] tier1, tier2 on all platforms > - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug > - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) Manuel Hässig has updated the pull request incrementally with four additional commits since the last revision: - Add test - Remove superfluous NOT_PRODUCT - Report which compilation timed out - Exclude generic class for Linux ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26023/files - new: https://git.openjdk.org/jdk/pull/26023/files/3689fc71..9a43ef26 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=05-06 Stats: 63 lines in 4 files changed: 54 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26023/head:pull/26023 PR: https://git.openjdk.org/jdk/pull/26023 From mhaessig at openjdk.org Thu Aug 14 14:02:04 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 14 Aug 2025 14:02:04 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v5] In-Reply-To: <7ZSL2sR91qOFup-zauB0VKCoLYB9dHMn3GGwLmo-gEk=.e790e543-fb41-42cd-add2-1f5f4a141afb@github.com> References: <6gq4iIBw4RIqqPvmAf2MHnKrmYHwOdWdH1fz1bFaCGA=.57906956-460f-4a1d-9e3e-fbf91a7974e2@github.com>
<7ZSL2sR91qOFup-zauB0VKCoLYB9dHMn3GGwLmo-gEk=.e790e543-fb41-42cd-add2-1f5f4a141afb@github.com> Message-ID: On Wed, 13 Aug 2025 08:40:25 GMT, Christian Hagedorn wrote: >> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8308094-timeout >> - Rename _timer >> - remove _timeout_armed >> - ASSERT >> - Merge branch 'master' into JDK-8308094-timeout >> - No acquire release semantics >> - Factor Linux specific timeout functionality out of share/ >> - Move timeout disarm above if >> - Merge branch 'master' into JDK-8308094-timeout >> - Fix SIGALRM test >> - ... and 1 more: https://git.openjdk.org/jdk/compare/b93dcf2a...8bb5eb7a > > Nice improvement! I left some small comments in the code but otherwise the change looks reasonable! > > Can we also add some tests for the new `CompileTaskTimeout` flag? Maybe we can add a positive test and negative test: > - Positive test: Could just be a hello world test with a reasonably large non-zero value for `CompileTaskTimeout`. > - Negative test: Maybe we can just set `CompileTaskTimeout=1` which will probably crash immediately for a hello world program. That could be run in a separate VM and then we can check the output. If we are able to also dump the compile task/method that is timing out, we might even be able to match on that when run with `CompileOnly` for a single method. But not sure if the latter is possible. > > What do you think? Thank you for looking at this, @chhagedorn. I added a simple test and addressed the rest of your comments.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26023#issuecomment-3188563311 From epeter at openjdk.org Thu Aug 14 14:10:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 14 Aug 2025 14:10:55 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v10] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning.
> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. > - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. > - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` > - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)` > - For aliasing analysis (adjacency and overlap), the "regu...
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/c2_globals.hpp Co-authored-by: Christian Hagedorn - improve predicates.hpp documentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/21ea9b2b..e6e790eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=08-09 Stats: 23 lines in 2 files changed: 17 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Thu Aug 14 14:10:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 14 Aug 2025 14:10:55 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v9] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <8kHWaARkJKSiWXhMWvvFl9JKB_OTFu4ObwKAsEHompI=.a16f83f5-bff1-486f-a95b-8577fd485a92@github.com> On Thu, 14 Aug 2025 12:31:31 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> moved swapping up, suggested by Manuel > > src/hotspot/share/opto/predicates.hpp line 51: > >> 49: * - Loop Parse Predicate: The Parse Predicate added for Loop Predicates. >> 50: * - Profiled Loop Parse Predicate: The Parse Predicate added for Profiled Loop Predicates. >> 51: * - AutoVectorization Predicate: The Parse Predicate added for AutoVectorization runtime checks. > > Drive-by comment: Can you also add a small section below under "Runtime Predicate" to summarize what an AutoVectorization runtime check is and in what flavors they come? Then we have everything together for future quick reference. Good idea, I updated it! 
Also added some more for the `Short Running Loop Parse Predicate`. > src/hotspot/share/opto/predicates.hpp line 54: > >> 52: * - Loop Limit Check Parse Predicate: The Parse Predicate added for a Loop Limit Check Predicate. >> 53: * - Runtime Predicate: This term is used to refer to a Hoisted Check Predicate (either a Loop Predicate or a Profiled >> 54: * Loop Predicate) or a Loop Limit Check Predicate. These predicates will be checked at runtime while > > You should then also update this text, maybe there is more that needs to be updated. Good idea, I updated it! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2276744341 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2276744401 From epeter at openjdk.org Thu Aug 14 14:35:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 14 Aug 2025 14:35:53 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. 
> - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. > - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. > - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` > - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)` > - For aliasing analysis (adjacency and overlap), the "regu...
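The speculative aliasing check discussed in this thread ultimately has to decide whether two accessed byte ranges can overlap. The core interval test can be sketched as follows; the method name and signature are hypothetical, not the C2 code:

```java
public class OverlapCheck {
    // Half-open ranges [p1, p1 + s1) and [p2, p2 + s2) overlap iff each
    // range starts before the other one ends.
    static boolean mayOverlap(long p1, long s1, long p2, long s2) {
        return p1 < p2 + s2 && p2 < p1 + s1;
    }

    public static void main(String[] args) {
        System.out.println(OverlapCheck.mayOverlap(0, 10, 5, 10));  // true
        System.out.println(OverlapCheck.mayOverlap(0, 10, 10, 5));  // false: ranges only touch
    }
}
```

If the check passes (no overlap), the vectorized fast path is safe; otherwise execution falls back to the scalar slow_loop or, with the predicate variant, traps and recompiles without the speculation.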
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more documentation for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/e6e790eb..0180dd27 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=09-10 Stats: 16 lines in 1 file changed: 14 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Thu Aug 14 14:35:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 14 Aug 2025 14:35:53 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v10] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Thu, 14 Aug 2025 14:10:55 GMT, Emanuel Peter wrote: >> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. >> >> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. >> >> -------------------------- >> >> **Where to start reviewing** >> >> - `src/hotspot/share/opto/mempointer.hpp`: >> - Read the class comment for `MemPointerRawSummand`. >> - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks.
>> >> - `src/hotspot/share/opto/vectorization.cpp`: >> - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. >> >> - `src/hotspot/share/opto/vtransform.hpp`: >> - Understand the difference between weak and strong edges. >> >> If you need to see some examples, then look at the tests: >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. >> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). >> -------------------------- >> >> **Details** >> >> Most fundamentally: >> - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. >> - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. >> - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` >> - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + Conv... > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/c2_globals.hpp > > Co-authored-by: Christian Hagedorn > - improve predicates.hpp documentation @chhagedorn Thanks for the drive-by comments about the Predicate documentation. Are you now satisfied?
Maybe @rwestrel should have a look at it too, since I completed the missing documentation from the Short-Running-Long-Loop-Predicates as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3188677355 From mhaessig at openjdk.org Thu Aug 14 14:57:36 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 14 Aug 2025 14:57:36 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v8] In-Reply-To: References: Message-ID: > This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. > > The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. > > Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. 
> > Testing: > - [x] Github Actions > - [x] tier1, tier2 on all platforms > - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug > - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Fix format string ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26023/files - new: https://git.openjdk.org/jdk/pull/26023/files/9a43ef26..ee64b092 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26023/head:pull/26023 PR: https://git.openjdk.org/jdk/pull/26023 From mhaessig at openjdk.org Thu Aug 14 15:08:17 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 14 Aug 2025 15:08:17 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v5] In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 17:17:23 GMT, Emanuel Peter wrote: >> Manuel Hässig has updated the pull request incrementally with eight additional commits since the last revision: >> >> - Better documentation of profitable() >> - Remove vector sizes >> - Specify vector sizes >> - Merge branch 'jdk-8356176-byte-size' of github.com:mhaessig/jdk into jdk-8356176-byte-size >> - Add asserts >> - Make region a field >> - Even more better debug print >> - Remove redundant scenarios > > Ok, things are improving nicely! Thank you for looking at this @eme64! I addressed all of your comments. I also reran testing and it passed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26429#issuecomment-3188793269 From qamai at openjdk.org Thu Aug 14 15:19:26 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 14 Aug 2025 15:19:26 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v6] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 07:02:58 GMT, Manuel H?ssig wrote: >> A loop of the form >> >> MemorySegment ms = {}; >> for (long i = 0; i < ms.byteSize() / 8L; i++) { >> // vectorizable work >> } >> >> does not vectorize, whereas >> >> MemorySegment ms = {}; >> long size = ms.byteSize(); >> for (long i = 0; i < size / 8L; i++) { >> // vectorizable work >> } >> >> vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: >> >> https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 >> >> Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. >> >> So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. 
>> >> @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem >> >> ## Change Description >> >> Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. >> >>
Explored Alternatives >> 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. >> 2. I... > > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - Merge branch 'master' into jdk-8356176-byte-size > - Emanuel's suggestion > - Better documentation of profitable() > - Remove vector sizes > - Specify vector sizes > - Merge branch 'jdk-8356176-byte-size' of github.com:mhaessig/jdk into jdk-8356176-byte-size > - Update field documentation > > Co-authored-by: Emanuel Peter > > - Add asserts > - Make region a field > - Even more better debug print > - ... and 13 more: https://git.openjdk.org/jdk/compare/25480f00...025dbe6e Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26429#pullrequestreview-3120998268 From never at openjdk.org Thu Aug 14 15:35:11 2025 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 14 Aug 2025 15:35:11 GMT Subject: RFR: 8365468: EagerJVMCI should only apply to the CompilerBroker JVMCI runtime [v2] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 21:48:51 GMT, Doug Simon wrote: >> The primary goal of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447) was to have initialization of the Graal JIT occur in the same phase as the rest of VM startup such that initialization problems are detected and reported prior to executing any user code. >> >> This change caused a performance regression for Truffle when it is used in a JDK that includes both jargraal and libgraal. The problem is that Truffle needs jarjvmci but does not need jargraal when libgraal is available. Initializing jargraal in that configuration delays initialization of Truffle (not just Truffle compilation). Additionally, the jargraal instance created will never be used, wasting memory. 
>> >> The solution in this PR is to make EagerJVMCI only apply when initializing JVMCI on a CompileBroker thread. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > only apply EagerJVMCI on a CompileBroker thread Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26768#pullrequestreview-3121062237 From dfenacci at openjdk.org Thu Aug 14 15:41:25 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 14 Aug 2025 15:41:25 GMT Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee == m) failed: repeated inline attempt with different callee Message-ID: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> # Issue The CTW test `applications/ctw/modules/java_xml.java` crashes when trying to repeat late inlining of a virtual method (after IGVN passes through the method's call node again). The failure originates [here](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callGenerator.cpp#L473) because `_callee != m`. Apparently when running IGVN a second time after a first late inline failure and [setting the callee in the call generator](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callnode.cpp#L1240) we notice that the previous callee is not the same as the current one. In this specific instance it seems that the issue happens when CTW is compiling Apache Xalan. # Cause The root of the issue has to do with repeated late inlining, class hierarchy analysis and dynamic class loading. 
For this particular issue the two differing methods are `org.apache.xalan.xsltc.compiler.LocationPathPattern::translate` the first time and `org.apache.xalan.xsltc.compiler.AncestorPattern::translate` the second time. `LocationPathPattern` is an abstract class but has a concrete `translate` method. `AncestorPattern` is a concrete class that extends another abstract class, `RelativePathPattern`, which in turn extends `LocationPathPattern`. `AncestorPattern` overrides the `translate` method. What seems to be happening is the following: we compile a virtual call to `RelativePathPattern::translate`, and at compile time only the abstract classes `RelativePathPattern` <: `LocationPathPattern` are loaded. CHA then concludes that the call must always resolve to `LocationPathPattern::translate` because the method is not overridden anywhere else. However, at that point there is no non-abstract class in the entire class hierarchy yet: as soon as `AncestorPattern` is loaded, it becomes the only non-abstract class in the hierarchy, and therefore the receiver type must be `AncestorPattern`. More generally, when late inlining is repeated and classes are loaded dynamically, it is possible that the method resolved in one late inlining attempt differs from the one resolved in the next. # Fix This looks like a very rare edge case. If CHA is affected by class loading, the original recorded dependency becomes invalid. So, we change the assert to **check for invalid dependencies if the current callee and the previous one don't match**. # Testing This issue is very intermittent and depends on a number of factors. This makes the creation of a simple JTREG test incredibly difficult. Therefore, instead of creating one, we rely on the failing CTW test and increase the likelihood of hitting the issue by **adding repeated late inlining attempts to the `StressIncrementalInlining` stress flag**. 
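The hierarchy described under "# Cause" can be reproduced in a small, self-contained Java sketch (the class names are simplified stand-ins for the Xalan classes; this illustrates the shape only, not the failing code):

```java
// Illustrative stand-ins for the Xalan hierarchy described above.
public class ChaSketch {
    abstract static class LocationPathPattern {
        // Abstract class with a concrete translate() method: while no
        // concrete subclass is loaded, CHA resolves calls to this method.
        String translate() { return "LocationPathPattern::translate"; }
    }

    abstract static class RelativePathPattern extends LocationPathPattern { }

    static class AncestorPattern extends RelativePathPattern {
        // Once this class is loaded, it is the only non-abstract class in
        // the hierarchy, so the receiver of r.translate() must be an
        // AncestorPattern and the resolved callee changes.
        @Override String translate() { return "AncestorPattern::translate"; }
    }

    public static void main(String[] args) {
        RelativePathPattern r = new AncestorPattern();
        System.out.println(r.translate());
    }
}
```

With only the two abstract classes loaded, `LocationPathPattern::translate` is the unique target; loading `AncestorPattern` invalidates that CHA result, which is the situation the relaxed assert tolerates.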
Tier 1-3+ ------------- Commit messages: - JDK-8355354: rewrite comment and check - JDK-8355354: acquire compile lock - JDK-8355354: change assert to validate dependencies if callees don't match - JDK-8355354: remove failing check - JDK-8355354: remove unneeded changes - JDK-8355354: Changes: https://git.openjdk.org/jdk/pull/26441/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26441&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355354 Stats: 25 lines in 3 files changed: 22 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26441.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26441/head:pull/26441 PR: https://git.openjdk.org/jdk/pull/26441 From iveresov at openjdk.org Thu Aug 14 17:02:19 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 14 Aug 2025 17:02:19 GMT Subject: RFR: 8362530: VM crash with -XX:+PrintTieredEvents when collecting AOT profiling [v3] In-Reply-To: <0II089426D6YVp-sTvTd0D3NJYqq44tTzhEC2pFXoVo=.6b25382c-fab3-41c8-8e2d-c092ed62b0b9@github.com> References: <0II089426D6YVp-sTvTd0D3NJYqq44tTzhEC2pFXoVo=.6b25382c-fab3-41c8-8e2d-c092ed62b0b9@github.com> Message-ID: On Wed, 13 Aug 2025 16:48:29 GMT, Igor Veresov wrote: >> When printing tiered events we take the ttyLock and also now the trainingDataLock. While benign it's best to decouple these. The solution is to gather the output bits in a buffer and then print it. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Address Christian's comments Thanks Vladimir and Christian! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26750#issuecomment-3189187511 From iveresov at openjdk.org Thu Aug 14 17:02:20 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 14 Aug 2025 17:02:20 GMT Subject: Integrated: 8362530: VM crash with -XX:+PrintTieredEvents when collecting AOT profiling In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 18:02:16 GMT, Igor Veresov wrote: > When printing tiered events we take the ttyLock and also now the trainingDataLock. While benign it's best to decouple these. The solution is to gather the output bits in a buffer and then print it. This pull request has now been integrated. Changeset: 26ccb3ce Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/26ccb3cef17a7a2a4b09af1e1e29b96d54a418aa Stats: 77 lines in 3 files changed: 20 ins; 2 del; 55 mod 8362530: VM crash with -XX:+PrintTieredEvents when collecting AOT profiling Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/26750 From kvn at openjdk.org Thu Aug 14 17:36:15 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 14 Aug 2025 17:36:15 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled [v4] In-Reply-To: References: Message-ID: <2s3h8wPCQWNLQCLFPHu6zHsIyAWPHLvfaqmsQJR7D6s=.20af40bf-20af-4520-9074-2439c364c04a@github.com> On Wed, 13 Aug 2025 12:10:54 GMT, Saranya Natarajan wrote: >> **Issue** >> An error, `assert(data->is_ReceiverTypeData()) failed: bad profile data type`, is encountered during C2 compilation due to bad profile data. This occurs when the code is compiled with `TypeProfileCasts` option disabled. >> >> **Analysis** >> The assertion failure occurs in `record_profiled_receiver_for_speculation` that analyzes the profiling information in the method data to determine whether a null value has been observed in the `instanceof` operation. This information is encoded in the `BitData` during profiling. 
When the method identifies that a null has been seen, it proceeds to inspect the associated `ReceiverTypeData` to see if the type check is always performed against null. However, in this scenario, the incoming profiling data is of type `BitData` rather than `ReceiverTypeData`, leading to the assertion failure. >> >> The profiling information for null seen for operations `aastore`, `instanceof`, and `checkcast` is recorded by the method `profile_null_seen` (in `src/hotspot/cpu/x86/templateTable_x86.cpp`). On investigating this method, it can be observed that the method data pointer is not updated for `VirtualCallData` (which is a subclass of `ReceiverTypeData`) when the `TypeProfileCasts` option is disabled. >> >> **Solution** >> My proposal is to inspect the `ReceiverTypeData` in function `record_profiled_receiver_for_speculation` only if `TypeProfileCasts` is enabled (this is based on the fact that the relevant method data pointer is not updated when `TypeProfileCasts` is disabled). >> >> **Question to reviewers** >> Do you think this is a reasonable fix? >> >> **Testing** >> GitHub Actions >> tier1 to tier3 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > > Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: > > - formating code > - add CompileThresholdScaling Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26640#pullrequestreview-3121547300 From fferrari at openjdk.org Thu Aug 14 18:10:29 2025 From: fferrari at openjdk.org (Francisco Ferrari Bihurriet) Date: Thu, 14 Aug 2025 18:10:29 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v4] In-Reply-To: References: Message-ID: > Hi, this pull request is a second take of 1383fec41756322bf2832c55633e46395b937b40, by updating the `CmpUNode` type as either `TypeInt::CC_LE` (case 1a) or `TypeInt::CC_LT` (case 1b) instead of updating the `BoolNode` type as `TypeInt::ONE`. > > With this approach a56cd371a2c497e4323756f8b8a08a0bba059bf2 becomes unnecessary. Additionally, having the right type in `CmpUNode` could potentially enable further optimizations. > > #### Testing > > In order to evaluate the changes, the following testing has been performed: > > * `jdk:tier1` (see [GitHub Actions run](https://github.com/franferrax/jdk/actions/runs/16789994433)) > * [`TestBoolNodeGVN.java`](https://github.com/openjdk/jdk/blob/jdk-26+9/test/hotspot/jtreg/compiler/c2/gvn/TestBoolNodeGVN.java), created for [JDK-8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value](https://bugs.openjdk.org/browse/JDK-8327381) (1383fec41756322bf2832c55633e46395b937b40) > * I also checked it breaks if I remove the `CmpUNode::Value_cmpu_and_mask` call > * Private reproducer for [JDK-8349584: Improve compiler processing](https://bugs.openjdk.org/browse/JDK-8349584) (a56cd371a2c497e4323756f8b8a08a0bba059bf2) > * A local slowdebug run of the `test/hotspot/jtreg/compiler/c2` category on _Fedora Linux x86_64_ > * Same results as with `master` (f95af744b07a9ec87e2507b3d584cbcddc827bbd) Francisco Ferrari Bihurriet has updated the pull request incrementally with one additional commit since the last revision: Make testCorrectness @Run the C2 compiled versions Correctness of the tests with the following name format should be checked in the 
TestFramework.run() JVM process, with the C2 compiled version of these methods. TestFramework's warmup ensures this. testCase(1a|1b)(OptimizeAsTrue|OptimizeAsFalse)For(EQ|NE|LE|GE|LT|GT)(xm|mx) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26666/files - new: https://git.openjdk.org/jdk/pull/26666/files/e6b1cb89..e2f8a43c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26666&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26666&range=02-03 Stats: 33 lines in 1 file changed: 16 ins; 17 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26666/head:pull/26666 PR: https://git.openjdk.org/jdk/pull/26666 From fferrari at openjdk.org Thu Aug 14 18:10:30 2025 From: fferrari at openjdk.org (Francisco Ferrari Bihurriet) Date: Thu, 14 Aug 2025 18:10:30 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v3] In-Reply-To: References: Message-ID: On Thu, 14 Aug 2025 06:08:22 GMT, Francisco Ferrari Bihurriet wrote: >> Hi, this pull request is a second take of 1383fec41756322bf2832c55633e46395b937b40, by updating the `CmpUNode` type as either `TypeInt::CC_LE` (case 1a) or `TypeInt::CC_LT` (case 1b) instead of updating the `BoolNode` type as `TypeInt::ONE`. >> >> With this approach a56cd371a2c497e4323756f8b8a08a0bba059bf2 becomes unnecessary. Additionally, having the right type in `CmpUNode` could potentially enable further optimizations. 
>> >> #### Testing >> >> In order to evaluate the changes, the following testing has been performed: >> >> * `jdk:tier1` (see [GitHub Actions run](https://github.com/franferrax/jdk/actions/runs/16789994433)) >> * [`TestBoolNodeGVN.java`](https://github.com/openjdk/jdk/blob/jdk-26+9/test/hotspot/jtreg/compiler/c2/gvn/TestBoolNodeGVN.java), created for [JDK-8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value](https://bugs.openjdk.org/browse/JDK-8327381) (1383fec41756322bf2832c55633e46395b937b40) >> * I also checked it breaks if I remove the `CmpUNode::Value_cmpu_and_mask` call >> * Private reproducer for [JDK-8349584: Improve compiler processing](https://bugs.openjdk.org/browse/JDK-8349584) (a56cd371a2c497e4323756f8b8a08a0bba059bf2) >> * A local slowdebug run of the `test/hotspot/jtreg/compiler/c2` category on _Fedora Linux x86_64_ >> * Same results as with `master` (f95af744b07a9ec87e2507b3d584cbcddc827bbd) > > Francisco Ferrari Bihurriet has updated the pull request incrementally with three additional commits since the last revision: > > - Improve the IR test to add the new covered cases > > I also checked the test is now failing in the master branch (at > f95af744b07a9ec87e2507b3d584cbcddc827bbd). > - Remove IR test inverted asserts > > According to my IGV observations, these inversions aren't necessarily > effective. Also, I assume it is safe to remove them because if I apply > this change to the master branch, the test still passes (tested at > f95af744b07a9ec87e2507b3d584cbcddc827bbd). > - Add requested comments from the reviews > > Add a comment with the BoolTest::cc2logical inferences tables, as > suggested by @tabjy. > > Also, add a comment explaining how PhaseCCP::push_cmpu is handling > grandparent updates in the case 1b, as agreed with @chhagedorn. Learning a bit more about the IR tests framework, I noticed `testCorrectness` isn't probably doing what we want. 
It should execute the compiled versions of the following `@Test` methods: testCase(1a|1b)(OptimizeAsTrue|OptimizeAsFalse)For(EQ|NE|LE|GE|LT|GT)(xm|mx) But warmup of the `@Test` methods only occurs during `TestFramework.run()`, and in a different JVM process. So invoking `testCorrectness` directly outside `TestFramework.run()` executes it in the parent JVM without any warmup. I think I fixed this in e2f8a43ce1ed2861c506af787018c38ed8769fe3 by making `testCorrectness` a `@Run`ner of those `@Test` methods, but please confirm. # Absence note Today is the last day before a ~2-week vacation, so my next working day is Monday, September 1st. Please feel free to keep giving feedback and/or reviews, and I will continue when I'm back. Cheers, Francisco ------------- PR Comment: https://git.openjdk.org/jdk/pull/26666#issuecomment-3189413611 PR Comment: https://git.openjdk.org/jdk/pull/26666#issuecomment-3189418153 From qamai at openjdk.org Thu Aug 14 18:26:17 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 14 Aug 2025 18:26:17 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v4] In-Reply-To: References: Message-ID: On Thu, 14 Aug 2025 18:10:29 GMT, Francisco Ferrari Bihurriet wrote: >> Hi, this pull request is a second take of 1383fec41756322bf2832c55633e46395b937b40, by updating the `CmpUNode` type as either `TypeInt::CC_LE` (case 1a) or `TypeInt::CC_LT` (case 1b) instead of updating the `BoolNode` type as `TypeInt::ONE`. >> >> With this approach a56cd371a2c497e4323756f8b8a08a0bba059bf2 becomes unnecessary. Additionally, having the right type in `CmpUNode` could potentially enable further optimizations. 
>> >> #### Testing >> >> In order to evaluate the changes, the following testing has been performed: >> >> * `jdk:tier1` (see [GitHub Actions run](https://github.com/franferrax/jdk/actions/runs/16789994433)) >> * [`TestBoolNodeGVN.java`](https://github.com/openjdk/jdk/blob/jdk-26+9/test/hotspot/jtreg/compiler/c2/gvn/TestBoolNodeGVN.java), created for [JDK-8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value](https://bugs.openjdk.org/browse/JDK-8327381) (1383fec41756322bf2832c55633e46395b937b40) >> * I also checked it breaks if I remove the `CmpUNode::Value_cmpu_and_mask` call >> * Private reproducer for [JDK-8349584: Improve compiler processing](https://bugs.openjdk.org/browse/JDK-8349584) (a56cd371a2c497e4323756f8b8a08a0bba059bf2) >> * A local slowdebug run of the `test/hotspot/jtreg/compiler/c2` category on _Fedora Linux x86_64_ >> * Same results as with `master` (f95af744b07a9ec87e2507b3d584cbcddc827bbd) > > Francisco Ferrari Bihurriet has updated the pull request incrementally with one additional commit since the last revision: > > Make testCorrectness @Run the C2 compiled versions > > Correctness of the tests with the following name format should be > checked in the TestFramework.run() JVM process, with the C2 compiled > version of these methods. TestFramework's warmup ensures this. > > testCase(1a|1b)(OptimizeAsTrue|OptimizeAsFalse)For(EQ|NE|LE|GE|LT|GT)(xm|mx) src/hotspot/share/opto/subnode.cpp line 902: > 900: const TypeInt* rhs_m_type = phase->type(rhs_m)->isa_int(); > 901: // Exclude any case where m == -1 is possible. 
> 902: if (rhs_m_type != nullptr && (rhs_m_type->_lo > -1 || rhs_m_type->_hi < -1)) { Please use `!rhs_m_type->contains(-1)` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26666#discussion_r2277417174 From fferrari at openjdk.org Thu Aug 14 18:35:53 2025 From: fferrari at openjdk.org (Francisco Ferrari Bihurriet) Date: Thu, 14 Aug 2025 18:35:53 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v5] In-Reply-To: References: Message-ID: <-r3eHhuzv6NeHNJ0YRpTey8a8YN8q52kXR1fym1RGd0=.1a490d02-260b-493c-b377-d65ad68bca41@github.com> > Hi, this pull request is a second take of 1383fec41756322bf2832c55633e46395b937b40, by updating the `CmpUNode` type as either `TypeInt::CC_LE` (case 1a) or `TypeInt::CC_LT` (case 1b) instead of updating the `BoolNode` type as `TypeInt::ONE`. > > With this approach a56cd371a2c497e4323756f8b8a08a0bba059bf2 becomes unnecessary. Additionally, having the right type in `CmpUNode` could potentially enable further optimizations. 
> > #### Testing > > In order to evaluate the changes, the following testing has been performed: > > * `jdk:tier1` (see [GitHub Actions run](https://github.com/franferrax/jdk/actions/runs/16789994433)) > * [`TestBoolNodeGVN.java`](https://github.com/openjdk/jdk/blob/jdk-26+9/test/hotspot/jtreg/compiler/c2/gvn/TestBoolNodeGVN.java), created for [JDK-8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value](https://bugs.openjdk.org/browse/JDK-8327381) (1383fec41756322bf2832c55633e46395b937b40) > * I also checked it breaks if I remove the `CmpUNode::Value_cmpu_and_mask` call > * Private reproducer for [JDK-8349584: Improve compiler processing](https://bugs.openjdk.org/browse/JDK-8349584) (a56cd371a2c497e4323756f8b8a08a0bba059bf2) > * A local slowdebug run of the `test/hotspot/jtreg/compiler/c2` category on _Fedora Linux x86_64_ > * Same results as with `master` (f95af744b07a9ec87e2507b3d584cbcddc827bbd) Francisco Ferrari Bihurriet has updated the pull request incrementally with one additional commit since the last revision: Accept @merykitty's suggestion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26666/files - new: https://git.openjdk.org/jdk/pull/26666/files/e2f8a43c..25aa9d7e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26666&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26666&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26666/head:pull/26666 PR: https://git.openjdk.org/jdk/pull/26666 From fferrari at openjdk.org Thu Aug 14 18:35:54 2025 From: fferrari at openjdk.org (Francisco Ferrari Bihurriet) Date: Thu, 14 Aug 2025 18:35:54 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v4] In-Reply-To: References: Message-ID: On Thu, 14 Aug 2025 18:23:24 GMT, Quan Anh Mai wrote: >> Francisco Ferrari Bihurriet has updated the pull 
request incrementally with one additional commit since the last revision: >> >> Make testCorrectness @Run the C2 compiled versions >> >> Correctness of the tests with the following name format should be >> checked in the TestFramework.run() JVM process, with the C2 compiled >> version of these methods. TestFramework's warmup ensures this. >> >> testCase(1a|1b)(OptimizeAsTrue|OptimizeAsFalse)For(EQ|NE|LE|GE|LT|GT)(xm|mx) > > src/hotspot/share/opto/subnode.cpp line 902: > >> 900: const TypeInt* rhs_m_type = phase->type(rhs_m)->isa_int(); >> 901: // Exclude any case where m == -1 is possible. >> 902: if (rhs_m_type != nullptr && (rhs_m_type->_lo > -1 || rhs_m_type->_hi < -1)) { > > Please use `!rhs_m_type->contains(-1)` Accepted in 25aa9d7e27a74cf6ed917af2d328f5880fe84de5. Simple smoke-test check: builds and the IR test passes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26666#discussion_r2277433703 From duke at openjdk.org Thu Aug 14 18:39:53 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 14 Aug 2025 18:39:53 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v26] In-Reply-To: References: Message-ID: > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: -more updates per reviewer's suggestions. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/aaf930be..3fd5388c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=24-25 Stats: 27 lines in 2 files changed: 1 ins; 2 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From duke at openjdk.org Thu Aug 14 18:39:53 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 14 Aug 2025 18:39:53 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v22] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 06:05:15 GMT, Fei Yang wrote: >> Based on above experiments it looks reasonable to use `m2` grouping. > >> Based on above experiments it looks reasonable to use `m2` grouping. > > Thanks for the extra JMH numbers. Yes, I agree that `m2` is more reasonable here. > That means we won't need to reserve so many vector registers for `instruct varrays_hashcode` in src/hotspot/cpu/riscv/riscv_v.ad. > So can you free the unused vector registers? Will take a closer look after that. Thanks for your comments, @RealFYang, updated. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3189492458 From duke at openjdk.org Thu Aug 14 18:39:54 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 14 Aug 2025 18:39:54 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v25] In-Reply-To: References: <-GNxf920ytSK-hakIM-KWRJ_N1yRHSaC-5oEoYTdPJg=.f7ec1a4a-f8ff-404f-a25b-77d996f4f20d@github.com> Message-ID: <0zt_mGPJLxCZ92QYYca5V8J_ng3fi38b1i3oiLdJCJY=.dbb34fcc-eaf6-4e7d-8c35-b78e8822bd56@github.com> On Thu, 14 Aug 2025 06:45:39 GMT, Fei Yang wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> - addressed reviewer's comments/suggestions. > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1920: > >> 1918: BasicType eltype) >> 1919: { >> 1920: assert(!UseRVV, "sanity"); > > Although not directly related, can you fix the indentation issue of switch-case in this function, `C2_MacroAssembler::arrays_hashcode_elsize` and `C2_MacroAssembler::arrays_hashcode_elload`? > We need to add two spaces to the left of each case. Sure. > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2111: > >> 2109: Register src, >> 2110: BasicType eltype) { >> 2111: assert((T_INT == eltype) || (vdst != vtmp), "should be"); > > Or simply: `assert_different_registers(vdst, vtmp).` ? 
Nice catch, done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2277442246 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2277442681 From aph at openjdk.org Thu Aug 14 20:14:14 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 14 Aug 2025 20:14:14 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v6] In-Reply-To: References: Message-ID: <4LDAfuCcIvT0Q51SGfwW9VP-3iJZCi7LDT6GAtj8b4o=.d0033a4e-3e18-4581-ae7b-84d89ec808e8@github.com> On Thu, 14 Aug 2025 09:09:32 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. >> >> This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). >> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. 
> > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Modify loadConH to use a mov and fmov instead src/hotspot/cpu/aarch64/aarch64.ad line 7100: > 7098: } else { > 7099: __ movw(rscratch1, imm); > 7100: } Is this a Neoverse-specific optimization? On Apple M1, `mov x0, #0` is handled by renaming (so never issues) but `mov x0, xzr` is not eliminated. Let's go for the simplest here, this is too fussy. Suggestion: __ movw(rscratch1, (uint32_t)$con$$constant); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2277626680 From dlong at openjdk.org Thu Aug 14 20:40:17 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 14 Aug 2025 20:40:17 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v5] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 21:26:22 GMT, Dean Long wrote: >> This PR removes the recently added lock around set_guard_value, using instead Atomic::cmpxchg to atomically update bit-fields of the guard value. Further, it takes a fast-path that uses the previous direct store when at a safepoint. Combined, these changes should get us back to almost where we were before in terms of overhead. If necessary, we could go even further and allow make_not_entrant() to perform a direct byte store, leaving 24 bits for the guard value. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > one unconditional release should be enough I'm not convinced the ZGC regression is real. I see a 1% variance between runs even with the same binary and flags, so it looks like just noise. For this PR I'll stop here. If it turns out that ZGC or Shenandoah do have a small regression because of CAS, we can use direct stores as long as they are done inside a lock, which should be true already for disarm() but would need to be added for make_not_entrant(). 
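The CAS-based bit-field update mentioned in the quoted description can be sketched in Java (the field layout and names are invented for illustration; this is not the HotSpot `set_guard_value` code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: several one-byte fields packed into one int, each updated with a
// CAS loop so that concurrent writers to *other* fields are never lost,
// without taking a lock.
public class GuardWord {
    private final AtomicInteger word = new AtomicInteger();

    // Replace bits [shift, shift + 8) with value, leaving the rest intact.
    void setByte(int shift, int value) {
        int mask = 0xFF << shift;
        int prev;
        int next;
        do {
            prev = word.get();
            next = (prev & ~mask) | ((value & 0xFF) << shift);
        } while (!word.compareAndSet(prev, next));
    }

    public static void main(String[] args) {
        GuardWord g = new GuardWord();
        g.setByte(0, 0x2A);  // one logical field
        g.setByte(8, 0x01);  // another field; the first byte stays intact
        System.out.println(Integer.toHexString(g.word.get())); // prints 12a
    }
}
```

A plain byte store to one field is safe only if all writers of the word are serialized, e.g. under a lock or at a safepoint, which matches the fast-path/lock trade-off discussed above.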
@fisk , can I get you to review this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26399#issuecomment-3189805391 PR Comment: https://git.openjdk.org/jdk/pull/26399#issuecomment-3189809429 From dlong at openjdk.org Thu Aug 14 20:44:14 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 14 Aug 2025 20:44:14 GMT Subject: RFR: 8278874: tighten VerifyStack constraints [v12] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 19:42:31 GMT, Dean Long wrote: >> The VerifyStack logic in Deoptimization::unpack_frames() attempts to check the expression stack size of the interpreter frame against what GenerateOopMap computes. To do this, it needs to know if the state at the current bci represents the "before" state, meaning the bytecode will be reexecuted, or the "after" state, meaning we will advance to the next bytecode. The old code didn't know how to determine exactly what state we were in, so it checked both. This PR cleans that up, so we only have to compute the oopmap once. It also removes old SPARC support. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > cleanup Our Graal "tier9" testing passed, so I'm optimistically asking for another round of reviews now, while the longer "tier10" keeps running. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26121#issuecomment-3189817405 From fyang at openjdk.org Fri Aug 15 02:23:17 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 15 Aug 2025 02:23:17 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v26] In-Reply-To: References: Message-ID: On Thu, 14 Aug 2025 18:39:53 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. 
> > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > -more updates per reviewer's suggestions. Seems fine to me modulo some minor coding style issues. Thanks. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2009: > 2007: const int elsize_bytes = arrays_hashcode_elsize(eltype); > 2008: const int elsize_shift = exact_log2(elsize_bytes); > 2009: const int MAX_VEC_MASK = ~(ints_in_vec_reg * lmul - 1); `MAX_VEC_MASK` and `ints_in_vec_reg` looks a bit confusing to me considering vector register grouping. I prefer to rename `ints_in_vec_reg` to `stride`, just like you do for the scalar version. Then we can remove this `MAX_VEC_MASK` and replace it with `~(stride - 1)` where it is used. const int lmul = 2; const int stride = MaxVectorSize / sizeof(jint) * lmul; src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2053: > 2051: vmul_vv(v_src, v_src, v_coeffs); > 2052: vmadd_vx(v_sum, pow31_highest, v_src); > 2053: shadd(ary, consumed, ary, t0, elsize_shift); Can you further move this `shadd` to immediately after `mulw(result, result, pow31_highest);`? I would like to group updating of both `ary` and `cnt` together. 
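The vector loop under discussion computes the usual 31-polynomial hash a block at a time: each iteration multiplies the running hash by `31^stride` (the `pow31_highest` factor above) and adds the block's elements weighted by decreasing powers of 31, with a scalar loop for the tail. A scalar Java sketch of that blocking scheme — `STRIDE` here is an illustrative stand-in for `MaxVectorSize / sizeof(jint) * lmul`, not the actual stub code:

```java
public class BlockedHashCode {
    static final int STRIDE = 8; // stand-in for MaxVectorSize/sizeof(jint) * lmul

    static int blockedHash(int[] a) {
        // pow[j] == 31^(STRIDE - j); pow[0] plays the role of pow31_highest.
        int[] pow = new int[STRIDE + 1];
        pow[STRIDE] = 1;
        for (int j = STRIDE - 1; j >= 0; j--) {
            pow[j] = 31 * pow[j + 1];
        }
        int h = 1;
        int i = 0;
        for (; i + STRIDE <= a.length; i += STRIDE) { // "vectorized" blocks
            h *= pow[0];
            for (int j = 0; j < STRIDE; j++) {
                h += a[i + j] * pow[j + 1];
            }
        }
        for (; i < a.length; i++) {                   // scalar tail
            h = 31 * h + a[i];
        }
        return h;
    }
}
```

For any input this is equivalent to `java.util.Arrays.hashCode(int[])`, which is the result the stub must preserve.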
src/hotspot/cpu/riscv/riscv_v.ad line 4095: > 4093: TEMP tmp1, TEMP tmp2, TEMP tmp3, KILL cr); > 4094: > 4095: format %{ "Array HashCode array[] $ary,$cnt,$result,$basic_type -> $result\t#varrays_hashcode" %} Suggestion: format %{ "Array HashCode array[] $ary,$cnt,$result,$basic_type -> $result // KILL all" %} src/hotspot/cpu/riscv/stubDeclarations_riscv.hpp line 78: > 76: do_stub(compiler, arrays_hashcode_powers_of_31) \ > 77: do_arch_entry(riscv, compiler, arrays_hashcode_powers_of_31, \ > 78: arrays_hashcode_powers_of_31, arrays_hashcode_powers_of_31) \ Let's keep the trailing `` aligned: do_stub(compiler, arrays_hashcode_powers_of_31) \ do_arch_entry(riscv, compiler, arrays_hashcode_powers_of_31, \ arrays_hashcode_powers_of_31, arrays_hashcode_powers_of_31) \ src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6587: > 6585: assert(UseRVV, "sanity"); > 6586: const int ints_in_vec_reg = MaxVectorSize / sizeof(jint); > 6587: const int lmul = 2; Suggestion: const int lmul = 2; const int stride = MaxVectorSize / sizeof(jint) * lmul; ...... 
for (int i = stride; i >= 0; i--) { ------------- PR Review: https://git.openjdk.org/jdk/pull/17413#pullrequestreview-3122586012 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2278068359 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2278065225 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2278061270 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2278050015 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2278077761 From mhaessig at openjdk.org Fri Aug 15 06:41:15 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 15 Aug 2025 06:41:15 GMT Subject: RFR: 8278874: tighten VerifyStack constraints [v12] In-Reply-To: References: Message-ID: <4_y_xyPtpT-qA6cHEOkNnMATmZzIDEcSx7omfEIiCZc=.f40168dc-ec95-4eac-8823-a734f8b1ec1a@github.com> On Wed, 13 Aug 2025 19:42:31 GMT, Dean Long wrote: >> The VerifyStack logic in Deoptimization::unpack_frames() attempts to check the expression stack size of the interpreter frame against what GenerateOopMap computes. To do this, it needs to know if the state at the current bci represents the "before" state, meaning the bytecode will be reexecuted, or the "after" state, meaning we will advance to the next bytecode. The old code didn't know how to determine exactly what state we were in, so it checked both. This PR cleans that up, so we only have to compute the oopmap once. It also removes old SPARC support. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > cleanup Marked as reviewed by mhaessig (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/26121#pullrequestreview-3123083765 From mhaessig at openjdk.org Fri Aug 15 06:42:14 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 15 Aug 2025 06:42:14 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled [v4] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 12:10:54 GMT, Saranya Natarajan wrote: >> **Issue** >> An error, `assert(data->is_ReceiverTypeData()) failed: bad profile data type`, is encountered during C2 compilation due to bad profile data. This occurs when the code is compiled with `TypeProfileCasts` option disabled. >> >> **Analysis** >> The assertion failure occurs in `record_profiled_receiver_for_speculation` that analyzes the profiling information in the method data to determine whether a null value has been observed in the `instanceof` operation. This information is encoded in the `BitData` during profiling. When the method identifies that a null has been seen, it proceeds to inspect the associated `ReceiverTypeData` to see if the type check is always performed against null. However, in this scenario, the incoming profiling data is of type `BitData` rather than `ReceiverTypeData`, leading to the assertion failure. >> >> The profiling information for null seen for operations `aastore`, `instanceof`, and `checkcast` is recorded by the method `profile_null_seen `(in` src/hotspot/cpu/x86/templateTable_x86.cpp `). On investigating this method, it can be observed that the method data pointer is not updated for `VirtualCallData` (which is a subclass of `ReceiverTypeData`) when the `TypeProfileCasts` option is disabled. >> >> **Solution** >> My proposal is to inspect the `ReceiverTypeData` in function `record_profiled_receiver_for_speculation` only if `TypeProfileCasts` is enabled (this is based on the fact that the relevant method data pointer is not updated when `TypeProfileCasts` is disabled). 
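For context, the code shape that exercises this profile path is an `instanceof` site that observes both nulls and real receivers, so the interpreter records "null seen" in the site's profile data. A hypothetical repro sketch (the class and values are illustrative, not the actual regression test); the affected compilation path is only reached when running with `TypeProfileCasts` disabled:

```java
public class NullSeenInstanceOf {
    // An instanceof site that sees both null and non-null operands, so its
    // profile records "null seen". Illustrative shape only.
    static boolean isString(Object o) {
        return o instanceof String;
    }

    static int countStrings(Object[] values) {
        int hits = 0;
        for (Object v : values) {
            if (isString(v)) {
                hits++;
            }
        }
        return hits;
    }
}
```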
>> >> **Question to reviewers** >> Do you think this is a reasonable fix? >> >> **Testing** >> GitHub Actions >> tier1 to tier3 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: > > - formatting code > - add CompileThresholdScaling Marked as reviewed by mhaessig (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26640#pullrequestreview-3123085095 From dnsimon at openjdk.org Fri Aug 15 07:36:18 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 15 Aug 2025 07:36:18 GMT Subject: RFR: 8365468: EagerJVMCI should only apply to the CompilerBroker JVMCI runtime [v2] In-Reply-To: References: Message-ID: <5yXPwTpZXpI2TyZWk2ZWfLzhwMxjEXC8ufeEEX5aNc0=.6e34d5ea-4723-4180-949d-58a7d66bb1e5@github.com> On Wed, 13 Aug 2025 21:48:51 GMT, Doug Simon wrote: >> The primary goal of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447) was to have initialization of the Graal JIT occur in the same phase as the rest of VM startup such that initialization problems are detected and reported prior to executing any user code. >> >> This change caused a performance regression for Truffle when it is used in a JDK that includes both jargraal and libgraal. The problem is that Truffle needs jarjvmci but does not need jargraal when libgraal is available. Initializing jargraal in that configuration delays initialization of Truffle (not just Truffle compilation). Additionally, the jargraal instance created will never be used, wasting memory. >> >> The solution in this PR is to make EagerJVMCI only apply when initializing JVMCI on a CompileBroker thread. > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: > > only apply EagerJVMCI on a CompileBroker thread Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26768#issuecomment-3190847699 From dnsimon at openjdk.org Fri Aug 15 07:39:18 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 15 Aug 2025 07:39:18 GMT Subject: Integrated: 8365468: EagerJVMCI should only apply to the CompilerBroker JVMCI runtime In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 21:18:45 GMT, Doug Simon wrote: > The primary goal of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447) was to have initialization of the Graal JIT occur in the same phase as the rest of VM startup such that initialization problems are detected and reported prior to executing any user code. > > This change caused a performance regression for Truffle when it is used in a JDK that includes both jargraal and libgraal. The problem is that Truffle needs jarjvmci but does not need jargraal when libgraal is available. Initializing jargraal in that configuration delays initialization of Truffle (not just Truffle compilation). Additionally, the jargraal instance created will never be used, wasting memory. > > The solution in this PR is to make EagerJVMCI only apply when initializing JVMCI on a CompileBroker thread. This pull request has now been integrated. 
Changeset: e3aeebec Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/e3aeebec1798b9adbb02e11f285951d4275c52e8 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod 8365468: EagerJVMCI should only apply to the CompilerBroker JVMCI runtime Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/26768 From chagedorn at openjdk.org Fri Aug 15 08:02:14 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 15 Aug 2025 08:02:14 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v8] In-Reply-To: References: Message-ID: On Thu, 14 Aug 2025 14:57:36 GMT, Manuel Hässig wrote: >> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. >> >> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. >> >> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers.
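The `SIGALRM` + `SIGEV_THREAD_ID` mechanism described above is Linux-specific. As a portable illustration of the same idea — arm a per-task alarm when the task starts, disarm it when the task finishes in time — here is a Java sketch (the names `runWithTimeout` and `WATCHDOG` are invented for illustration; the actual change uses a POSIX timer per compiler thread, not Java threads):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class TaskTimeout {
    // A single watchdog thread playing the role of the POSIX timer.
    static final ScheduledExecutorService WATCHDOG =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "task-watchdog");
                t.setDaemon(true);
                return t;
            });

    // Arm an alarm before running the task; disarm it on completion.
    // The real change instead delivers SIGALRM to the compiler thread.
    static <T> T runWithTimeout(Callable<T> task, long timeoutMs) throws Exception {
        final Thread worker = Thread.currentThread();
        ScheduledFuture<?> alarm =
                WATCHDOG.schedule(worker::interrupt, timeoutMs, TimeUnit.MILLISECONDS);
        try {
            return task.call();
        } finally {
            alarm.cancel(false); // finished in time: disarm the alarm
        }
    }
}
```

The design point is the same as in the PR: the alarm fires asynchronously in the running task's own thread, so a stuck task is interrupted where it hangs rather than merely observed from outside.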
>> >> Testing: >> - [x] Github Actions >> - [x] tier1, tier2 on all platforms >> - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug >> - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Fix format string Thanks a lot for adding a test and printing the method name! A few more comments but otherwise, it looks good now! src/hotspot/os/linux/compilerThreadTimeout_linux.cpp line 44: > 42: CompileTask* task = CompilerThread::current()->task(); > 43: assert(false, "compile task %d (%s) timed out after " INTPTR_FORMAT " ms", > 44: task->compile_id(), task->method()->name_and_sig_as_C_string(), CompileTaskTimeout); Normally, you would probably need a `ResourceMark` for getting the method name. However, I'm not sure if you can do that as well here inside the signal handler. Maybe someone else can comment on that. test/hotspot/jtreg/compiler/arguments/TestCompileTaskTimeout.java line 29: > 27: * @test TestCompileTaskTimeout > 28: * @bug 8308094 > 29: * @requires vm.compiler2.enabled & vm.debug & vm.flagless & os.name == "Linux" Does it only work with C2 compile tasks? test/hotspot/jtreg/compiler/arguments/TestCompileTaskTimeout.java line 40: > 38: > 39: public static void main(String[] args) throws Throwable { > 40: ProcessTools.executeTestJava("-Xcomp", "-XX:CompileTaskTimeout=1", "-version") Nit: I think for newer JDK versions after JDK 8, we should use `--version`. ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26023#pullrequestreview-3123183236 PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2278487336 PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2278479526 PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2278490157 From mhaessig at openjdk.org Fri Aug 15 08:30:12 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 15 Aug 2025 08:30:12 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v8] In-Reply-To: References: Message-ID: <6-poNTHw7LVDOcv91ZprJQFTb0nAJbAtxxMwp8vtPTg=.0a80771c-c23d-4f99-ab2e-c6392798d328@github.com> On Fri, 15 Aug 2025 07:55:55 GMT, Christian Hagedorn wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix format string > > src/hotspot/os/linux/compilerThreadTimeout_linux.cpp line 44: > >> 42: CompileTask* task = CompilerThread::current()->task(); >> 43: assert(false, "compile task %d (%s) timed out after " INTPTR_FORMAT " ms", >> 44: task->compile_id(), task->method()->name_and_sig_as_C_string(), CompileTaskTimeout); > > Normally, you would probably need a `ResourceMark` for getting the method name. However, I'm not sure if you can do that as well here inside the signal handler. Maybe someone else can comment on that. Does this really matter when we are doing it right before crashing the VM?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2278528285 From mhaessig at openjdk.org Fri Aug 15 08:41:02 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 15 Aug 2025 08:41:02 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v9] In-Reply-To: References: Message-ID: <9hYCcBeA2l_eP6Jc5wzNYMa1HSHUZJ6xUbU9IeNYvb4=.86647940-df00-4231-b3d5-70c1484bc587@github.com> > This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. > > The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. > > Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. 
> > Testing: > - [x] Github Actions > - [x] tier1, tier2 on all platforms > - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug > - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Address Christian's comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26023/files - new: https://git.openjdk.org/jdk/pull/26023/files/ee64b092..40bc28ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=07-08 Stats: 10 lines in 1 file changed: 8 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26023/head:pull/26023 PR: https://git.openjdk.org/jdk/pull/26023 From mhaessig at openjdk.org Fri Aug 15 08:41:02 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 15 Aug 2025 08:41:02 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v8] In-Reply-To: References: Message-ID: On Fri, 15 Aug 2025 07:50:14 GMT, Christian Hagedorn wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix format string > > test/hotspot/jtreg/compiler/arguments/TestCompileTaskTimeout.java line 29: > >> 27: * @test TestCompileTaskTimeout >> 28: * @bug 8308094 >> 29: * @requires vm.compiler2.enabled & vm.debug & vm.flagless & os.name == "Linux" > > Does it only work with C2 compile tasks? Good catch. It is compiler agnostic.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2278539477 From bkilambi at openjdk.org Fri Aug 15 08:51:15 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 15 Aug 2025 08:51:15 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v6] In-Reply-To: <4LDAfuCcIvT0Q51SGfwW9VP-3iJZCi7LDT6GAtj8b4o=.d0033a4e-3e18-4581-ae7b-84d89ec808e8@github.com> References: <4LDAfuCcIvT0Q51SGfwW9VP-3iJZCi7LDT6GAtj8b4o=.d0033a4e-3e18-4581-ae7b-84d89ec808e8@github.com> Message-ID: On Thu, 14 Aug 2025 20:11:12 GMT, Andrew Haley wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Modify loadConH to use a mov and fmov instead > > src/hotspot/cpu/aarch64/aarch64.ad line 7100: > >> 7098: } else { >> 7099: __ movw(rscratch1, imm); >> 7100: } > > Is this a Neoverse-specific optimization? On Apple M1, `mov x0, #0` is handled by renaming (so never issues) but `mov x0, xzr` is not eliminated. Let's go for the simplest here, this is too fussy. > > Suggestion: > > __ movw(rscratch1, (uint32_t)$con$$constant); Thanks for letting me know about the optimization on Apple. After consulting the Software optimization guide, it looks like a `mov w, #0` is also a zero cycle move along with the `movw w, zr` instruction on V1/N2/V2 and is a normal ALU op on N1. Makes sense to eliminate the instruction. I will make the changes you suggested in next PS. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2278556597 From mhaessig at openjdk.org Fri Aug 15 09:42:11 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 15 Aug 2025 09:42:11 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v10] In-Reply-To: References: Message-ID: > This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. 
The goal of this is initially to be able to find and investigate long-running compilations. > > The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. > > Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. > > Testing: > - [x] Github Actions > - [x] tier1, tier2 on all platforms > - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug > - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: missed a dash ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26023/files - new: https://git.openjdk.org/jdk/pull/26023/files/40bc28ac..80ddb0ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26023/head:pull/26023 PR: https://git.openjdk.org/jdk/pull/26023 From duke at openjdk.org Fri Aug 15 09:48:13 2025 From: duke at openjdk.org (erifan) Date: Fri, 15 Aug 2025 09:48:13 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v7] In-Reply-To: References: Message-ID:
<_mLYFBM0CoUb9fZDL1bPJKArnOWmf_XVz-oN9prPjTQ=.4e8a9f02-205a-4f5f-ab60-920f10452585@github.com> On Wed, 13 Aug 2025 03:20:02 GMT, Jatin Bhateja wrote: >> Patch optimizes Vector.slice operation with constant index using x86 ALIGNR instruction. >> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. >> >> Idea here is to add infrastructure support to enable intrinsification of fast path for selected vector APIs, else enable inlining of fall-back implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander from handling slow paths, which can easily be implemented library side (Java). >> >> Vector API jtreg tests pass at AVX level 2, remaining validation in progress.
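As background, `Vector.slice(origin, w)` has simple lane semantics: result lane `i` comes from lane `origin + i` of the first vector when that index is in range, otherwise from the wrapped-around lane of the second vector — exactly the window that ALIGNR extracts when `origin` is a compile-time constant. A scalar model of those semantics, using plain arrays in place of vectors for illustration:

```java
public class SliceModel {
    // Scalar model of the Vector API's slice semantics: take lanes
    // [origin, n) of v1 followed by lanes [0, origin) of v2.
    static int[] slice(int[] v1, int[] v2, int origin) {
        int n = v1.length;
        int[] r = new int[n];
        for (int i = 0; i < n; i++) {
            int src = origin + i;
            r[i] = (src < n) ? v1[src] : v2[src - n];
        }
        return r;
    }
}
```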
>> >> Performance numbers: >> >> >> System : 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms >> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms >> VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms >> VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms >> VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms >> VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms >> VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms >> VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms >> VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms >> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms >> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms >> VectorSliceB... > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: > > Review comments resolution src/hotspot/share/opto/callGenerator.hpp line 1: > 1: /* 2024 -> 2025 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2278705851 From aph at openjdk.org Fri Aug 15 09:49:13 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 15 Aug 2025 09:49:13 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v6] In-Reply-To: References: <4LDAfuCcIvT0Q51SGfwW9VP-3iJZCi7LDT6GAtj8b4o=.d0033a4e-3e18-4581-ae7b-84d89ec808e8@github.com> Message-ID: On Fri, 15 Aug 2025 08:48:56 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/aarch64.ad line 7100: >> >>> 7098: } else { >>> 7099: __ movw(rscratch1, imm); >>> 7100: } >> >> Is this a Neoverse-specific optimization? On Apple M1, `mov x0, #0` is handled by renaming (so never issues) but `mov x0, xzr` is not eliminated. Let's go for the simplest here, this is too fussy. >> >> Suggestion: >> >> __ movw(rscratch1, (uint32_t)$con$$constant); > > Thanks for letting me know about the optimization on Apple. After consulting the Software optimization guide, it looks like a `mov w, #0` is also a zero cycle move along with the `movw w, zr` instruction on V1/N2/V2 and is a normal ALU op on N1. Makes sense to eliminate the instruction. I will make the changes you suggested in next PS. Thanks. I know from personal experience that it's hard to resist every tiny optimization, but some things aren't worth it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2278706946 From duke at openjdk.org Fri Aug 15 10:33:37 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 15 Aug 2025 10:33:37 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v27] In-Reply-To: References: Message-ID: > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. 
> > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: - one more round of updates per received suggestions from reviewers. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/3fd5388c..81356c2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=25-26 Stats: 14 lines in 4 files changed: 3 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From duke at openjdk.org Fri Aug 15 10:36:18 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 15 Aug 2025 10:36:18 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v22] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 06:05:15 GMT, Fei Yang wrote: >> Based on above experiments it looks reasonable to use `m2` grouping. > >> Based on above experiments it looks reasonable to use `m2` grouping. > > Thanks for the extra JMH numbers. Yes, I agree that `m2` is more reasonable here. > That means we won't need to reserve so many vector registers for `instruct varrays_hashcode` in src/hotspot/cpu/riscv/riscv_v.ad. > So can you free the unused vector registers? Will take a closer look after that. Thanks for your review @RealFYang, corrected per your suggestions.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3191214041 From bkilambi at openjdk.org Fri Aug 15 11:54:59 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 15 Aug 2025 11:54:59 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: > After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - > `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - > > > public void vectorAddConstInputFloat16() { > for (int i = 0; i < LEN; ++i) { > output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); > } > } > > > > > > The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. > > This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). > > Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. 
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Addressed review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26589/files - new: https://git.openjdk.org/jdk/pull/26589/files/3a12ca00..278ada47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26589&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26589&range=05-06 Stats: 6 lines in 1 file changed: 0 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26589/head:pull/26589 PR: https://git.openjdk.org/jdk/pull/26589 From bkilambi at openjdk.org Fri Aug 15 11:55:00 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 15 Aug 2025 11:55:00 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v6] In-Reply-To: References: <4LDAfuCcIvT0Q51SGfwW9VP-3iJZCi7LDT6GAtj8b4o=.d0033a4e-3e18-4581-ae7b-84d89ec808e8@github.com> Message-ID: On Fri, 15 Aug 2025 09:46:43 GMT, Andrew Haley wrote: >> Thanks for letting me know about the optimization on Apple. After consulting the Software optimization guide, it looks like a `mov w, #0` is also a zero cycle move along with the `movw w, zr` instruction on V1/N2/V2 and is a normal ALU op on N1. Makes sense to eliminate the instruction. I will make the changes you suggested in next PS. > > Thanks. I know from personal experience that it's hard to resist every tiny optimization, but some things aren't worth it. Could you please take another look at the patch? Thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2278860766 From never at openjdk.org Fri Aug 15 16:52:16 2025 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 15 Aug 2025 16:52:16 GMT Subject: RFR: 8278874: tighten VerifyStack constraints [v12] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 19:42:31 GMT, Dean Long wrote: >> The VerifyStack logic in Deoptimization::unpack_frames() attempts to check the expression stack size of the interpreter frame against what GenerateOopMap computes. To do this, it needs to know if the state at the current bci represents the "before" state, meaning the bytecode will be reexecuted, or the "after" state, meaning we will advance to the next bytecode. The old code didn't know how to determine exactly what state we were in, so it checked both. This PR cleans that up, so we only have to compute the oopmap once. It also removes old SPARC support. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > cleanup Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26121#pullrequestreview-3124530874 From dlong at openjdk.org Fri Aug 15 18:51:12 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 15 Aug 2025 18:51:12 GMT Subject: RFR: 8278874: tighten VerifyStack constraints [v12] In-Reply-To: <4_y_xyPtpT-qA6cHEOkNnMATmZzIDEcSx7omfEIiCZc=.f40168dc-ec95-4eac-8823-a734f8b1ec1a@github.com> References: <4_y_xyPtpT-qA6cHEOkNnMATmZzIDEcSx7omfEIiCZc=.f40168dc-ec95-4eac-8823-a734f8b1ec1a@github.com> Message-ID: On Fri, 15 Aug 2025 06:38:58 GMT, Manuel Hässig wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanup > > Marked as reviewed by mhaessig (Committer). Thanks @mhaessig and @tkrodriguez for the reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26121#issuecomment-3192430004 From dlong at openjdk.org Fri Aug 15 18:51:13 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 15 Aug 2025 18:51:13 GMT Subject: RFR: 8278874: tighten VerifyStack constraints [v12] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 19:42:31 GMT, Dean Long wrote: >> The VerifyStack logic in Deoptimization::unpack_frames() attempts to check the expression stack size of the interpreter frame against what GenerateOopMap computes. To do this, it needs to know if the state at the current bci represents the "before" state, meaning the bytecode will be reexecuted, or the "after" state, meaning we will advance to the next bytecode. The old code didn't know how to determine exactly what state we were in, so it checked both. This PR cleans that up, so we only have to compute the oopmap once. It also removes old SPARC support. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > cleanup tier10 Graal results look good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26121#issuecomment-3192430262 From dlong at openjdk.org Fri Aug 15 18:56:18 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 15 Aug 2025 18:56:18 GMT Subject: Integrated: 8278874: tighten VerifyStack constraints In-Reply-To: References: Message-ID: On Thu, 3 Jul 2025 20:28:34 GMT, Dean Long wrote: > The VerifyStack logic in Deoptimization::unpack_frames() attempts to check the expression stack size of the interpreter frame against what GenerateOopMap computes. To do this, it needs to know if the state at the current bci represents the "before" state, meaning the bytecode will be reexecuted, or the "after" state, meaning we will advance to the next bytecode. The old code didn't know how to determine exactly what state we were in, so it checked both. This PR cleans that up, so we only have to compute the oopmap once. 
It also removes old SPARC support. This pull request has now been integrated. Changeset: 39a36529 Author: Dean Long URL: https://git.openjdk.org/jdk/commit/39a365296882b0df49398cd7ac36e801a9aa1c35 Stats: 244 lines in 4 files changed: 113 ins; 68 del; 63 mod 8278874: tighten VerifyStack constraints Co-authored-by: Tom Rodriguez Reviewed-by: mhaessig, never ------------- PR: https://git.openjdk.org/jdk/pull/26121 From fyang at openjdk.org Sat Aug 16 02:08:19 2025 From: fyang at openjdk.org (Fei Yang) Date: Sat, 16 Aug 2025 02:08:19 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v27] In-Reply-To: References: Message-ID: On Fri, 15 Aug 2025 10:33:37 GMT, Yuri Gaevsky wrote: >> The patch adds the possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > - one more round of updates per received suggestions from reviewers. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2039: > 2037: beqz(t0, SCALAR_TAIL); > 2038: > 2039: vsetvli(t1, x0, Assembler::e32, Assembler::m2); This `vsetvli` doesn't seem necessary to me. Maybe we can remove it and move the second `vsetvli` at L2046 here? I am suggesting this code sequence:

    andi(t0, cnt, ~(stride - 1));
    beqz(t0, SCALAR_TAIL);
    la(t1, ExternalAddress(adr_pows31));
    lw(pow31_highest, Address(t1, -1 * sizeof(jint)));
    vsetvli(consumed, cnt, Assembler::e32, Assembler::m2);
    vle32_v(v_coeffs, t1); // 31^^(stride - 1) ... 31^^0
    vmv_v_x(v_sum, x0);

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2044: > 2042: la(t1, ExternalAddress(adr_pows31)); > 2043: lw(pow31_highest, Address(t1, -1 * sizeof(jint))); > 2044: vle32_v(v_coeffs, t1); // 31^^(MaxVectorSize-1)...31^^0 The code comment doesn't seem accurate considering vector register grouping.
Should it be: `31^^(stride - 1) ... 31^^0` ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2280128427 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2280127812 From kvn at openjdk.org Sun Aug 17 22:55:24 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 17 Aug 2025 22:55:24 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> On Thu, 14 Aug 2025 14:35:53 GMT, Emanuel Peter wrote: >> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. >> >> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. >> >> -------------------------- >> >> **Where to start reviewing** >> >> - `src/hotspot/share/opto/mempointer.hpp`: >> - Read the class comment for `MemPointerRawSummand`. >> - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks. >> >> - `src/hotspot/share/opto/vectorization.cpp`: >> - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. >> >> - `src/hotspot/share/opto/vtransform.hpp`: >> - Understand the difference between weak and strong edges.
>> >> If you need to see some examples, then look at the tests: >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases whether we used multiversioning. >> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). >> -------------------------- >> >> **Details** >> >> Most fundamentally: >> - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. >> - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. >> - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` >> - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + Conv... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more documentation for Christian I have a few comments src/hotspot/share/opto/c2_globals.hpp line 367: > 365: \ > 366: product(bool, UseAutoVectorizationPredicate, true, DIAGNOSTIC, \ > 367: "Use AutoVectorization predicate (for speculative compilation)") \ I do not see benchmark results with this flag off.
src/hotspot/share/opto/c2_globals.hpp line 370: > 368: \ > 369: product(bool, UseAutoVectorizationSpeculativeAliasingChecks, true, DIAGNOSTIC, \ > 370: "Use Multiversioning or Predicate to add aliasing runtime checks") \ This flag description implies that it should depend on the `LoopMultiversioning` and `UseAutoVectorizationPredicate` flag settings but I did not find such checks. src/hotspot/share/opto/loopUnswitch.cpp line 569: > 567: // optimizations. That means the slow_loop should still be correct, but > 568: // a bit slower, as there is no unrolling etc. > 569: if (!LoopMultiversioningOptimizeSlowLoop) { Do we really need this in product? Your benchmark results show that we need to optimize the slow loop. I can bet nobody will use this flag in the real world. src/hotspot/share/opto/mempointer.hpp line 403: > 401: // Given: > 402: // (C0) pointer p and its MemPointer mp, which is constructed with safe decompositions. > 403: // (C1) a summand "scale_v * v" that occurs in mp. What is `v` here? And the related `scale_v`? In previous text you used `scale_i * variable_i`. Is it the same? src/hotspot/share/opto/mempointer.hpp line 404: > 402: // (C0) pointer p and its MemPointer mp, which is constructed with safe decompositions. > 403: // (C1) a summand "scale_v * v" that occurs in mp. > 404: // (C2) a strided range r = [lo, lo + stride_v, .. hi] for v. Is `hi` inclusive or exclusive? src/hotspot/share/opto/mempointer.hpp line 406: > 404: // (C2) a strided range r = [lo, lo + stride_v, .. hi] for v. > 405: // (C3) for all v in this strided range r we know that p is within bounds of its memory object. > 406: // (C4) abs(scale_v * stride_v) < 2^31. (C4) is confusing if you read it first. But later I see the `mp(v1) = mp(v0) + scale_v * stride_v` expression and now I know why you need this.
src/hotspot/share/opto/mempointer.hpp line 444: > 442: // = summand_rest + scale_v * (v0 + stride_v) + con > 443: // = summand_rest + scale_v * v0 + scale_v * stride_v + con > 444: // = summand_rest + scale_v * v0 + scale_v * stride_v + con the same 2 lines src/hotspot/share/opto/mempointer.hpp line 674: > 672: // to be as simple as possible. For example, the pointer: > 673: // > 674: // pointer = base + 2L * ConvI2L(i + 4 * j + con1) + con2 Is this `MemPointerRawSummand` form? src/hotspot/share/opto/predicates.hpp line 48: > 46: * above which Regular Predicates can be created later after parsing. > 47: * > 48: * There are initially four Parse Predicates for each loop: You listed 5 parse predicates src/hotspot/share/opto/vectorization.cpp line 491: > 489: // any iv value in the strided range r = [init, init + iv_stride, .. limit). > 490: // > 491: // for all iv in r: p1(iv) + size1 <= p2(iv) OR p2(iv) + size2 <= p1(iv) What are `size*` ? src/hotspot/share/opto/vectorization.cpp line 515: > 513: // pointer p: > 514: // (C0) is given by the construction of VPointer vp, which simply wraps a MemPointer mp. 
> 515: // (c1) with v = iv and scale_v = iv_scale You have explanation here but not in `mempointer.hpp` ------------- PR Review: https://git.openjdk.org/jdk/pull/24278#pullrequestreview-3126462478 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281025986 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281025783 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281021994 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281030830 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281035285 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281036905 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281040501 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281043402 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281045284 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281048563 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281047772 From kvn at openjdk.org Sun Aug 17 22:55:25 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 17 Aug 2025 22:55:25 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v10] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Thu, 14 Aug 2025 14:33:03 GMT, Emanuel Peter wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/opto/c2_globals.hpp >> >> Co-authored-by: Christian Hagedorn >> - improve predicates.hpp documentation > > @chhagedorn Thanks for the drive-by comments about the Predicate documentation. Are you now satisfied? Maybe @rwestrel should have a look at it too, since I completed the missing documentation from the Short-Running-Long-Loop-Predicates as well. 
@eme64 did you measure how much C2 compilation time changed with these changes (all optimizations enabled)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3194704275 From kvn at openjdk.org Sun Aug 17 22:55:26 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 17 Aug 2025 22:55:26 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: <9eaqaAMwqh04ds8lO3m6Mb44v8HhoO-i_Y9ORndDSj8=.583b6941-8a7a-4d56-8eae-e69c3ed51c6a@github.com> On Sun, 17 Aug 2025 21:42:01 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation for Christian > > src/hotspot/share/opto/mempointer.hpp line 403: > >> 401: // Given: >> 402: // (C0) pointer p and its MemPointer mp, which is constructed with safe decompositions. >> 403: // (C1) a summand "scale_v * v" that occurs in mp. > > What is `v` here? And related `scale_v`? In previous text you used `scale_i * variable_i`. Is it the same. Based on new comment at line 44 this is induction variable (iv). Consider explaining that `v` is induction variable to avoid confusion. I first thought it represents all kind of variables used in index expressions. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281033664 From qxing at openjdk.org Mon Aug 18 01:38:44 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Mon, 18 Aug 2025 01:38:44 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v3] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 07:53:39 GMT, Qizheng Xing wrote: >> In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated. >> >> loop-safepoint >> >> This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64). >> >> >> Benchmark Mode Cnt Score Error Units >> LoopSafepoint.loopVar avgt 15 208296.259 ? 1350.409 ns/op # baseline >> LoopSafepoint.loopVar avgt 15 200692.874 ? 616.770 ns/op # this patch >> >> >> Testing: tier1-2 on x86_64 and aarch64. > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Improve documentation comments Hi all, This patch has now passed all GHA tests and is ready for further reviews. If there are any other suggestions for this PR, please let me know. Thanks! 
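The loop shape being optimized can be illustrated with a minimal Java example. This is an assumed stand-in for the benchmark's loop, not the actual `LoopSafepoint` source:

```java
// Illustration of the shape discussed above: the non-inlined call in the
// loop body is guaranteed to reach a safepoint and dominates the loop tail,
// so a separate loop safepoint poll becomes redundant. Names are
// illustrative only.
static long loopVar(int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += compute(i); // call with a safepoint dominating the loop tail
    }
    return sum;
}

static long compute(int i) {
    return (long) i * i;
}
```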
------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-3194839054 From chagedorn at openjdk.org Mon Aug 18 06:16:26 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 18 Aug 2025 06:16:26 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v10] In-Reply-To: References: Message-ID: <4JJUFVogUIc2r8Pe_MjaRJlDgSStO7WwYY3iM2Eyjvk=.7c1cfa49-982c-4f4f-b77f-660f09f49714@github.com> On Fri, 15 Aug 2025 09:42:11 GMT, Manuel Hässig wrote: >> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. >> >> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. >> >> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. >> >> Testing: >> - [x] Github Actions >> - [x] tier1, tier2 on all platforms >> - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug >> - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > missed a dash Looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26023#pullrequestreview-3126909638 From chagedorn at openjdk.org Mon Aug 18 06:16:28 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 18 Aug 2025 06:16:28 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v8] In-Reply-To: <6-poNTHw7LVDOcv91ZprJQFTb0nAJbAtxxMwp8vtPTg=.0a80771c-c23d-4f99-ab2e-c6392798d328@github.com> References: <6-poNTHw7LVDOcv91ZprJQFTb0nAJbAtxxMwp8vtPTg=.0a80771c-c23d-4f99-ab2e-c6392798d328@github.com> Message-ID: On Fri, 15 Aug 2025 08:27:50 GMT, Manuel Hässig wrote: >> src/hotspot/os/linux/compilerThreadTimeout_linux.cpp line 44: >> >>> 42: CompileTask* task = CompilerThread::current()->task(); >>> 43: assert(false, "compile task %d (%s) timed out after " INTPTR_FORMAT " ms", >>> 44: task->compile_id(), task->method()->name_and_sig_as_C_string(), CompileTaskTimeout); >> >> Normally, you would probably need a `ResourceMark` for getting the method name. However, I'm not sure if you can do that as well here inside the signal handler. Maybe someone else can comment on that. > > Does this really matter when we are doing it right before crashing the VM? I'm not entirely sure but I guess it's fine since it's in the same thread.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2281386549 From epeter at openjdk.org Mon Aug 18 06:16:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 06:16:35 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Sun, 17 Aug 2025 21:09:58 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation for Christian > > src/hotspot/share/opto/loopUnswitch.cpp line 569: > >> 567: // optimizations. That means the slow_loop should still be correct, but >> 568: // a bit slower, as there is no unrolling etc. >> 569: if (!LoopMultiversioningOptimizeSlowLoop) { > > Do we really need this in product? Your benchmarks results shows that we need to optimize slow loop. I can bet nobody will use this flag in real word. It is a DIAGNOSTIC flag, which allowed me to demonstrate the performance in a JMH benchmark. You asked for that benchmark back when I first introduced multiversioning with https://github.com/openjdk/jdk/pull/22016 . I'm also fine removing the flag completely now. Or just making it develop. What do you think is best? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281385887 From epeter at openjdk.org Mon Aug 18 06:20:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 06:20:28 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Sun, 17 Aug 2025 21:23:59 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation for Christian > > src/hotspot/share/opto/c2_globals.hpp line 370: > >> 368: \ >> 369: product(bool, UseAutoVectorizationSpeculativeAliasingChecks, true, DIAGNOSTIC, \ >> 370: "Use Multiversioning or Predicate to add aliasing runtime checks") \ > > This flag description implies that it should depend on `LoopMultiversioning` and `UseAutoVectorizationPredicate` flags settings but I did not find such checks. I made the description more precise. The idea is that you can disable the speculative checks with `UseAutoVectorizationSpeculativeAliasingChecks`. If you have the speculative checks enabled, you still need to enable multiversioning and/or the auto vectorization predicate - otherwise that also disables the speculative checks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281394174 From epeter at openjdk.org Mon Aug 18 06:23:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 06:23:24 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Sun, 17 Aug 2025 21:24:39 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation for Christian > > src/hotspot/share/opto/c2_globals.hpp line 367: > >> 365: \ >> 366: product(bool, UseAutoVectorizationPredicate, true, DIAGNOSTIC, \ >> 367: "Use AutoVectorization predicate (for speculative compilation)") \ > > I do not see benchmark results with this flag off. Would that be helpful to you? How? What would you expect to see here @vnkozlov? This is really an optimization that reduces the code-size. Or do you just want to sanity-check that the peak performance would be identical if we use multiversioning rather than the predicate approach?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281399845 From epeter at openjdk.org Mon Aug 18 06:38:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 06:38:26 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: <9eaqaAMwqh04ds8lO3m6Mb44v8HhoO-i_Y9ORndDSj8=.583b6941-8a7a-4d56-8eae-e69c3ed51c6a@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> <9eaqaAMwqh04ds8lO3m6Mb44v8HhoO-i_Y9ORndDSj8=.583b6941-8a7a-4d56-8eae-e69c3ed51c6a@github.com> Message-ID: On Sun, 17 Aug 2025 21:52:41 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/opto/mempointer.hpp line 403: >> >>> 401: // Given: >>> 402: // (C0) pointer p and its MemPointer mp, which is constructed with safe decompositions. >>> 403: // (C1) a summand "scale_v * v" that occurs in mp. >> >> What is `v` here? And related `scale_v`? In previous text you used `scale_i * variable_i`. Is it the same. > > Based on new comment at line 44 this is induction variable (iv). Consider explaining that `v` is induction variable to avoid confusion. I first thought it represents all kind of variables used in index expressions. I added some additional comments that should make it a bit more clear. The thing is that I don't want to use `iv` here already, because `MemPointer` can be used outside loops as well (e.g. `MergeStores`). Only `VPointer` have the concept of loops and `iv`. I try to keep those separated clearly. But I added a comment now that points to the application in `VPointer`. 
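The property underlying this discussion — stepping `v` by `stride_v` moves the pointer by exactly `scale_v * stride_v` — can be written out as a tiny arithmetic sketch, using plain `long` arithmetic and ignoring the overflow condition (C4) that the proofs in the patch have to establish:

```java
// Sketch of the MemPointer linearity idea: with
//   mp(v) = summand_rest + scale_v * v + con,
// advancing v by stride_v shifts the pointer by scale_v * stride_v,
// provided the products do not overflow. Names are illustrative, taken
// from the quoted comments rather than the actual implementation.
static long mp(long summandRest, long scaleV, long v, long con) {
    return summandRest + scaleV * v + con;
}
```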
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281423700 From epeter at openjdk.org Mon Aug 18 06:38:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 06:38:28 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Sun, 17 Aug 2025 21:58:35 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation for Christian > > src/hotspot/share/opto/mempointer.hpp line 404: > >> 402: // (C0) pointer p and its MemPointer mp, which is constructed with safe decompositions. >> 403: // (C1) a summand "scale_v * v" that occurs in mp. >> 404: // (C2) a strided range r = [lo, lo + stride_v, .. hi] for v. > > If `hi` inclusive or exclusive? 
Yes, I expanded the line a bit :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281424380 From epeter at openjdk.org Mon Aug 18 06:50:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 06:50:21 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Sun, 17 Aug 2025 22:04:50 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation for Christian > > src/hotspot/share/opto/mempointer.hpp line 406: > >> 404: // (C2) a strided range r = [lo, lo + stride_v, .. hi] for v. >> 405: // (C3) for all v in this strided range r we know that p is within bounds of its memory object. >> 406: // (C4) abs(scale_v * stride_v) < 2^31. > > (C4) Is confusing if you read it first. But late I see `mp(v1) = mp(v0) + scale_v * stride_v` expression and now I know why you need this. I see. 
I added two sentences just below to give a high level motivation for (C4), so the reader does not feel too confused on the first encounter ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281446014 From epeter at openjdk.org Mon Aug 18 06:53:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 06:53:21 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Sun, 17 Aug 2025 22:19:24 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation for Christian > > src/hotspot/share/opto/mempointer.hpp line 444: > >> 442: // = summand_rest + scale_v * (v0 + stride_v) + con >> 443: // = summand_rest + scale_v * v0 + scale_v * stride_v + con >> 444: // = summand_rest + scale_v * v0 + scale_v * stride_v + con > > the same 2 lines deleted! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281451967 From epeter at openjdk.org Mon Aug 18 06:56:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 06:56:22 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Sun, 17 Aug 2025 22:29:21 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation for Christian > > src/hotspot/share/opto/mempointer.hpp line 674: > >> 672: // to be as simple as possible. For example, the pointer: >> 673: // >> 674: // pointer = base + 2L * ConvI2L(i + 4 * j + con1) + con2 > > Is this `MemPointerRawSummand` form? No, that was supposed to be C2 IR... though I suppose `MemPointerRawSummand` would also faithfully represent the same form, see below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281457341 From epeter at openjdk.org Mon Aug 18 07:06:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 07:06:23 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: <9qyj-TUNVLJsPVyprhqipEoNbGjVvNzlsSN3OATBV9s=.d5fdec30-e23b-4249-aef9-e00ee3c1d836@github.com> On Mon, 18 Aug 2025 06:53:24 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/mempointer.hpp line 674: >> >>> 672: // to be as simple as possible. 
For example, the pointer: >>> 673: // >>> 674: // pointer = base + 2L * ConvI2L(i + 4 * j + con1) + con2 >> >> Is this `MemPointerRawSummand` form? > > No, that was supposed to be C2 IR... though I suppose `MemPointerRawSummand` would also faithfully represent the same form, see below. I added some more comments to hopefully make it a bit more clear. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281472739 From epeter at openjdk.org Mon Aug 18 07:06:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 07:06:25 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: <-7RNt8eSLQS__9fI5IoluBfXjKvacC5OD1mRlygPAUo=.200f3863-b260-4920-9484-e8f49e61a11f@github.com> On Sun, 17 Aug 2025 22:36:27 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation for Christian > > src/hotspot/share/opto/predicates.hpp line 48: > >> 46: * above which Regular Predicates can be created later after parsing. >> 47: * >> 48: * There are initially four Parse Predicates for each loop: > > You listed 5 parse predicates Right, fixed it to five. I forgot it because I only realized later that Roland's Long Running Long Loop Predicate was also missing from the list. > src/hotspot/share/opto/vectorization.cpp line 515: > >> 513: // pointer p: >> 514: // (C0) is given by the construction of VPointer vp, which simply wraps a MemPointer mp.
>> 515: // (c1) with v = iv and scale_v = iv_scale > > You have explanation here but not in `mempointer.hpp` Now I do, thanks for the comment :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281474947 PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281475726 From epeter at openjdk.org Mon Aug 18 07:15:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 07:15:27 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Sun, 17 Aug 2025 22:48:48 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation for Christian > > src/hotspot/share/opto/vectorization.cpp line 491: > >> 489: // any iv value in the strided range r = [init, init + iv_stride, .. limit). >> 490: // >> 491: // for all iv in r: p1(iv) + size1 <= p2(iv) OR p2(iv) + size2 <= p1(iv) > > What are `size*` ? I added a reminder. 
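As an illustration of the disjointness condition quoted above — `p1(iv) + size1 <= p2(iv) OR p2(iv) + size2 <= p1(iv)` — the check for affine pointers of the form `p(iv) = base + scale * iv` can be sketched in plain Java. This is not code from the patch; all names here are hypothetical, and the real check is emitted as C2 IR:

```java
// Hedged sketch of the aliasing runtime check discussed in this thread.
// sizeN is the number of bytes accessed through pointer N per iteration.
class AliasCheckSketch {
    // p(iv) = base + scale * iv; the access covers [p(iv), p(iv) + size).
    static boolean disjointAt(long iv,
                              long base1, long scale1, long size1,
                              long base2, long scale2, long size2) {
        long p1 = base1 + scale1 * iv;
        long p2 = base2 + scale2 * iv;
        return p1 + size1 <= p2 || p2 + size2 <= p1;
    }

    // Naive loop over the strided iv range [init, limit); the real check
    // folds this into a constant number of bound comparisons instead.
    static boolean neverAlias(long init, long stride, long limit,
                              long base1, long scale1, long size1,
                              long base2, long scale2, long size2) {
        for (long iv = init; iv < limit; iv += stride) {
            if (!disjointAt(iv, base1, scale1, size1, base2, scale2, size2)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, two pointers `8 * iv` and `64 + 8 * iv` with 8-byte accesses never alias over any iv range, while two identical pointers always do.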
You can also read up on it in `mempointer.hpp` ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281495023 From epeter at openjdk.org Mon Aug 18 07:45:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 07:45:06 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v12] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <7w_mS7-wipF3Sel0qM9MJzEp9uCqtkkWiZXS8_zpJy8=.ec818d12-e617-4434-a189-a54e0acc5335@github.com> > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges.
> > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases; MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. > - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. > - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` > - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)` > - For aliasing analysis (adjacency and overlap), the "regu...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: addressing Vladimir's comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/0180dd27..1fc7caa0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=10-11 Stats: 47 lines in 4 files changed: 39 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Mon Aug 18 07:47:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 07:47:23 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v10] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <8oydcWWCxrLGTk74NqbUS5X97E6g-ZkU1El70fhClf4=.92d3f267-3e86-45b3-94b4-4020d05d5c7c@github.com> On Thu, 14 Aug 2025 14:33:03 GMT, Emanuel Peter wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/opto/c2_globals.hpp >> >> Co-authored-by: Christian Hagedorn >> - improve predicates.hpp documentation > > @chhagedorn Thanks for the drive-by comments about the Predicate documentation. Are you now satisfied? Maybe @rwestrel should have a look at it too, since I completed the missing documentation from the Short-Running-Long-Loop-Predicates as well. > @eme64 did you measure how much C2 compilation time changed with these changes (all optimizations enabled)? I did not. I don't think it would take much extra time in almost all cases. The extra analysis is not that costly compared to unrolling that we do in all cases already. 
What might cost more: if we deopt because of the runtime check and recompile with multiversioning, that could essentially double C2 compile time for those cases. Do you think it is worth it to benchmark now, or should we just rely on @robcasloz's occasional benchmarking and address the issues if they come up? If you want me to do C2 time benchmarking: should I just show a few specific micro-benchmarks, or do you want to have statistics collected on larger benchmark suites? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3195515822 From epeter at openjdk.org Mon Aug 18 07:52:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 07:52:21 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: <_Kw3K2gEmjUgLy5pYLnMsKH2N-cb-cKfc2ip412MACU=.e354810a-6fa6-4f6b-8470-984040bf712b@github.com> On Mon, 18 Aug 2025 06:13:08 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopUnswitch.cpp line 569: >> >>> 567: // optimizations. That means the slow_loop should still be correct, but >>> 568: // a bit slower, as there is no unrolling etc. >>> 569: if (!LoopMultiversioningOptimizeSlowLoop) { >> >> Do we really need this in product? Your benchmark results show that we need to optimize the slow loop. I can bet nobody will use this flag in the real world. > > It is a DIAGNOSTIC flag, which allowed me to demonstrate the performance in a JMH benchmark. You asked for that benchmark back when I first introduced multiversioning with https://github.com/openjdk/jdk/pull/22016 . I'm also fine removing the flag completely now. Or just making it develop. What do you think is best?
It is now used in the JMH benchmark, I'd have to remove it there too: https://github.com/openjdk/jdk/pull/24278/files#diff-93288fabe20d76b9df3fb5601e4d8600a46f438fe4b9c4ef92d702fdffa1c8c9R225-R230 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2281574475 From mhaessig at openjdk.org Mon Aug 18 08:00:19 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 18 Aug 2025 08:00:19 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v8] In-Reply-To: References: <6-poNTHw7LVDOcv91ZprJQFTb0nAJbAtxxMwp8vtPTg=.0a80771c-c23d-4f99-ab2e-c6392798d328@github.com> Message-ID: On Mon, 18 Aug 2025 06:13:36 GMT, Christian Hagedorn wrote: >> Does this really matter when we are doing it right before crashing the VM? > > I'm not entirely sure but I guess it's fine since it's in the same thread. Maybe @dean-long can shed some light on this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2281593869 From epeter at openjdk.org Mon Aug 18 08:02:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 08:02:19 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v6] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 07:02:58 GMT, Manuel H?ssig wrote: >> A loop of the form >> >> MemorySegment ms = {}; >> for (long i = 0; i < ms.byteSize() / 8L; i++) { >> // vectorizable work >> } >> >> does not vectorize, whereas >> >> MemorySegment ms = {}; >> long size = ms.byteSize(); >> for (long i = 0; i < size / 8L; i++) { >> // vectorizable work >> } >> >> vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. 
Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: >> >> https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 >> >> Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. >> >> So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. >> >> @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem >> >> ## Change Description >> >> Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. >> >>
Explored Alternatives >> 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. >> 2. I... > > Manuel H?ssig has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - Merge branch 'master' into jdk-8356176-byte-size > - Emanuel's suggestion > - Better documentation of profitable() > - Remove vector sizes > - Specify vector sizes > - Merge branch 'jdk-8356176-byte-size' of github.com:mhaessig/jdk into jdk-8356176-byte-size > - Update field documentation > > Co-authored-by: Emanuel Peter > - Add asserts > - Make region a field > - Even more better debug print > - ... and 13 more: https://git.openjdk.org/jdk/compare/25480f00...025dbe6e 2 little nits left over ;) src/hotspot/share/opto/loopopts.cpp line 234: > 232: if (TraceLoopOpts) { > 233: tty->print("Split %s N%d through Phi N%d in %s N%d", > 234: n->Name(), n->_idx, phi->_idx, region->Name(), region->_idx); Ah, only just noticed it, and it is an absolute nit. To keep it consistent with `node->dump()`, I'd suggest the `idx` should be before the `Name`. We do that already in other places. Suggestion: tty->print("Split %d %s through %d Phi in %d %s", n->_idx, n->Name(), phi->_idx, region->_idx, region->Name()); ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26429#pullrequestreview-3127219247 PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2281593360 From epeter at openjdk.org Mon Aug 18 08:02:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 08:02:20 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v6] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 15:27:12 GMT, Manuel H?ssig wrote: >> test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentByteSizeLongLoopLimit.java line 38: >> >>> 36: * @library /test/lib / >>> 37: * @run driver compiler.loopopts.superword.TestMemorySegmentByteSizeLongLoopLimit >>> 38: */ >> >> For MemorySegment tests, I've made the experience that it is quite important to test out some runs with additional flag combinations: at least `AlignVector` and `ShortRunningLongLoop`. Same might apply for the tests below. > > I added scenarios. Still missing `AlignVector`, no? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2281595785 From epeter at openjdk.org Mon Aug 18 08:02:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 08:02:21 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v6] In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 07:58:37 GMT, Emanuel Peter wrote: >> I added scenarios. > > Still missing `AlignVector`, no? At least in this specific test, you added it to another one below. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2281596625 From bmaillard at openjdk.org Mon Aug 18 08:07:11 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 18 Aug 2025 08:07:11 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 14:38:01 GMT, Manuel H?ssig wrote: > This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of the combination of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus some internal testing on Oracle supported platforms Looks good to me, I only have one minor suggestion. test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 368: > 366: .flatMap(setElement -> crossProductHelper(idx + 1, sets) > 367: .map(set -> { > 368: Set newSet = new HashSet(set); Suggestion: Set newSet = new HashSet<>(set); You should use the diamond operator here to use the generic type instead of the raw `HashSet` type ------------- PR Review: https://git.openjdk.org/jdk/pull/26762#pullrequestreview-3127232286 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2281603073 From duke at openjdk.org Mon Aug 18 08:11:18 2025 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane Ampudia) Date: Mon, 18 Aug 2025 08:11:18 GMT Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v2] In-Reply-To: <5CGrcWjFZ7Zqj_Tm0LO6Tqg9cUA-xxvcaa2J-yWW8BE=.af4dea7c-e39d-491d-b924-c89fa82e757a@github.com> References: <5CGrcWjFZ7Zqj_Tm0LO6Tqg9cUA-xxvcaa2J-yWW8BE=.af4dea7c-e39d-491d-b924-c89fa82e757a@github.com> Message-ID: <1Tu9HNRsw3SBmtk0ynLVLY2eRilJC30gcYeo8rtpbY8=.2896c35b-7224-4ee9-8102-1763588eae0e@github.com> On Thu, 14 Aug 2025 10:54:08 GMT, Damon Fenacci wrote: >> # Issue >> While compiling 
`java.util.zip.ZipFile` in C2 this assert is triggered >> https://github.com/openjdk/jdk/blob/a2e86ff3c56209a14c6e9730781eecd12c81d170/src/hotspot/share/opto/memnode.cpp#L4235 >> >> # Cause >> While compiling the constructor of java.util.zip.ZipFile$CleanableResource the following happens: >> * we insert a trailing `MemBarStoreStore` in the constructor >> before_folding >> >> * during IGVN we completely fold the memory subtree of the `MemBarStoreStore` node. The node still has a control output attached. >> after_folding >> >> * later during the same IGVN run the `MemBarStoreStore` node is handled and we try to remove it (because the `Allocate` node of the `MemBar` is not escaping the thread) https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4301-L4302 >> * the assert https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4235 >> triggers because the barrier has only 1 (control) output and is a `MemBarStoreStore` (not `Initialize`) barrier >> >> The issue happens only when `UseStoreStoreForCtor` is set (the default as well), which makes C2 use `MemBarStoreStore` instead of `MemBarRelease` at the end of constructors. `MemBarStoreStore` nodes are processed separately by EA, and this happens after the IGVN pass that folds the memory subtree. `MemBarRelease` nodes, on the other hand, are handled during the same IGVN pass, before the memory subtree gets removed, so the barrier still has 2 outputs (assert skipped). >> >> # Fix >> Adapting the assert to accept that `MemBarStoreStore` can also have `!= 2` outputs (when `+UseStoreStoreForCtor` is used) seems to be an OK solution, as this seems like a perfectly plausible situation. >> >> # Testing >> Unfortunately, reproducing the issue with a simple regression test has proven very hard. The test seems to rely on very peculiar profiling and IGVN worklist sequencing. JBS replay compilation passes.
Running JCK's `api/java_util` 100 times triggers the assert a couple of times on average before the fix, none after. >> Tier 1-3+ tests passed. > Damon Fenacci has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into JDK-8360031 > - JDK-8360031: update assert message > - Merge branch 'master' into JDK-8360031 > - JDK-8360031: remove unnecessary include > - JDK-8360031: remove UseNewCode > - JDK-8360031: compilation asserts in MemBarNode::remove Hi! Wanted to mention this might be related to the following: [JDK-8330062](https://bugs.openjdk.org/browse/JDK-8330062), which I'm looking into at the moment ------------- PR Comment: https://git.openjdk.org/jdk/pull/26556#issuecomment-3195582310 From mhaessig at openjdk.org Mon Aug 18 08:12:02 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 18 Aug 2025 08:12:02 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v2] In-Reply-To: References: Message-ID: <4Jk_phoDCxNS3QzYjbUnlmgpuZmnTI_f9j5_ORDlrOU=.d66384f5-a0cb-4f6a-8ccf-533fd6eca0e3@github.com> > This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality.
> > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus some internal testing on Oracle supported platforms Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: Apply Beno?t's suggestion Co-authored-by: Beno?t Maillard ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26762/files - new: https://git.openjdk.org/jdk/pull/26762/files/57b3afed..0bd8c6a7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26762.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26762/head:pull/26762 PR: https://git.openjdk.org/jdk/pull/26762 From snatarajan at openjdk.org Mon Aug 18 08:14:20 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 18 Aug 2025 08:14:20 GMT Subject: RFR: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled [v4] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 12:10:54 GMT, Saranya Natarajan wrote: >> **Issue** >> An error, `assert(data->is_ReceiverTypeData()) failed: bad profile data type`, is encountered during C2 compilation due to bad profile data. This occurs when the code is compiled with `TypeProfileCasts` option disabled. >> >> **Analysis** >> The assertion failure occurs in `record_profiled_receiver_for_speculation` that analyzes the profiling information in the method data to determine whether a null value has been observed in the `instanceof` operation. This information is encoded in the `BitData` during profiling. When the method identifies that a null has been seen, it proceeds to inspect the associated `ReceiverTypeData` to see if the type check is always performed against null. However, in this scenario, the incoming profiling data is of type `BitData` rather than `ReceiverTypeData`, leading to the assertion failure. 
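The null-seen profiling described above can be made concrete with a small sketch (added for illustration; this is not code from the patch): the interpreter needs a null-seen bit because `instanceof` evaluates to `false` for `null` and `checkcast` lets a `null` reference pass, so neither bytecode can record a receiver type for a null operand:

```java
// Illustration of the bytecode semantics behind the null-seen bit:
// instanceof is always false for null, and checkcast always accepts null,
// so a null operand contributes no receiver type to the profile.
class NullSeenSketch {
    static boolean isStringInstance(Object o) {
        return o instanceof String; // instanceof bytecode: null is never an instance
    }

    static String castToString(Object o) {
        return (String) o;          // checkcast bytecode: a null reference always passes
    }
}
```

Running `isStringInstance(null)` yields `false`, and `castToString(null)` completes without a `ClassCastException` — exactly the situation the null-seen bit records.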
>> >> The profiling information for null seen for operations `aastore`, `instanceof`, and `checkcast` is recorded by the method `profile_null_seen` (in `src/hotspot/cpu/x86/templateTable_x86.cpp`). On investigating this method, it can be observed that the method data pointer is not updated for `VirtualCallData` (which is a subclass of `ReceiverTypeData`) when the `TypeProfileCasts` option is disabled. >> >> **Solution** >> My proposal is to inspect the `ReceiverTypeData` in function `record_profiled_receiver_for_speculation` only if `TypeProfileCasts` is enabled (this is based on the fact that the relevant method data pointer is not updated when `TypeProfileCasts` is disabled). >> >> **Question to reviewers** >> Do you think this is a reasonable fix? >> >> **Testing** >> GitHub Actions >> tier1 to tier3 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: > > - formating code > - add CompileThresholdScaling Thank you for the review. Please sponsor ------------- PR Comment: https://git.openjdk.org/jdk/pull/26640#issuecomment-3195593331 From snatarajan at openjdk.org Mon Aug 18 08:19:24 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 18 Aug 2025 08:19:24 GMT Subject: Integrated: 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled In-Reply-To: References: Message-ID: On Tue, 5 Aug 2025 10:40:19 GMT, Saranya Natarajan wrote: > **Issue** > An error, `assert(data->is_ReceiverTypeData()) failed: bad profile data type`, is encountered during C2 compilation due to bad profile data. This occurs when the code is compiled with `TypeProfileCasts` option disabled.
> > **Analysis** > The assertion failure occurs in `record_profiled_receiver_for_speculation` that analyzes the profiling information in the method data to determine whether a null value has been observed in the `instanceof` operation. This information is encoded in the `BitData` during profiling. When the method identifies that a null has been seen, it proceeds to inspect the associated `ReceiverTypeData` to see if the type check is always performed against null. However, in this scenario, the incoming profiling data is of type `BitData` rather than `ReceiverTypeData`, leading to the assertion failure. > > The profiling information for null seen for operations `aastore`, `instanceof`, and `checkcast` is recorded by the method `profile_null_seen` (in `src/hotspot/cpu/x86/templateTable_x86.cpp`). On investigating this method, it can be observed that the method data pointer is not updated for `VirtualCallData` (which is a subclass of `ReceiverTypeData`) when the `TypeProfileCasts` option is disabled. > > **Solution** > My proposal is to inspect the `ReceiverTypeData` in function `record_profiled_receiver_for_speculation` only if `TypeProfileCasts` is enabled (this is based on the fact that the relevant method data pointer is not updated when `TypeProfileCasts` is disabled). > > **Question to reviewers** > Do you think this is a reasonable fix? > > **Testing** > GitHub Actions > tier1 to tier3 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. This pull request has now been integrated.
Changeset: 2b756ab1 Author: Saranya Natarajan Committer: Manuel H?ssig URL: https://git.openjdk.org/jdk/commit/2b756ab1e8cfacc5cf5d9c6dfdf1d1c9a6ecf4b1 Stats: 60 lines in 2 files changed: 52 ins; 1 del; 7 mod 8358781: C2 fails with assert "bad profile data type" when TypeProfileCasts is disabled Reviewed-by: mhaessig, kvn, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/26640 From mhaessig at openjdk.org Mon Aug 18 08:23:59 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 18 Aug 2025 08:23:59 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v7] In-Reply-To: References: Message-ID: > A loop of the form > > MemorySegment ms = {}; > for (long i = 0; i < ms.byteSize() / 8L; i++) { > // vectorizable work > } > > does not vectorize, whereas > > MemorySegment ms = {}; > long size = ms.byteSize(); > for (long i = 0; i < size / 8L; i++) { > // vectorizable work > } > > vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: > > https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 > > Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. > > So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. 
The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. > > @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem > > ## Change Description > > Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. > >
Explored Alternatives > 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. > 2. Insert a "`PHASEIDEALLOOP0`" with `LoopOptsNone` that only perfor... Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: Apply Emmanuel's printing suggestion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26429/files - new: https://git.openjdk.org/jdk/pull/26429/files/025dbe6e..5d2d63b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26429&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26429&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26429.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26429/head:pull/26429 PR: https://git.openjdk.org/jdk/pull/26429 From duke at openjdk.org Mon Aug 18 08:37:06 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 18 Aug 2025 08:37:06 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v28] In-Reply-To: References: Message-ID: > The patch adds the possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
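For context on what the RVV code accelerates (a scalar reference added for illustration, not code from the patch): `vectorizedHashCode` computes the standard 31-based polynomial hash, which for an `int[]` matches `java.util.Arrays.hashCode`:

```java
// Scalar reference for the polynomial hash that the RVV intrinsic speeds up.
// Matches java.util.Arrays.hashCode(int[]) for non-null arrays.
class PolyHashSketch {
    static int hash(int[] a) {
        int h = 1;
        for (int v : a) {
            h = 31 * h + v; // final h = 31^n + sum(a[i] * 31^(n-1-i))
        }
        return h;
    }
}
```

A vectorized version processes several lanes per step and recombines the partial results using precomputed powers of 31; the scalar loop above is the semantic reference any such implementation must match.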
Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: - minor updates requested by reviewer ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/81356c2e..38ae6629 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=26-27 Stats: 6 lines in 1 file changed: 2 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From mhaessig at openjdk.org Mon Aug 18 08:40:05 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 18 Aug 2025 08:40:05 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v8] In-Reply-To: References: Message-ID: > A loop of the form > > MemorySegment ms = {}; > for (long i = 0; i < ms.byteSize() / 8L; i++) { > // vectorizable work > } > > does not vectorize, whereas > > MemorySegment ms = {}; > long size = ms.byteSize(); > for (long i = 0; i < size / 8L; i++) { > // vectorizable work > } > > vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. 
Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: > > https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 > > Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. > > So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. > > @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem > > ## Change Description > > Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. > >
Explored Alternatives > 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. > 2. Insert a "`PHASEIDEALLOOP0`" with `LoopOptsNone` that only perfor... Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Add missing AlignVector ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26429/files - new: https://git.openjdk.org/jdk/pull/26429/files/5d2d63b4..cdd17911 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26429&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26429&range=06-07 Stats: 11 lines in 1 file changed: 5 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/26429.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26429/head:pull/26429 PR: https://git.openjdk.org/jdk/pull/26429 From mhaessig at openjdk.org Mon Aug 18 08:40:06 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 18 Aug 2025 08:40:06 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v6] In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 07:59:35 GMT, Emanuel Peter wrote: > 2 little nits left over ;) I addressed both of them. Thank you for pointing them out.
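[Editor's note] The profitability tweak described in the change description above might be pictured as follows. All names and the threshold here are hypothetical illustrations, not the actual `split_through_phi()` code:

```python
def split_is_profitable(entry_wins, backedge_wins, other_wins, backedge_threshold=3):
    """Hypothetical sketch of the adjusted profitability heuristic:
    wins on the loop entry edge alone no longer justify the split,
    since splitting there destroys the canonical loop shape for
    little gain inside the loop."""
    if entry_wins + backedge_wins + other_wins == 0:
        return False  # no wins anywhere: never profitable
    if entry_wins > 0 and backedge_wins < backedge_threshold:
        # wins are (mostly) on the entry edge: only profitable if the
        # backedge also wins a lot
        return False
    return True

assert not split_is_profitable(entry_wins=2, backedge_wins=0, other_wins=0)
assert split_is_profitable(entry_wins=0, backedge_wins=1, other_wins=0)
assert split_is_profitable(entry_wins=2, backedge_wins=5, other_wins=0)
```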
------------- PR Comment: https://git.openjdk.org/jdk/pull/26429#issuecomment-3195683022 From snatarajan at openjdk.org Mon Aug 18 08:41:21 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 18 Aug 2025 08:41:21 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v8] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: On Wed, 13 Aug 2025 09:35:08 GMT, Saranya Natarajan wrote: >> **Issue** >> Extreme values for the BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version` result in an assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. This is observed on an x86 machine. >> >> **Analysis** >> On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above-mentioned assert from failing for large values of BciProfileWidth. >> >> **Proposal** >> Considering the fact that a larger BciProfileWidth results in slower profiling, I have proposed a range between 0 and 5000 to restrict the value for BciProfileWidth for x86 machines. This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in the ret profile.
>> >> **Issue in AArch64** >> Additionally, running the command `java -XX:BciProfileWidth=10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine. This is an issue with the maximum offset for `ldr/str` in AArch64 which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on an AArch64 machine, I had to modify 3 `ldr` and 1 `str` instructions (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line numbers 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for a maximum of 5000 after which it crashes with `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. Without this fix `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested restricting the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. >> >> **Question to reviewers** >> Do you think this is a reasonable fix? For AArch64, do you suggest fixing it using `form_address`? If yes, do I fix it under this PR or create another one? >> >> **Request to port maintainers** >> @dafedafe suggested that we keep the upper boun... > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > additions for linux-riscv64 @offamitkumar and @TheRealMDoerr: Would it be possible to test this PR on s390 and PPC?
------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3195691469 From mchevalier at openjdk.org Mon Aug 18 08:41:52 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 18 Aug 2025 08:41:52 GMT Subject: RFR: 8360561: PhaseIdealLoop::create_new_if_for_predicate hits "must be a uct if pattern" assert [v2] In-Reply-To: References: Message-ID: > Did you know that ranges can be disjoint and yet not ordered?! Well, in modular arithmetic. > > Let's look at a simplistic example: > > int x; > if (?) { > x = -1; > } else { > x = 1; > } > > if (x != 0) { > return; > } > // Unreachable > > > With signed ranges, before the second `if`, `x` is in `[-1, 1]`. Which is enough to enter the second `if`, but not enough to prove you have to enter it: it wrongly seems that the code after the second `if` is still reachable. Twaddle! > > With unsigned ranges, at this point `x` is in `[1, 2^32-1]`, and then, it is clear that `x != 0`. This information is used to refine the value of `x` in the (missing) else-branch, and so, after the if. This is done with simple lattice meet (Hotspot's join): in the else-branch, the possible values of `x` are the meet of what it was worth before, and the interval in the guard, that is `[0, 0]`. Thanks to the unsigned range, this is known to be empty (that is bottom, or Hotspot's top). And with a little reduced product, the whole type of `x` is empty as well. Yet, this information is not used to kill the control. > > This is the center of the problem: we have a situation such as: > 2 after-CastII > After node `110 CastII` is idealized, it is found to be Top, and then the uncommon trap at `129` is replaced by `238 Halt` by being value-dead. > 1 before-CastII > Since the control is not killed, the node stays there, eventually making some predicate-related assert fail as a trap is expected under a `ParsePredicate`.
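[Editor's note] Marc's signed-versus-unsigned range argument above can be reproduced outside HotSpot with a few lines of Python (a sketch of the range reasoning, not the C2 implementation):

```python
def to_u32(x):
    """Reinterpret a signed 32-bit value as unsigned."""
    return x & 0xFFFFFFFF

values = [-1, 1]  # x is -1 on one branch, 1 on the other

# Signed range: [-1, 1]. It contains 0, so signed reasoning alone
# cannot prove x != 0.
signed_lo, signed_hi = min(values), max(values)
assert signed_lo <= 0 <= signed_hi

# Unsigned range: -1 maps to 0xFFFFFFFF, so the range is
# [1, 0xFFFFFFFF], which provably excludes 0.
unsigned = sorted(to_u32(v) for v in values)
unsigned_lo, unsigned_hi = unsigned[0], unsigned[-1]
assert unsigned_lo == 1 and unsigned_hi == 0xFFFFFFFF
assert not (unsigned_lo <= 0 <= unsigned_hi)  # 0 is outside: x != 0 is proven
```

The meet with the guard's interval `[0, 0]` is then empty under the unsigned view, which is exactly the "disjoint but not ordered" situation the message describes.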
> > And that's what this change proposes: when comparing integers with non-ordered ranges, let's see if the unsigned ranges overlap, by computing the meet. If the intersection is empty, then the values can't be equal, without being able to order them. This is new! Without unsigned information for signed integers, either they overlap, or we can order them. Adding modular arithmetic allows us to have non-overlapping ranges that are also not ordered. > > Let's also notice that 0 is special: it is important that bounds are on each side of 0 (or 2^31, the other discontinuity). For instance, if `x` can be 1 or 5, both the signed and unsigned range will agree on `[1, 5]` and not be able to prove it isn't, let's say, 3. > > What would there be other ways to treat this problem a bit ... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Use Warmup(0) instead of Xcomp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26504/files - new: https://git.openjdk.org/jdk/pull/26504/files/717af8de..feef30f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26504&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26504&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26504/head:pull/26504 PR: https://git.openjdk.org/jdk/pull/26504 From mchevalier at openjdk.org Mon Aug 18 08:41:52 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 18 Aug 2025 08:41:52 GMT Subject: RFR: 8360561: PhaseIdealLoop::create_new_if_for_predicate hits "must be a uct if pattern" assert [v2] In-Reply-To: References: Message-ID: <-PmfO6zD_ze3so_8escy8pAX149XX2tI45k4_du7oeM=.d505a9b4-759e-4141-8c9e-3a342061511b@github.com> On Thu, 14 Aug 2025 11:16:31 GMT, Christian Hagedorn wrote: >> Indeed.
I use it here to prevent profiling from removing an actually impossible path with a trap, because bad things happen in a dead path. It's not the first time I use `Xcomp` for that, and there are other ways (like setting a maximum on the number of traps per method, or disabling the warmup (and so profiling) in IR framework execution). That was discussed in some other PR without strong opinions or consensus on what would be the preferred way. > > Ideally you use `@Warmup(0)` without `-Xcomp` + `CompileOnly` to not stress the test VM unnecessarily. But depending on your use case/profiling requirements, it might not be enough, so `-Xcomp` + `CompileOnly` seems like a good option. Seems that it works in this case. I've changed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26504#discussion_r2281688100 From duke at openjdk.org Mon Aug 18 08:42:19 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 18 Aug 2025 08:42:19 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v27] In-Reply-To: References: Message-ID: On Sat, 16 Aug 2025 02:00:06 GMT, Fei Yang wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> - one more round of updates per received suggestions from reviewers. > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2039: > >> 2037: beqz(t0, SCALAR_TAIL); >> 2038: >> 2039: vsetvli(t1, x0, Assembler::e32, Assembler::m2); > > This `vsetvli` doesn't seem necessary to me. Maybe we can remove it and move the second `vsetvli` at L2046 here? > I am suggesting this code sequence: > > andi(t0, cnt, ~(stride - 1)); > beqz(t0, SCALAR_TAIL); > > la(t1, ExternalAddress(adr_pows31)); > lw(pow31_highest, Address(t1, -1 * sizeof(jint))); > > vsetvli(consumed, cnt, Assembler::e32, Assembler::m2); > vle32_v(v_coeffs, t1); // 31^^(stride - 1) ... 31^^0 > vmv_v_x(v_sum, x0); I like it, thanks.
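[Editor's note] The coefficient vector in the suggested sequence (31^^(stride - 1) ... 31^^0, loaded by `vle32_v`) implements the standard blocked form of the polynomial hash. A small Python model of that algorithm (a model of the math only, not the RISC-V intrinsic):

```python
M32 = 1 << 32  # hash arithmetic wraps at 32 bits

def scalar_hash(arr):
    """Reference: the scalar Java-style polynomial hash h = h*31 + a[i]."""
    h = 0
    for x in arr:
        h = (h * 31 + x) % M32
    return h

def blocked_hash(arr, stride=8):
    """Per chunk of `stride` elements: h = h*31^stride + dot(coeffs, chunk),
    where coeffs = [31^(stride-1), ..., 31^0] as in the coefficient load."""
    coeffs = [pow(31, stride - 1 - i, M32) for i in range(stride)]
    pow_stride = pow(31, stride, M32)
    h, i = 0, 0
    while i + stride <= len(arr):
        dot = sum(c * x for c, x in zip(coeffs, arr[i:i + stride]))
        h = (h * pow_stride + dot) % M32
        i += stride
    for x in arr[i:]:  # remaining elements, as in the scalar tail
        h = (h * 31 + x) % M32
    return h

data = list(range(123))
assert blocked_hash(data) == scalar_hash(data)
```

The dot product per chunk is what the vector loop accumulates; the scalar tail handles the remainder after the last full stride.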
> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2044: > >> 2042: la(t1, ExternalAddress(adr_pows31)); >> 2043: lw(pow31_highest, Address(t1, -1 * sizeof(jint))); >> 2044: vle32_v(v_coeffs, t1); // 31^^(MaxVectorSize-1)...31^^0 > > The code comment doesn't seem accurate considering vector register grouping. > Should it be: `31^^(stride - 1) ... 31^^0`? Sure, done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2281693532 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r2281699222 From bmaillard at openjdk.org Mon Aug 18 08:47:10 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 18 Aug 2025 08:47:10 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v2] In-Reply-To: <4Jk_phoDCxNS3QzYjbUnlmgpuZmnTI_f9j5_ORDlrOU=.d66384f5-a0cb-4f6a-8ccf-533fd6eca0e3@github.com> References: <4Jk_phoDCxNS3QzYjbUnlmgpuZmnTI_f9j5_ORDlrOU=.d66384f5-a0cb-4f6a-8ccf-533fd6eca0e3@github.com> Message-ID: <48EpHvdE22AyXWHiY2KzWnvOWbTRoeLqmr5TOMf_ddo=.d7a2fd04-bc01-4e68-b2d6-fa4d2ccc8573@github.com> On Mon, 18 Aug 2025 08:12:02 GMT, Manuel Hässig wrote: >> This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality. >> >> Testing: >> - [x] Github Actions >> - [x] tier1,tier2 plus some internal testing on Oracle supported platforms > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Apply Benoît's suggestion > > Co-authored-by: Benoît Maillard Marked as reviewed by bmaillard (Author).
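[Editor's note] As a rough picture of what a cross-product helper computes (one scenario per element of the cartesian product of the per-flag choices), here is a sketch; the flag strings are arbitrary examples, and this is not the IR framework API itself:

```python
from itertools import product

def cross_product_scenarios(*flag_sets):
    """Return one scenario (a list of VM flags) for every combination
    drawn from the given flag sets."""
    return [list(combo) for combo in product(*flag_sets)]

scenarios = cross_product_scenarios(
    ["-XX:+UseCompactObjectHeaders", "-XX:-UseCompactObjectHeaders"],
    ["-XX:+TieredCompilation", "-XX:-TieredCompilation"],
)
assert len(scenarios) == 4  # 2 x 2 combinations
```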
------------- PR Review: https://git.openjdk.org/jdk/pull/26762#pullrequestreview-3127386411 From jsjolen at openjdk.org Mon Aug 18 09:47:15 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 18 Aug 2025 09:47:15 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers [v2] In-Reply-To: References: Message-ID: <1ZGeH-R9goJByTfkQSiSKp1nD9oxNqOkeG50T5rnJuI=.4cb38ce6-eac2-42fc-ad4d-771758bd4d84@github.com> > Hi, > > This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. > > Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions. > > Testing: GHA Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - Good catch by Vladimir - Vladimir's comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26569/files - new: https://git.openjdk.org/jdk/pull/26569/files/c5ea4184..e71b4924 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26569&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26569&range=00-01 Stats: 7 lines in 3 files changed: 3 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26569.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26569/head:pull/26569 PR: https://git.openjdk.org/jdk/pull/26569 From bkilambi at openjdk.org Mon Aug 18 12:01:12 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 18 Aug 2025 12:01:12 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 13:01:57 GMT, Andrew Haley wrote: >>> `Why not do something along these lines?` >> >> I tried exactly that and it does generate a `mov` and a
`dup` for illegal immediates which is why I initially said I would put up a patch soon but I realised later that the `loadConH` node is also being generated somewhere above (most likely because the value it loads is required for the scalar `AddHF` nodes). This isn't ideal? As we wanted to get rid of the load from the constant pool in the first place, if I got you right? >> >>> I don't understand. >> >> Apologies for not being clear. >> Another approach I thought was to directly modify the `loadConH` itself. >> >> `loadConH` is defined as - >> >> instruct loadConH(vRegF dst, immH con) %{ >> match(Set dst con); >> format %{ >> "ldrs $dst, [$constantaddress]\t# load from constant table: half float=$con\n\t" >> %} >> ins_encode %{ >> __ ldrs(as_FloatRegister($dst$$reg), $constantaddress($con)); >> %} >> ins_pipe(fp_load_constant_s); >> %} >> >> >> The destination register is an FPR. If we would want to modify this to generate a move to a scratch register instead (something similar to loadConI) then we would have to change the destination register to `iregI` which probably could be acceptable for autovectorization as we are replicating the value in a vector register anyway but for the scalar `AddHF` operation (the iterations that get peeled or the ones in pre/post loop which are not autovectorized), it would expect the value to be available in an FPR instead (the `h` register variant). So we might have to introduce a move from the GPR to an FPR. The reason why I felt I needed more time to investigate this. Please let me know your thoughts. Thanks! > >> The destination register is an FPR. If we would want to modify this to generate a move to a scratch register instead (something similar to loadConI) then we would have to change the destination register to `iregI` > > This is the part I don't understand. Why would you have to change the destination register to `iregI`? I wouldn't. 
> > > instruct loadConH(vRegF dst, immH con) %{ > match(Set dst con); > format %{ > "something" > %} > ins_encode %{ > __ movw(rscratch1, $con$$constant); > __ fmovs($dst$$reg, rscratch1); > %} > ``` Hi @theRealAph could I please request for a re-review? Thanks! I would like to push this to JDK-25u as well as mainline. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3196351325 From aph at openjdk.org Mon Aug 18 13:11:13 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 18 Aug 2025 13:11:13 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: On Fri, 15 Aug 2025 11:54:59 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. >> >> This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). >> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. 
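[Editor's note] For context, the encodability rule discussed in this thread can be written as a small predicate. Note that the two messages quote slightly different ranges; this sketch assumes the version in the follow-up test comment (a signed multiple of 256 in [-32768, 32512]), which matches an 8-bit signed immediate optionally shifted left by 8:

```python
def sve_dup_hf_imm_encodeable(imm):
    """Sketch: can a signed 16-bit immediate be materialized directly by
    SVE DUP (signed imm8, optionally shifted left by 8)?"""
    if -128 <= imm <= 127:
        return True  # plain 8-bit signed immediate
    # imm8 << 8 covers multiples of 256 in [-128*256, 127*256]
    return imm % 256 == 0 and -32768 <= imm <= 32512

assert sve_dup_hf_imm_encodeable(127)
assert sve_dup_hf_imm_encodeable(-128)
assert sve_dup_hf_imm_encodeable(256)
assert sve_dup_hf_imm_encodeable(-32768)
assert not sve_dup_hf_imm_encodeable(129)    # not 8-bit, not a multiple of 256
assert not sve_dup_hf_imm_encodeable(32768)  # outside the shifted range
```

Values failing this predicate are the ones that need to be materialized some other way, which is the subject of the `mov`/`fmov`/`dup` discussion above.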
JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk (tier 1-3) pass on the same machine. > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments There's something that I still do not understand. In your tests I see this: // For vectorizable loops containing FP16 operations with an FP16 constant as one of the inputs, the IR // node `(dst (Replicate con))` is generated to broadcast the constant into all lanes of an SVE register. // On SVE-capable hardware with vector length > 16B, if the FP16 immediate is a signed value within the // range [-128, 127] or a signed multiple of 256 in the range [-32768, 32512] for element widths of // 16 bits or higher then the backend should generate the "replicateHF_imm_gt128b" machnode. Why is this restricted to special constants? You should be able to do this with any value by generating `mov rtemp, #n; dup zn.h, rtemp`. There's no need to generate `mov rtemp, #n; fmov stemp, rtemp; dup zn.h, stemp` ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3196765207 From dskantz at openjdk.org Mon Aug 18 13:34:13 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Mon, 18 Aug 2025 13:34:13 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors [v3] In-Reply-To: References: Message-ID: <93TT-9mEaUlfvGdzHLOq70IBxak65QWoPSD0ve2wrBU=.67f53f35-6524-6da0-adfe-b7608f21d917@github.com> On Tue, 12 Aug 2025 12:50:46 GMT, Roberto Castañeda Lozano wrote: >> Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision: >> >> - comment >> - changes > > Thanks for addressing my comments, Daniel! Please re-test to ensure the new limit is OK on all Oracle's internal test configurations. Thanks @robcasloz for the review. Great to increase the upper bound with compilation time measurements.
Tests still look good to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26685#issuecomment-3196907616 From dskantz at openjdk.org Mon Aug 18 13:34:14 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Mon, 18 Aug 2025 13:34:14 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors [v3] In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 06:40:53 GMT, Daniel Skantz wrote: >> test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsMany.java line 28: >> >>> 26: * @bug 8357105 >>> 27: * @summary Test that repeated stacked string concatenations do not >>> 28: * consume too many compilation resources. >> >> Is there a reasonable way to enhance the test to validate excessive resources? I'm not sure if the following example would work, but I'm wondering if there is something that can be measured deterministically. E.g. before with the given test there would be ~N IR nodes produced but now it would be a max of ~M, assuming that M is deterministically smaller than N. > > There's an 80000 node limit by default and maybe the test could use a lower limit by specifying a value for the MaxNodeLimit flag. There is also the IR framework that can check for node counts for individual nodes. > > Without the fix, the test currently gets a MemLimit assert in debug runs for consuming 1GB of memory as it is building up the _arguments arrays. The high number of IR nodes is created later in `replace_string_concat` if we get that far without timing out or reaching the memory limit. I think the default node limit, memory limit and timeout for product runs may be enough to test the fix.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2282402493 From epeter at openjdk.org Mon Aug 18 13:36:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 13:36:15 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v8] In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 08:40:05 GMT, Manuel H?ssig wrote: >> A loop of the form >> >> MemorySegment ms = {}; >> for (long i = 0; i < ms.byteSize() / 8L; i++) { >> // vectorizable work >> } >> >> does not vectorize, whereas >> >> MemorySegment ms = {}; >> long size = ms.byteSize(); >> for (long i = 0; i < size / 8L; i++) { >> // vectorizable work >> } >> >> vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: >> >> https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 >> >> Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. >> >> So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. 
>> >> @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem >> >> ## Change Description >> >> Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. >> >>
Explored Alternatives >> 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. >> 2. I... > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Add missing AlignVector Thanks @mhaessig for the work! I hope this is a big step towards reducing the "brittleness" I was experiencing in my benchmarks/tests :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26429#pullrequestreview-3128393285 From epeter at openjdk.org Mon Aug 18 14:15:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Aug 2025 14:15:21 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v9] In-Reply-To: References: Message-ID: <4jse1CDroshO-rXRvZcTqrcR9yRFc1pEOG3buxHLbZ0=.22c2bb7e-7524-4b1f-8f74-2b22edee1639@github.com> On Fri, 8 Aug 2025 08:21:42 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is the integer type's size in bits. In previous versions, when C2 cannot know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: > > - Add microbench > - Add missing test method declarations I'll run some testing and review afterward.
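[Editor's note] The bound that makes this optimization possible, namely that CLZ results always lie in [0, 32] for `int`, is easy to sanity-check with a model of `Integer.numberOfLeadingZeros` (a model only, not the JDK code):

```python
def nlz32(x):
    """Model of Integer.numberOfLeadingZeros on a 32-bit value."""
    x &= 0xFFFFFFFF
    if x == 0:
        return 32
    return 32 - x.bit_length()

def number_of_nibbles(i):
    """The example from the PR description, in Python."""
    mag = 32 - nlz32(i)
    return max((mag + 3) // 4, 1)

# With nlz32 known to be in [0, 32], C2 can bound mag to [0, 32],
# (mag + 3) / 4 to [0, 8], and the final result to [1, 8].
assert all(0 <= nlz32(v) <= 32 for v in (0, 1, 0xFFFFFFFF))
assert number_of_nibbles(0) == 1
assert number_of_nibbles(0xFF) == 2
assert number_of_nibbles(0xFFFFFFFF) == 8
```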
test/hotspot/jtreg/compiler/c2/gvn/TestCountBitsRange.java line 34: > 32: * @summary Tests that count bits nodes are handled correctly. > 33: * @library /test/lib / > 34: * @requires vm.compiler2.enabled Is this restriction necessary? IR rules are only run if we have C2 available in debug anyway. Other modes could still profit from correctness checks. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25928#pullrequestreview-3128565978 PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2282534728 From jsjolen at openjdk.org Mon Aug 18 14:26:13 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 18 Aug 2025 14:26:13 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers In-Reply-To: References: Message-ID: On Thu, 31 Jul 2025 19:00:10 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. >> >> Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions. >> >> Testing: GHA > > The build failures are all from after [Explicitly assign _mutable_data to nullptr](https://github.com/openjdk/jdk/pull/26569/commits/75a3853b65f264666c470a3ba6b1791dce6c775d), fixing the issues should be trivial. > >> Is this change intended to resolve JDK-8361382 (NMT header corruption)? If so, please link it in the PR description and describe how the new logic prevents that corruption. > > It's not intended to resolve it, but it does remove one potential source of the issue.
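[Editor's note] The base-plus-index scheme from the PR description can be illustrated with a toy iterator; this is a sketch of the iteration pattern only, the real `RelocIterator` is HotSpot C++:

```python
class TinyIterator:
    """Model of iteration over a base pointer with a current index and a
    length, where -1 means 'before the first element'."""
    def __init__(self, data):
        self._base = data    # stands in for the base pointer
        self._current = -1   # -1 = next() not yet called
        self._len = len(data)

    def next(self):
        """Advance; return True while an element is available."""
        self._current += 1
        return self._current < self._len

    def current(self):
        assert 0 <= self._current < self._len, "exhausted or not advanced"
        return self._base[self._current]

it = TinyIterator(["oop", "call", "data"])
seen = []
while it.next():
    seen.append(it.current())
assert seen == ["oop", "call", "data"]
```

The point of the index-based form is that the sentinel `-1` lives in the index, so the base pointer itself stays `nullptr`-checkable independently of the iteration state.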
> >> However, since relocation iteration is on a performance-critical path, benchmarks should be run to ensure that the added integer field and array indexing introduce no measurable regression. > > Yeah, we can check that. Note that we have the same size, as we replaced 1 8-byte field with 2 4-byte fields. I also suspect that the pointer addition (probably a `lea r0, [ r0 + r1 ]` on x64) won't introduce a performance regression, but nothing wrong with checking. > @jdksjolen please run tier1-4 testing in mach5, GHA is not enough for such changes. Two tests fail (rest are green) in my testing. I will ping you when I've solved those issues. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26569#issuecomment-3197152716 From kvn at openjdk.org Mon Aug 18 14:41:23 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 18 Aug 2025 14:41:23 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: <_Kw3K2gEmjUgLy5pYLnMsKH2N-cb-cKfc2ip412MACU=.e354810a-6fa6-4f6b-8470-984040bf712b@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> <_Kw3K2gEmjUgLy5pYLnMsKH2N-cb-cKfc2ip412MACU=.e354810a-6fa6-4f6b-8470-984040bf712b@github.com> Message-ID: On Mon, 18 Aug 2025 07:49:22 GMT, Emanuel Peter wrote: >> It is a DIAGNOSTIC flag, which allowed me to demonstrate the performance in a JMH benchmark. You asked for that benchmark back when I first introduced multiversioning with https://github.com/openjdk/jdk/pull/22016 . I'm also fine removing the flag completely now. Or just making it develop. What do you think is best? > > It is now used in the JMH benchmark, I'd have to remove it there too: > https://github.com/openjdk/jdk/pull/24278/files#diff-93288fabe20d76b9df3fb5601e4d8600a46f438fe4b9c4ef92d702fdffa1c8c9R225-R230 It is not about diagnostic vs develop. 
It is about whether to keep it or not. It is fine to have this during development of these changes to prove that we need to optimize slow loops too. But now we know that we need to do the optimization. The only reason to keep it is for debugging some future failures. You can keep it if you think it is very useful for debugging. My main concern is we have too many flags we never use and we are adding more. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2282607286 From kvn at openjdk.org Mon Aug 18 14:45:28 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 18 Aug 2025 14:45:28 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Mon, 18 Aug 2025 06:17:52 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/c2_globals.hpp line 370: >> >>> 368: \ >>> 369: product(bool, UseAutoVectorizationSpeculativeAliasingChecks, true, DIAGNOSTIC, \ >>> 370: "Use Multiversioning or Predicate to add aliasing runtime checks") \ >> >> This flag description implies that it should depend on `LoopMultiversioning` and `UseAutoVectorizationPredicate` flags settings but I did not find such checks. > > I made the description more precise. The idea is that you can disable the speculative checks with `UseAutoVectorizationSpeculativeAliasingChecks`. If you have the speculative checks enabled, you still need to enable multiversioning and/or the auto vectorization predicate - otherwise that also disables the speculative checks. I meant, I don't see code checking consistency for flags specified on the command line. Consider the next combination: % java -XX:+UseAutoVectorizationSpeculativeAliasingChecks -XX:-LoopMultiversioning -XX:-UseAutoVectorizationPredicate Test What will the VM do?
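[Editor's note] For the command line asked about here, Emanuel's earlier statement in this thread (the speculative checks additionally require multiversioning and/or the auto vectorization predicate) implies the behavior sketched below. This is a hypothetical model of that statement, not the actual HotSpot flag-handling code:

```python
def aliasing_checks_effective(speculative_checks, loop_multiversioning, av_predicate):
    """Speculative aliasing runtime checks only take effect when at least
    one mechanism that can emit them is enabled."""
    return speculative_checks and (loop_multiversioning or av_predicate)

# -XX:+UseAutoVectorizationSpeculativeAliasingChecks
# -XX:-LoopMultiversioning -XX:-UseAutoVectorizationPredicate
assert not aliasing_checks_effective(True, False, False)  # checks have no effect

assert aliasing_checks_effective(True, True, False)
assert aliasing_checks_effective(True, False, True)
assert not aliasing_checks_effective(False, True, True)
```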
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2282619724 From kvn at openjdk.org Mon Aug 18 14:49:29 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 18 Aug 2025 14:49:29 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Mon, 18 Aug 2025 06:20:57 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/c2_globals.hpp line 367: >> >>> 365: \ >>> 366: product(bool, UseAutoVectorizationPredicate, true, DIAGNOSTIC, \ >>> 367: "Use AutoVectorization predicate (for speculative compilation)") \ >> >> I do not see benchmark results with this flag off. > > Would that be helpful to you? How? > What would you expect to see here @vnkozlov ? > This is really an optimization that reduces the code-size. > Or do you just want to sanity-check that the peak performance would be identical if we use multiversioning rather than the predicate approach? You did benchmarking for `LoopMultiversioningOptimizeSlowLoop`. > use multiversioning rather than the predicate approach This one. Do the alias analysis runtime checks require both multiversioning and the predicate, or can they work with only one? If both are enabled, which one do you select for alias analysis?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2282630952 From kvn at openjdk.org Mon Aug 18 14:53:23 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 18 Aug 2025 14:53:23 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v10] In-Reply-To: <8oydcWWCxrLGTk74NqbUS5X97E6g-ZkU1El70fhClf4=.92d3f267-3e86-45b3-94b4-4020d05d5c7c@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <8oydcWWCxrLGTk74NqbUS5X97E6g-ZkU1El70fhClf4=.92d3f267-3e86-45b3-94b4-4020d05d5c7c@github.com> Message-ID: On Mon, 18 Aug 2025 07:44:53 GMT, Emanuel Peter wrote: > Do you think it is worth it to benchmark now, or should we just rely on @robcasloz's occasional benchmarking and address the issues if they come up? I am fine with using Roberto's benchmarking later. Just keep an eye on it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3197256741 From hgreule at openjdk.org Mon Aug 18 15:57:56 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 18 Aug 2025 15:57:56 GMT Subject: RFR: 8364407: [REDO] Consolidate Identity of self-inverse operations Message-ID: The previous approach was flawed for `short` and `char` as these are int-subtypes and truncate the result (see the backout issue https://bugs.openjdk.org/browse/JDK-8364409 for a reproducer). This change now first ensures that the input type is small enough so no truncation gets lost when dropping the operations. The previous implementation also used an `InvolutionNode` superclass with one `Identity(...)` implementation, but there were some reservations whether this is the right way to go. As we now have a `ReverseBytesNode`, there is also less benefit in having the supertype, as this covers 4 in 1 already. I also added test cases on top of the original ones that ensure the nodes stay when we can't prove the input type is small enough.
------------- Commit messages: - redo with type check - test Changes: https://git.openjdk.org/jdk/pull/26823/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26823&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8364407 Stats: 271 lines in 4 files changed: 263 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/26823.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26823/head:pull/26823 PR: https://git.openjdk.org/jdk/pull/26823 From hgreule at openjdk.org Mon Aug 18 15:57:56 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 18 Aug 2025 15:57:56 GMT Subject: RFR: 8364407: [REDO] Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 15:49:48 GMT, Hannes Greule wrote: > The previous approach was flawed for `short` and `char` as these are int-subtypes and truncate the result (see the backout issue https://bugs.openjdk.org/browse/JDK-8364409 for a reproducer). > > This change now first ensures that the input type is small enough so no truncation gets lost when dropping the operations. > > The previous implementation also used an `InvolutionNode` superclass with one `Identity(...)` implementation, but there were some reservations whether this is the right way to go. As we now have a `ReverseBytesNode`, there is also less benefit in having the supertype, as this covers 4 in 1 already. > > I also added test cases on top of the original ones that ensure the nodes stay when we can't prove the input type is small enough. test/hotspot/jtreg/compiler/c2/gvn/InvolutionIdentityTests.java line 211: > 209: > 210: @Test > 211: @IR(counts = {IRNode.REVERSE_BYTES_S, "2"}) I'm not sure if this is fine. The intrinsics might not apply to all platforms, in which case this would fail I think? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26823#discussion_r2282799804 From mhaessig at openjdk.org Mon Aug 18 16:21:15 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 18 Aug 2025 16:21:15 GMT Subject: RFR: 8364407: [REDO] Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 15:50:37 GMT, Hannes Greule wrote: >> The previous approach was flawed for `short` and `char` as these are int-subtypes and truncate the result (see the backout issue https://bugs.openjdk.org/browse/JDK-8364409 for a reproducer). >> >> This change now first ensures that the input type is small enough so no truncation gets lost when dropping the operations. >> >> The previous implementation also used an `InvolutionNode` superclass with one `Identity(...)` implementation, but there were some reservations whether this is the right way to go. As we now have a `ReverseBytesNode`, there is also less benefit in having the supertype, as this covers 4 in 1 already. >> >> I also added test cases on top of the original ones that ensure the nodes stay when we can't prove the input type is small enough. > > test/hotspot/jtreg/compiler/c2/gvn/InvolutionIdentityTests.java line 211: > >> 209: >> 210: @Test >> 211: @IR(counts = {IRNode.REVERSE_BYTES_S, "2"}) > > I'm not sure if this is fine. The intrinsics might not apply to all platforms, in which case this would fail I think? The only case I can immediately think of is riscv without `-XX:+UseZbb`. But you can easily disable the test for that platform. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26823#discussion_r2282865605 From jkarthikeyan at openjdk.org Mon Aug 18 16:43:21 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 18 Aug 2025 16:43:21 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII Message-ID: Hi all, This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! ------------- Commit messages: - Fix truncation assert for constraint casts Changes: https://git.openjdk.org/jdk/pull/26827/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365570 Stats: 28 lines in 2 files changed: 26 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26827.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26827/head:pull/26827 PR: https://git.openjdk.org/jdk/pull/26827 From mhaessig at openjdk.org Mon Aug 18 16:56:14 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 18 Aug 2025 16:56:14 GMT Subject: RFR: 8364407: [REDO] Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 15:49:48 GMT, Hannes Greule wrote: > The previous approach was flawed for `short` and `char` as these are int-subtypes and truncate the result (see the backout issue https://bugs.openjdk.org/browse/JDK-8364409 for a reproducer). > > This change now first ensures that the input type is small enough so no truncation gets lost when dropping the operations. > > The previous implementation also used an `InvolutionNode` superclass with one `Identity(...)` implementation, but there were some reservations whether this is the right way to go. As we now have a `ReverseBytesNode`, there is also less benefit in having the supertype, as this covers 4 in 1 already. 
> > I also added test cases on top of the original ones that ensure the nodes stay when we can't prove the input type is small enough. Thank you for noticing the bug and improving your original PR, @SirYwell! I like your new approach without a superclass and your extensive testing of all edge cases and random values. Nice work! However, I do have some questions below. src/hotspot/share/opto/subnode.cpp line 2080: > 2078: if (type == nullptr || involution->bottom_type()->is_int()->contains(type)) { > 2079: return involution->in(1)->in(1); > 2080: } Instead of skipping the optimization, could you "clean" `involution->in(1)->in(1)` using `mask_int_value()`? That would follow the semantics of [JVMS §6.5 `ireturn`](https://docs.oracle.com/javase/specs/jvms/se24/html/jvms-6.html#jvms-6.5.ireturn). test/hotspot/jtreg/compiler/c2/gvn/InvolutionIdentityTests.java line 1: > 1: /* Why are you not testing involution on `NegL/I` nodes? Can this not be optimized? test/hotspot/jtreg/compiler/c2/gvn/InvolutionIdentityTests.java line 40: > 38: * @bug 8350988 8364407 > 39: * @summary Test that Identity simplifications of Involution nodes are being performed as expected. > 40: * @library /test/lib / Suggestion: * @summary Test that Identity simplifications of Involution nodes are being performed as expected. * @key randomness * @library /test/lib / Since you are using random inputs. ------------- Changes requested by mhaessig (Committer).
PR Review: https://git.openjdk.org/jdk/pull/26823#pullrequestreview-3129106380 PR Review Comment: https://git.openjdk.org/jdk/pull/26823#discussion_r2282919600 PR Review Comment: https://git.openjdk.org/jdk/pull/26823#discussion_r2282938964 PR Review Comment: https://git.openjdk.org/jdk/pull/26823#discussion_r2282897963 From hgreule at openjdk.org Mon Aug 18 17:04:13 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 18 Aug 2025 17:04:13 GMT Subject: RFR: 8364407: [REDO] Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 16:40:11 GMT, Manuel Hässig wrote: >> The previous approach was flawed for `short` and `char` as these are int-subtypes and truncate the result (see the backout issue https://bugs.openjdk.org/browse/JDK-8364409 for a reproducer). >> >> This change now first ensures that the input type is small enough so no truncation gets lost when dropping the operations. >> >> The previous implementation also used an `InvolutionNode` superclass with one `Identity(...)` implementation, but there were some reservations whether this is the right way to go. As we now have a `ReverseBytesNode`, there is also less benefit in having the supertype, as this covers 4 in 1 already. >> >> I also added test cases on top of the original ones that ensure the nodes stay when we can't prove the input type is small enough. > > src/hotspot/share/opto/subnode.cpp line 2080: > >> 2078: if (type == nullptr || involution->bottom_type()->is_int()->contains(type)) { >> 2079: return involution->in(1)->in(1); >> 2080: } > > Instead of skipping the optimization, could you "clean" `involution->in(1)->in(1)` using `mask_int_value()`? That would follow the semantics of [JVMS §6.5 `ireturn`](https://docs.oracle.com/javase/specs/jvms/se24/html/jvms-6.html#jvms-6.5.ireturn). I think this would need to happen in `Ideal()` then instead. Doing that would work too, but it further complicates things with little benefit, imo.
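The truncation hazard discussed in this thread can be reproduced in plain Java (a standalone demo, not code from the PR): applying `Short.reverseBytes` twice through a `(short)` cast is only an identity when the value already fits in 16 bits, which is why the optimization must check the input type before dropping the two nodes.

```java
// Standalone demo (not from the PR): a ReverseBytesS-style operation is
// only an involution when the input fits in a short, because the
// intermediate (short) cast truncates the upper bits.
public class InvolutionDemo {
    static int reverseBytesTwice(int v) {
        short once = Short.reverseBytes((short) v); // truncates v to 16 bits
        return Short.reverseBytes(once);            // reverse again
    }
}
```

For `0x2345` (fits in a short) the double reverse returns the input, but for `0x12345` it returns `0x2345`, so removing both operations would change the result unless the input type is provably small enough.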
> test/hotspot/jtreg/compiler/c2/gvn/InvolutionIdentityTests.java line 1: > >> 1: /* > > Why are you not testing involution on `NegL/I` nodes? Can this not be optimized? `NegL/I` aren't used currently (see e.g. https://bugs.openjdk.org/browse/JDK-8262346). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26823#discussion_r2282958185 PR Review Comment: https://git.openjdk.org/jdk/pull/26823#discussion_r2282962673 From jkarthikeyan at openjdk.org Mon Aug 18 17:52:18 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 18 Aug 2025 17:52:18 GMT Subject: RFR: 8364407: [REDO] Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 15:49:48 GMT, Hannes Greule wrote: > The previous approach was flawed for `short` and `char` as these are int-subtypes and truncate the result (see the backout issue https://bugs.openjdk.org/browse/JDK-8364409 for a reproducer). > > This change now first ensures that the input type is small enough so no truncation gets lost when dropping the operations. > > The previous implementation also used an `InvolutionNode` superclass with one `Identity(...)` implementation, but there were some reservations whether this is the right way to go. As we now have a `ReverseBytesNode`, there is also less benefit in having the supertype, as this covers 4 in 1 already. > > I also added test cases on top of the original ones that ensure the nodes stay when we can't prove the input type is small enough. This looks nice! I've just left some style comments. 
src/hotspot/share/opto/subnode.cpp line 2075: > 2073: if (involution->in(1)->Opcode() == involution->Opcode()) { > 2074: Node* original = involution->in(1)->in(1); > 2075: const TypeInt *type = phase->type(original)->isa_int(); Suggestion: const TypeInt* type = phase->type(original)->isa_int(); src/hotspot/share/opto/subnode.cpp line 2076: > 2074: Node* original = involution->in(1)->in(1); > 2075: const TypeInt *type = phase->type(original)->isa_int(); > 2076: // Operations on sub-int types might not be "real" involutions for values outside their type range. I think it would be helpful to state an example of the disallowed case in the comment, maybe something like: Suggestion: // Operations on sub-int types might not be "real" involutions for values outside their type range, for example a ReverseBytesS node with an input larger than short. test/hotspot/jtreg/compiler/c2/gvn/InvolutionIdentityTests.java line 132: > 130: assertResultD(nand); > 131: > 132: } There's an extra whitespace: Suggestion: } ------------- PR Review: https://git.openjdk.org/jdk/pull/26823#pullrequestreview-3129186828 PR Review Comment: https://git.openjdk.org/jdk/pull/26823#discussion_r2282955571 PR Review Comment: https://git.openjdk.org/jdk/pull/26823#discussion_r2283065869 PR Review Comment: https://git.openjdk.org/jdk/pull/26823#discussion_r2282982876 From mgronlun at openjdk.org Mon Aug 18 18:12:13 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 18 Aug 2025 18:12:13 GMT Subject: RFR: 8365071: ARM32: JFR intrinsic jvm_commit triggers C2 regalloc assert In-Reply-To: <6MHwDW0E9bOzpj5B3pzlNmOCRPtFtnrk55NmTTxbhLM=.f0026c26-2c80-4766-8984-da9f34a31c8d@github.com> References: <6MHwDW0E9bOzpj5B3pzlNmOCRPtFtnrk55NmTTxbhLM=.f0026c26-2c80-4766-8984-da9f34a31c8d@github.com> Message-ID: On Fri, 8 Aug 2025 01:54:46 GMT, Boris Ulasevich wrote: > On 32-bit ARM, the jvm_commit JFR intrinsic builder feeds null (RegP) into a TypeLong Phi, causing mixed long/pointer register 
sizing and triggering the C2 register allocator assert(_num_regs == reg || !_num_regs). > > The fix is trivial: use an appropriate ConL constant instead. This has no effect on 64-bit systems (the generated assembly is identical) but resolves a JFR issue on 32-bit systems. Looks good. ------------- Marked as reviewed by mgronlun (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26684#pullrequestreview-3129401730 From shade at openjdk.org Mon Aug 18 18:49:28 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 18 Aug 2025 18:49:28 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v23] In-Reply-To: References: Message-ID: On Thu, 10 Jul 2025 13:02:07 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch the weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by the AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in the number of methods in the queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones.
But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with six additional commits since the last revision: > > - Docs touchup > - Use enum class > - Further simplify the API > - Tune up for release builds > - Move release() to destructor > - Deal with things without spinlocks Currently distracted by higher-priority stuff. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-3183715140 From dlong at openjdk.org Mon Aug 18 21:43:47 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 18 Aug 2025 21:43:47 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v8] In-Reply-To: References: <6-poNTHw7LVDOcv91ZprJQFTb0nAJbAtxxMwp8vtPTg=.0a80771c-c23d-4f99-ab2e-c6392798d328@github.com> Message-ID: On Mon, 18 Aug 2025 07:57:43 GMT, Manuel Hässig wrote: >> I'm not entirely sure but I guess it's fine since it's in the same thread. > > Maybe @dean-long can shed some light on this? I'm not an expert on resource areas, but using them in a signal handler seems questionable to me. See also JDK-8349578. I don't even think malloc allocations are safe in a signal handler. But since we are crashing, it probably doesn't matter. In this case, I would either add the ResourceMark (to avoid a "missing ResourceMark" assert), or use an alternative method for printing the name and signature. Note that the hs_err log file should contain the name of the method being compiled, so having it here in this assert is useful but not critical.
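For readers unfamiliar with the mechanism: `ResourceMark` is HotSpot's scoped rollback point for resource-area allocations, and in debug builds allocating without an active mark trips the "missing ResourceMark" assert mentioned above. A toy Java analogue (purely illustrative; the real implementation is C++ inside HotSpot, and the names below are invented for this sketch):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy analogue of HotSpot's resource area + ResourceMark (illustrative only).
// Allocating with no active mark fails, mimicking the debug-build assert.
public class ToyResourceArea {
    private int top = 0;                          // bump-pointer watermark
    private final Deque<Integer> marks = new ArrayDeque<>();

    public int allocate(int size) {
        if (marks.isEmpty()) {
            throw new AssertionError("missing ResourceMark");
        }
        int offset = top;
        top += size;
        return offset;
    }

    // Stand-in for constructing a ResourceMark: closing the scope rolls
    // back everything allocated inside it.
    public AutoCloseable mark() {
        marks.push(top);
        return () -> { top = marks.pop(); };
    }

    public int used() { return top; }
}
```

The point of the discussion above is that even this scoped allocation is dubious inside a signal handler, where only async-signal-safe operations are reliable.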
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2283559577 From qxing at openjdk.org Tue Aug 19 01:34:03 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 19 Aug 2025 01:34:03 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v10] In-Reply-To: References: Message-ID: > The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. > > This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: > > > public static int numberOfNibbles(int i) { > int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); > return Math.max((mag + 3) / 4, 1); > } > > > Testing: tier1, IR test Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: Remove redundant `@require` in IR test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25928/files - new: https://git.openjdk.org/jdk/pull/25928/files/b4b9b643..f1c0b45a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928 PR: https://git.openjdk.org/jdk/pull/25928 From qxing at openjdk.org Tue Aug 19 01:34:03 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 19 Aug 2025 01:34:03 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v9] In-Reply-To: 
<4jse1CDroshO-rXRvZcTqrcR9yRFc1pEOG3buxHLbZ0=.22c2bb7e-7524-4b1f-8f74-2b22edee1639@github.com> References: <4jse1CDroshO-rXRvZcTqrcR9yRFc1pEOG3buxHLbZ0=.22c2bb7e-7524-4b1f-8f74-2b22edee1639@github.com> Message-ID: On Mon, 18 Aug 2025 14:12:19 GMT, Emanuel Peter wrote: >> Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add microbench >> - Add missing test method declarations > > test/hotspot/jtreg/compiler/c2/gvn/TestCountBitsRange.java line 34: > >> 32: * @summary Tests that count bits nodes are handled correctly. >> 33: * @library /test/lib / >> 34: * @requires vm.compiler2.enabled > > Is this restriction necessary? IR rules are only run if we have C2 available in debug anyway. Other modes could still profit from correctness checks. Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2283827354 From fyang at openjdk.org Tue Aug 19 03:33:41 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 19 Aug 2025 03:33:41 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v28] In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 08:37:06 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > - minor updates requested by reviewer Thanks for the update. Latest version LGTM. Please get approval from @robehn ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17413#pullrequestreview-3130552311 From bulasevich at openjdk.org Tue Aug 19 04:43:48 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 19 Aug 2025 04:43:48 GMT Subject: RFR: 8365071: ARM32: JFR intrinsic jvm_commit triggers C2 regalloc assert In-Reply-To: <6MHwDW0E9bOzpj5B3pzlNmOCRPtFtnrk55NmTTxbhLM=.f0026c26-2c80-4766-8984-da9f34a31c8d@github.com> References: <6MHwDW0E9bOzpj5B3pzlNmOCRPtFtnrk55NmTTxbhLM=.f0026c26-2c80-4766-8984-da9f34a31c8d@github.com> Message-ID: On Fri, 8 Aug 2025 01:54:46 GMT, Boris Ulasevich wrote: > On 32-bit ARM, the jvm_commit JFR intrinsic builder feeds null (RegP) into a TypeLong Phi, causing mixed long/pointer register sizing and triggering the C2 register allocator assert(_num_regs == reg || !_num_regs). > > The fix is trivial: use an appropriate ConL constant instead. This has no effect on 64-bit systems (the generated assembly is identical) but resolves a JFR issue on 32-bit systems. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26684#issuecomment-3199179245 From bulasevich at openjdk.org Tue Aug 19 04:43:49 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 19 Aug 2025 04:43:49 GMT Subject: Integrated: 8365071: ARM32: JFR intrinsic jvm_commit triggers C2 regalloc assert In-Reply-To: <6MHwDW0E9bOzpj5B3pzlNmOCRPtFtnrk55NmTTxbhLM=.f0026c26-2c80-4766-8984-da9f34a31c8d@github.com> References: <6MHwDW0E9bOzpj5B3pzlNmOCRPtFtnrk55NmTTxbhLM=.f0026c26-2c80-4766-8984-da9f34a31c8d@github.com> Message-ID: On Fri, 8 Aug 2025 01:54:46 GMT, Boris Ulasevich wrote: > On 32-bit ARM, the jvm_commit JFR intrinsic builder feeds null (RegP) into a TypeLong Phi, causing mixed long/pointer register sizing and triggering the C2 register allocator assert(_num_regs == reg || !_num_regs). > > The fix is trivial: use an appropriate ConL constant instead. 
This has no effect on 64-bit systems (the generated assembly is identical) but resolves a JFR issue on 32-bit systems. This pull request has now been integrated. Changeset: f2f7a490 Author: Boris Ulasevich URL: https://git.openjdk.org/jdk/commit/f2f7a490c091734ae1aa6cd402a117acbc1c699e Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8365071: ARM32: JFR intrinsic jvm_commit triggers C2 regalloc assert Reviewed-by: mgronlun ------------- PR: https://git.openjdk.org/jdk/pull/26684 From epeter at openjdk.org Tue Aug 19 05:53:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 05:53:51 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> <_Kw3K2gEmjUgLy5pYLnMsKH2N-cb-cKfc2ip412MACU=.e354810a-6fa6-4f6b-8470-984040bf712b@github.com> Message-ID: On Mon, 18 Aug 2025 14:38:18 GMT, Vladimir Kozlov wrote: >> It is now used in the JMH benchmark, I'd have to remove it there too: >> https://github.com/openjdk/jdk/pull/24278/files#diff-93288fabe20d76b9df3fb5601e4d8600a46f438fe4b9c4ef92d702fdffa1c8c9R225-R230 > > It is not about diagnostic vs develop. It is about to keep it or not. > > It is fine to have this during development of these changes to prove that we need to optimize slow loops too. > But now we know that we need to do the optimization. The only reason is to keep it is for debugging some future failures. You can keep it if you think it is very useful for debugging. > > My main concern is we have too many flags we never use and we are adding more. It really is only for benchmarking. Probably not super useful for debugging. I'll remove it with all its uses. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2284140917 From epeter at openjdk.org Tue Aug 19 06:03:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 06:03:47 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Mon, 18 Aug 2025 14:42:28 GMT, Vladimir Kozlov wrote: >> I made the description more precise. The idea is that you can disable the speculative checks with `UseAutoVectorizationSpeculativeAliasingChecks`. If you have the speculative checks enabled, you still need to enable multiversioning and/or the auto vectorization predicate - otherwise that also disables the speculative checks. > > I meant, I don't see code checking flag consistency for flags specified on the command line. Consider the next combination: > > % java -XX:+UseAutoVectorizationSpeculativeAliasingChecks -XX:-LoopMultiversioning -XX:-UseAutoVectorizationPredicate Test > > What will the VM do? It would like to add speculative checks, but cannot because neither multiversioning nor the predicate is available. Currently, there is no error, nor is there any logic that changes the value of the flag. Would you like that to be an error? The downside is that I would have to add special logic in some tests to avoid such errors/crashes, where I now randomly enable and disable these tests. Or would you like me to check the values of the flags, and then possibly disable `UseAutoVectorizationSpeculativeAliasingChecks` automatically in the VM if neither multiversioning nor the predicate is available?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2284155578 From epeter at openjdk.org Tue Aug 19 06:09:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 06:09:47 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Mon, 18 Aug 2025 14:47:00 GMT, Vladimir Kozlov wrote: >> Would that be helpful to you? How? >> What would you expect to see here @vnkozlov ? >> This is really an optimization that reduces the code-size. >> Or do you just want to sanity-check that the peak performance would be identical if we use multiversioning rather than the predicate approach? > > You did benchmarking for `LoopMultiversioningOptimizeSlowLoop`. > >> use multiversioning rather than the predicate approach > > This one. > > Do the alias analysis runtime checks require both multiversioning and the predicate, or can they work with only one? > If both are enabled, which one do you select for alias analysis? I described it at the top of the PR: > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > > - Use the auto-vectorization predicate when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use multiversioning, i.e. we have a fast_loop where there is no aliasing, and hence vectorization. And a slow_loop if the check fails, with no vectorization. So if only one of them is available, we expect vectorization - at least as long as the check never fails. If we only have the predicate and not multiversioning, and the predicate fails, then we will never get a slow-loop.
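The fast/slow shape described above can be sketched in Java (a hypothetical illustration of the compiled shape, not actual C2 output; the `dst != src` test stands in for the real aliasing runtime check, which compares address ranges):

```java
// Hypothetical sketch of loop multiversioning for aliasing (illustration
// only). The guard selects the vectorizable fast loop when the speculative
// no-aliasing check holds, and a conservative scalar loop otherwise.
public class MultiversionSketch {
    public static void addOne(int[] dst, int[] src, int n) {
        if (dst != src) {
            // fast_loop: no aliasing, so the auto-vectorizer may use vectors
            for (int i = 0; i < n; i++) {
                dst[i] = src[i] + 1;
            }
        } else {
            // slow_loop: possible aliasing, element-by-element scalar code
            for (int i = 0; i < n; i++) {
                dst[i] = src[i] + 1;
            }
        }
    }
}
```

In the predicate variant, the `else` branch is replaced by an uncommon trap that triggers recompilation without the predicate, which is why a failing predicate never produces a slow loop.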
But sure, I can run some sanity benchmarking with only multiversioning and predicate disabled. I expect the peak performance to be identical, but compilation time will be slightly higher because we also always compile the slow-loop even if not needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2284161476 From epeter at openjdk.org Tue Aug 19 06:09:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 06:09:48 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Tue, 19 Aug 2025 06:01:02 GMT, Emanuel Peter wrote: >> I meant, I don't see code checking flag consistency for flags specified on the command line. Consider the next combination: >> >> % java -XX:+UseAutoVectorizationSpeculativeAliasingChecks -XX:-LoopMultiversioning -XX:-UseAutoVectorizationPredicate Test >> >> What will the VM do? > > It would like to add speculative checks, but cannot because neither multiversioning nor the predicate is available. Currently, there is no error, nor is there any logic that changes the value of the flag. > > Would you like that to be an error? > The downside is that I would have to add special logic in some tests to avoid such errors/crashes, where I now randomly enable and disable these tests. > > Or would you like me to check the values of the flags, and then possibly disable `UseAutoVectorizationSpeculativeAliasingChecks` automatically in the VM if neither multiversioning nor the predicate is available?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2284163924 From wenanjian at openjdk.org Tue Aug 19 06:34:41 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Tue, 19 Aug 2025 06:34:41 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics Message-ID: Hi everyone, please help review this patch, which implements the _counterMode_AESCrypt intrinsic with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. ------------- Commit messages: - add Flags and fix the stubid name - RISC-V: implement AES-CTR mode intrinsics Changes: https://git.openjdk.org/jdk/pull/25281/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365732 Stats: 250 lines in 3 files changed: 245 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From wenanjian at openjdk.org Tue Aug 19 06:34:41 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Tue, 19 Aug 2025 06:34:41 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics In-Reply-To: References: Message-ID: On Sat, 17 May 2025 03:13:46 GMT, Anjian Wen wrote: > Hi everyone, please help review this patch, which implements the _counterMode_AESCrypt intrinsic with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. Still working on it.
Passed the aes-ctr test in jtreg: test/hotspot/jtreg/compiler/codegen/aes. Tested in qemu-sys-riscv64, with extensions v=true,zvkn=true,zvkned=true,zvknc=true,zvkng=true ------------- PR Comment: https://git.openjdk.org/jdk/pull/25281#issuecomment-3067708259 PR Comment: https://git.openjdk.org/jdk/pull/25281#issuecomment-3153284448 From aph at openjdk.org Tue Aug 19 06:34:42 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 19 Aug 2025 06:34:42 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics In-Reply-To: References: Message-ID: On Sat, 17 May 2025 03:13:46 GMT, Anjian Wen wrote: > Hi everyone, please help review this patch, which implements the _counterMode_AESCrypt intrinsic with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2745: > 2743: __ vsetivli(x0, 4, Assembler::e32, Assembler::m1); > 2744: __ vrev8_v(v31, v31, Assembler::VectorMask::v0_t); // convert big-endian to little-endian > 2745: __ vadd_vi(v31, v31, 1, Assembler::VectorMask::v0_t); Are you sure this is correct? See `com.sun.crypto.provider.CounterMode::increment`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2206938722 From wenanjian at openjdk.org Tue Aug 19 06:34:43 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Tue, 19 Aug 2025 06:34:43 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics In-Reply-To: References: Message-ID: On Tue, 15 Jul 2025 09:13:16 GMT, Andrew Haley wrote: >> Hi everyone, please help review this patch, which implements the _counterMode_AESCrypt intrinsic with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed.
> > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2745: > >> 2743: __ vsetivli(x0, 4, Assembler::e32, Assembler::m1); >> 2744: __ vrev8_v(v31, v31, Assembler::VectorMask::v0_t); // convert big-endian to little-endian >> 2745: __ vadd_vi(v31, v31, 1, Assembler::VectorMask::v0_t); > > Are you sure this is correct? See `com.sun.crypto.provider.CounterMode::increment`. Thanks for the review. I'm still developing it. Regarding the growth of the counter array, it should use 8 bytes to store the count. I use 4 bytes here according to the OpenSSL aes-ctr code; I will try to fix it later: https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aes-riscv64-zvkb-zvkned.pl#L242 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2230736352 From mhaessig at openjdk.org Tue Aug 19 06:37:44 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 19 Aug 2025 06:37:44 GMT Subject: RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression In-Reply-To: <2BCEe5coWSwvmmoUBLWZlzBs81azC4xeekoxZgLv_7I=.61ff0c53-5671-471e-96fb-85875116b5ac@github.com> References: <2BCEe5coWSwvmmoUBLWZlzBs81azC4xeekoxZgLv_7I=.61ff0c53-5671-471e-96fb-85875116b5ac@github.com> Message-ID: On Wed, 23 Jul 2025 12:34:37 GMT, Quan Anh Mai wrote: >> A loop of the form >> >> MemorySegment ms = {}; >> for (long i = 0; i < ms.byteSize() / 8L; i++) { >> // vectorizable work >> } >> >> does not vectorize, whereas >> >> MemorySegment ms = {}; >> long size = ms.byteSize(); >> for (long i = 0; i < size / 8L; i++) { >> // vectorizable work >> } >> >> vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit before it is recognized as counted.
Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: >> >> https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 >> >> Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. >> >> So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. >> >> @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem >> >> ## Change Description >> >> Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. >> >>
Explored Alternatives >> 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. >> 2. I... > > From the principle point of view, splitting a node through the loop `Phi` is only profitable if the profit is in the loop backedge. From the practical point of view, there are some issues when `split_through_phi` is applied recklessly such as [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096). I believe taking loop head into consideration when splitting through `Phi`s can solve these issues. As a result, I think while you are at this issue, it is worth investigating this approach. I ran another round of testing that passed. Thank you for your reviews @merykitty and @eme64! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26429#issuecomment-3199415604 From mhaessig at openjdk.org Tue Aug 19 06:41:01 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 19 Aug 2025 06:41:01 GMT Subject: Integrated: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression In-Reply-To: References: Message-ID: On Tue, 22 Jul 2025 15:05:29 GMT, Manuel H?ssig wrote: > A loop of the form > > MemorySegment ms = {}; > for (long i = 0; i < ms.byteSize() / 8L; i++) { > // vectorizable work > } > > does not vectorize, whereas > > MemorySegment ms = {}; > long size = ms.byteSize(); > for (long i = 0; i < size / 8L; i++) { > // vectorizable work > } > > vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. 
Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*: > > https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176 > > Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization. > > So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization. > > @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem > > ## Change Description > > Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge. > >
Explored Alternatives > 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops. > 2. Insert a "`PHASEIDEALLOOP0`" with `LoopOptsNone` that only perfor... This pull request has now been integrated. Changeset: 626bea80 Author: Manuel H?ssig URL: https://git.openjdk.org/jdk/commit/626bea80abf1660757a12462ebc8313ef6d41f92 Stats: 256 lines in 7 files changed: 215 ins; 18 del; 23 mod 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression Co-authored-by: Quan Anh Mai Co-authored-by: Emanuel Peter Co-authored-by: Christian Hagedorn Co-authored-by: Tobias Hartmann Reviewed-by: epeter, qamai ------------- PR: https://git.openjdk.org/jdk/pull/26429 From epeter at openjdk.org Tue Aug 19 07:24:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 07:24:30 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v13] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > TODO work that arose during review process / recent merges with master: > > - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peek performance is identical but compilation time a bit higher. > - Vladimir: consider disabling `UseAutoVectorizationSpeculativeAliasingChecks` if neither predicate nor multiversioning are available. 
> - Test failure with multiversioning: `/home/empeter/Documents/oracle/jtreg/bin/jtreg -va -s -jdk:/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64-debug/jdk -javaoptions:"-Djdk.test.lib.random.seed=-9045761078153722515" -J-Djavatest.maxOutputSize=10000000 /home/empeter/Documents/oracle/jdk-fork6/open/test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java` > > --------------- > > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in somecases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the miro-benchmarks I show below. Simple array cases. 
> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: rm LoopMultiversioningOptimizeSlowLoop for Vladimir :) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/1fc7caa0..a5fdf97b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=11-12 Stats: 68 lines in 5 files changed: 0 ins; 67 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From amitkumar at openjdk.org Tue Aug 19 07:46:45 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 19 Aug 2025 07:46:45 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v8] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: On Mon, 18 Aug 2025 08:37:50 GMT, Saranya Natarajan wrote: > @offamitkumar and @TheRealMDoerr : Would it be possible to test this PR in s390 and PPC ? Hi @sarannat, thanks for the ping. I ran tier1 test on s390x and result came out clean. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3199605352 From epeter at openjdk.org Tue Aug 19 08:12:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 08:12:30 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v14] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > TODO work that arose during review process / recent merges with master: > > - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peek performance is identical but compilation time a bit higher. > - Vladimir: consider disabling `UseAutoVectorizationSpeculativeAliasingChecks` if neither predicate nor multiversioning are available. > - Test failure with multiversioning: `/home/empeter/Documents/oracle/jtreg/bin/jtreg -va -s -jdk:/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64-debug/jdk -javaoptions:"-Djdk.test.lib.random.seed=-9045761078153722515" -J-Djavatest.maxOutputSize=10000000 /home/empeter/Documents/oracle/jdk-fork6/open/test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java` > > --------------- > > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. 
> > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in somecases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the miro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.j... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 207 commits: - manual merge with master - rm LoopMultiversioningOptimizeSlowLoop for Vladimir :) - addressing Vladimir's comments - more documentation for Christian - Update src/hotspot/share/opto/c2_globals.hpp Co-authored-by: Christian Hagedorn - improve predicates.hpp documentation - moved swapping up, suggested by Manuel - use Scenarios - apply suggestions from Manuel - Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java Co-authored-by: Manuel H?ssig - ... 
and 197 more: https://git.openjdk.org/jdk/compare/812434c4...67c6dd74 ------------- Changes: https://git.openjdk.org/jdk/pull/24278/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=13 Stats: 5373 lines in 23 files changed: 5123 ins; 19 del; 231 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From bkilambi at openjdk.org Tue Aug 19 08:21:54 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 19 Aug 2025 08:21:54 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 13:08:06 GMT, Andrew Haley wrote: > There's something that I still do not understand. > > In your tests I see this: > > ``` > // For vectorizable loops containing FP16 operations with an FP16 constant as one of the inputs, the IR > // node `(dst (Replicate con))` is generated to broadcast the constant into all lanes of an SVE register. > // On SVE-capable hardware with vector length > 16B, if the FP16 immediate is a signed value within the > // range [-128, 127] or a signed multiple of 256 in the range [-32768, 32512] for element widths of > // 16 bits or higher then the backend should generate the "replicateHF_imm_gt128b" machnode. > ``` > > Why is this restricted to special constants? You should be able to do this with any value by generating `mov rtemp, #n; dup zn.h, rtemp`. There's no need to generate `mov rtemp, #n; fmov stemp, rtemp; dup zn.h, stemp` This test does not test the `mov/fmov` instructions (only the `dup` instructions). 
The current code still generates `dup zn.h, #imm` for valid immediates and `dup zn.h, hn` for invalid immediates which is what is being tested in the JTREG testcase (as I only optimized`loadConH` and not the `replicateHF*` backend nodes) For the binary FP16 add loop that I have in the testcase, the compiler generates the `loadConH` node which does a `mov rtemp, #n; fmov stemp, rtemp;` (as we discussed earlier) which gets consumed by a few scalar iterations of the loop (which expect the input to be in an FPR which is why we need the `fmov`). When the vectorized code for the loop is emitted eventually, the `dup` instruction is generated (either `dup zn.h, #imm` or `dup zn.h, hn`) which is what is being tested in this JTREG test. I feel it's better to keep the `dup` instructions separate for valid and invalid immediates because there could be cases where the immediate is a valid one and `loadConH` is not required to be generated (maybe there are no scalar iterations and it is a pure vector loop) in which case it would make sense to emit `dup Zn.h, #imm` instead. Just to be clear, I am pasting the disassembly for the invalid case below - 0x0000e1e1a462c410: mov w8, #0x40b // #1035 0x0000e1e1a462c414: fmov s16, w8 .... 0x0000e1e1a462c44c: ldrsh w15, [x14, #16] 0x0000e1e1a462c450: mov v17.h[0], w15 0x0000e1e1a462c454: fadd h18, h17, h16 0x0000e1e1a462c458: smov x29, v18.h[0] .... 0x0000e1e1a462c4e4: mov z17.h, p7/m, h16 .... 0x0000e1e1a462c500: ld1h {z18.h}, p7/z, [x15] 0x0000e1e1a462c504: fadd z18.h, z18.h, z17.h 0x0000e1e1a462c508: add x13, x16, x13 0x0000e1e1a462c50c: add x15, x13, #0x10 0x0000e1e1a462c510: st1h {z18.h}, p7, [x15] 0x0000e1e1a462c514: add x15, x14, #0x30 0x0000e1e1a462c518: ld1h {z18.h}, p7/z, [x15] 0x0000e1e1a462c51c: fadd z18.h, z18.h, z17.h 0x0000e1e1a462c520: add x15, x13, #0x30 0x0000e1e1a462c524: st1h {z18.h}, p7, [x15] ..... For the valid case - 0x0000ff120d02bf28: orr w8, wzr, #0x400 0x0000ff120d02bf2c: fmov s17, w8 ... 
0x0000ff120d02bf6c: ldrsh w14, [x13, #16] 0x0000ff120d02bf70: mov v16.h[0], w14 0x0000ff120d02bf74: fadd h18, h16, h17 0x0000ff120d02bf78: smov x13, v18.h[0] 0x0000ff120d02bf7c: add x10, x18, x10 0x0000ff120d02bf80: strh w13, [x10, #16] .... 0x0000ff120d02bfa4: mov z16.h, #1024 .... 0x0000ff120d02bff0: ld1h {z18.h}, p7/z, [x13] 0x0000ff120d02bff4: fadd z18.h, z18.h, z16.h 0x0000ff120d02bff8: add x11, x18, x11 0x0000ff120d02bffc: add x13, x11, #0x10 0x0000ff120d02c000: st1h {z18.h}, p7, [x13] 0x0000ff120d02c004: add x13, x12, #0x30 0x0000ff120d02c008: ld1h {z18.h}, p7/z, [x13] 0x0000ff120d02c00c: fadd z18.h, z18.h, z16.h 0x0000ff120d02c010: add x13, x11, #0x30 0x0000ff120d02c014: st1h {z18.h}, p7, [x13] .... ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3199646701 From duke at openjdk.org Tue Aug 19 09:30:44 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Tue, 19 Aug 2025 09:30:44 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v8] In-Reply-To: References: <5e1o1xtN0ZdQZGJi2aVmgCEApW625koeE9F53VhDi5E=.2390045d-844e-4800-8d4b-075a2a3a8793@github.com> Message-ID: On Wed, 4 Jun 2025 06:04:46 GMT, Robbin Ehn wrote: >> As you can expect I am trying to implement the following code with RVV: >> >> for (; i + (N-1) < cnt; i += N) { >> h = 31^^N * h >> + 31^^(N-1) * val[i + 0] >> + 31^^(N-2) * val[i + 1] >> ... >> + 31^^1 * val[i + (N-2)] >> + 31^^0 * val[i + (N-1)]; >> } >> for (; i < cnt; i++) { >> h = 31 * h + val[i]; >> } >> >> where `N` is a number of processing array elements in "chunk". >> IIUC, the main issue with your approach is "reverse" order of array elements versus preloaded `31^^X` coeffs WHEN the remaining number of elems is less than `N`, say `M=N-1`. >> >> h = 31^^M * h >> + 31^^(M-1) * val[i + 0] >> + 31^^(M-2) * val[i + 1] >> ... 
>> + 31^^1 * val[i + (M-2)] >> + 31^^0 * val[i + (M-1)]; >> >> or returning to our `N` for clarity >> >> h = 31^^(N-1) * h >> + 31^^(N-2) * val[i + 0] >> + 31^^(N-3) * val[i + 1] >> ... >> + 31^^1 * val[i + (N-3)] >> + 31^^0 * val[i + (N-2)]; >> >> Now we need to "slide down" the preloaded multiplier coeffs in the designated vector register by one (as `M=N-1`) to be in "sync" with `val[i + X]` (maybe move them into a temporary VR in the process), and moreover, DO this operation IFF the remaining `cnt` is less than `N` (==> an additional check on every iteration). That's probably acceptable only at the tail phase as a one-time operation but NOT inside the main loop... > > @ygaevsky @RealFYang how can we proceed ? > Thanks for the update. Latest version LGTM. Please get approval from @robehn Sure. Many thanks for your thorough review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3199967971 From rsunderbabu at openjdk.org Tue Aug 19 10:09:42 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Tue, 19 Aug 2025 10:09:42 GMT Subject: RFR: 8286865: vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH/Test.java fails with Out of space in CodeCache Message-ID: MethodHandle invocations with Xcomp are filling up the CodeCache quickly in the test, especially on machines with a high number of processors. It is possible to measure code cache consumption per invocation, estimate overall consumption, and bail out before the CodeCache runs out of memory. But it is much simpler to exclude the test for the Xcomp flag. Additional change: MethodHandles.lookup was unnecessarily invoked on every iteration. Replaced it with a single invocation. PS: This issue is not seen in JDK 20 and above, possibly due to JDK-8290025, but the exclusion guards against vagaries of CodeCache management.
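The chunked 31^N recurrence from the vectorized-hashcode thread above can be sanity-checked with a plain Java model (illustrative scalar code with hypothetical names, not the RVV implementation): processing N elements per step with precomputed powers of 31 must reproduce the simple one-element-at-a-time hash, including the scalar tail for the last `cnt % N` elements.

```java
// Scalar model of the chunked hash discussed above (illustrative only, not
// the RVV code): h is advanced n elements at a time using precomputed
// powers of 31, then a scalar tail loop handles the remaining elements.
class ChunkedHash {
    static int scalarHash(int[] val) {
        int h = 0;
        for (int v : val) h = 31 * h + v;
        return h;
    }

    static int chunkedHash(int[] val, int n) {
        int[] pow = new int[n + 1];              // pow[k] == 31^k (mod 2^32)
        pow[0] = 1;
        for (int k = 1; k <= n; k++) pow[k] = 31 * pow[k - 1];
        int h = 0, i = 0;
        for (; i + n - 1 < val.length; i += n) { // main loop: n elements per step
            int acc = pow[n] * h;                // 31^n * h
            for (int j = 0; j < n; j++) {
                acc += pow[n - 1 - j] * val[i + j];
            }
            h = acc;
        }
        for (; i < val.length; i++) h = 31 * h + val[i]; // scalar tail
        return h;
    }
}
```

Because Java int arithmetic wraps modulo 2^32, the chunked and scalar variants agree for every input, which is exactly the identity the vectorized stub has to preserve.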
------------- Commit messages: - 8286865: vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH/Test.java fails with Out of space in CodeCache Changes: https://git.openjdk.org/jdk/pull/26840/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26840&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8286865 Stats: 24 lines in 1 file changed: 15 ins; 8 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26840.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26840/head:pull/26840 PR: https://git.openjdk.org/jdk/pull/26840 From lucy at openjdk.org Tue Aug 19 11:17:41 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 19 Aug 2025 11:17:41 GMT Subject: RFR: 8358756: [s390x] Test StartupOutput.java crash due to CodeCache size [v2] In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 05:40:14 GMT, Amit Kumar wrote: >> There isn't enough initial code cache to let the interpreter run freely. So the JVM crashes before we even reach the compiler phase and try to bail out in case there isn't enough space left for the stub compilation. The idea is to increase the initial cache size and make it large enough to at least run in interpreter mode. > > Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into testfix > - take the platform change out of loop > - fix Changes requested by lucy (Reviewer). test/hotspot/jtreg/compiler/startup/StartupOutput.java line 64: > 62: } > 63: > 64: int minInitialSize = 800 + (Platform.isS390x() ? 800 : 0); Do you unconditionally need the extra space, or maybe only for fastdebug builds? Maybe add a comment on which initialization code sizes (interpreter, stubs, ...) you observe.
------------- PR Review: https://git.openjdk.org/jdk/pull/25741#pullrequestreview-3131849845 PR Review Comment: https://git.openjdk.org/jdk/pull/25741#discussion_r2284925309 From mdoerr at openjdk.org Tue Aug 19 12:42:52 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 19 Aug 2025 12:42:52 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v8] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: <8gpz_pU_CectqZVtFlnWv0zMO4uSz5lT0KL4SUHdMFA=.3d7d897f-3fd7-4456-ba92-097811b50dee@github.com> On Wed, 13 Aug 2025 09:35:08 GMT, Saranya Natarajan wrote: >> **Issue** >> Extreme values for BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version `results in assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. This is observed in a x86 machine. >> >> **Analysis** >> On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above mentioned assert from failing for large values of BciProfileWidth. >> >> **Proposal** >> Considering the fact that larger BciProfileWidth results in slower profiling, I have proposed a range between 0 to 5000 to restrict the value for BciProfileWidth for x86 machines. This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in ret profile. 
>> >> **Issue in AArch64** >> Additionally running the command `java -XX:BciProfileWidth= 10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine.This is an issue of maximum offset for `ldr/str` in AArch64 which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on AArch64 machine. I had to modify 3 `ldr` and 1 `str` instruction (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line number 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for maximum of 5000 after which it crashes with`assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. Without this fix `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested to restrict the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. >> >> **Question to reviewers** >> Do you think this is a reasonable fix ? For AArch64 do you suggest fixing using `form_address` ? If yes, do I fix it under this PR or create another one ? >> >> **Request to port maintainers** >> @dafedafe suggested that we keep the upper boun... > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > additions for linux-riscv64 PPC64 will need a fix, too. I'm looking into it. Thanks for the ping! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3200597749 From snatarajan at openjdk.org Tue Aug 19 12:42:51 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 19 Aug 2025 12:42:51 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v8] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: On Tue, 19 Aug 2025 07:43:55 GMT, Amit Kumar wrote: >> @offamitkumar and @TheRealMDoerr : Would it be possible to test this PR in s390 and PPC ? > >> @offamitkumar and @TheRealMDoerr : Would it be possible to test this PR in s390 and PPC ? > > Hi @sarannat, thanks for the ping. > I ran tier1 test on s390x and result came out clean. Thank you @offamitkumar ------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3200596492 From duke at openjdk.org Tue Aug 19 12:56:35 2025 From: duke at openjdk.org (Samuel Chee) Date: Tue, 19 Aug 2025 12:56:35 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v2] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> <9wl7zcDsUD5im2gwdm-jtmLrgDl8oxxj3obx5VtDw90=.a7ca2423-d87f-4db2-9d0d-523a0d58c90f@github.com> Message-ID: On Wed, 6 Aug 2025 14:07:12 GMT, Andrew Haley wrote: >> My proposal is: >> >> 1. For `cmpxchg`, we add a trailingDMB option, and emit if `!useLSE && trailingDMB`, moving the dmbs from outside to inside the method. Have default value for trailingDMB be false so other call sites won't emit this dmb hence won't be affected. >> >> 2. In a separate ticket, `cmpxchgptr` and `cmpxchgw` already have DMBs inside their method definitions, so add extra trailingDMB parameter defaulted to true. And emit dmb if true. >> >> 3. In a separate ticket, apply same logic to `atomic_##NAME` to move DMB inside function and default trailingDMB to false to not affect other call sites. 
>> >> Does this sound good to you? > >> My proposal is: >> >> 1. For `cmpxchg`, we add a trailingDMB option, and emit if `!useLSE && trailingDMB`, moving the dmbs from outside to inside the method. Have default value for trailingDMB be false so other call sites won't emit this dmb hence won't be affected. > > I think it would be better to refactor things so that the intent is clear. better have `cmpxchg_barrier` and use that for C1. > >> 2. In a separate ticket, `cmpxchgptr` and `cmpxchgw` already have DMBs inside their method definitions, so add extra trailingDMB parameter defaulted to true. And emit dmb if true. > > Likewise. > >> 3. In a separate ticket, apply same logic to `atomic_##NAME` to move DMB inside function and default trailingDMB to false to not affect other call sites. > > Likewise. Have just addressed for this PR, should have one for cmpxchgptr up shortly in a separate PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2285160803 From duke at openjdk.org Tue Aug 19 12:56:33 2025 From: duke at openjdk.org (Samuel Chee) Date: Tue, 19 Aug 2025 12:56:33 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v4] In-Reply-To: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: > AtomicLong.CompareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: > > ;; cmpxchg { > 0x0000e708d144cf60: mov x8, x2 > 0x0000e708d144cf64: casal x8, x3, [x0] > 0x0000e708d144cf68: cmp x8, x2 > ;; 0x1F1F1F1F1F1F1F1F > 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f > ;; } cmpxchg > 0x0000e708d144cf70: cset x8, ne // ne = any > 0x0000e708d144cf74: dmb ish > > > According to the Oracle Java Specification, AtomicLong.CompareAndSet [1] has the same memory effects as specified by
VarHandle.compareAndSet which has the following effects: [2] > >> Atomically sets the value of a variable to the >> newValue with the memory semantics of setVolatile if >> the variable's current value, referred to as the witness >> value, == the expectedValue, as accessed with the memory >> semantics of getVolatile. > > > > Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. > > Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) > > This is also reflected by C2 not having a dmb for the same respective method. > > [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) > [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) 
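The success/failure semantics quoted above can be sketched at the Java level (illustrative only; this exercises the API contract, not the generated AArch64 code):

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasSemantics {
    // The release store (setVolatile semantics) only happens when the
    // witness value equals the expected value; a failed CAS performs no store.
    static long[] demo() {
        AtomicLong v = new AtomicLong(5);
        boolean ok1 = v.compareAndSet(5, 7); // witness 5 == expected 5: store 7
        boolean ok2 = v.compareAndSet(5, 9); // witness 7 != expected 5: no store
        return new long[] { ok1 ? 1 : 0, ok2 ? 1 : 0, v.get() };
    }
}
```

Since the only store is the one guarded by a successful compare, a `casal` already provides the required ordering on that store, which is the basis the PR gives for dropping the trailing `dmb`.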
Samuel Chee has updated the pull request incrementally with one additional commit since the last revision: Add cmpxchg_barrier helper Change-Id: I17acf999140f0c1decb256de8291361c568a4ff8 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26000/files - new: https://git.openjdk.org/jdk/pull/26000/files/8eb9096d..092c92e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26000&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26000&range=02-03 Stats: 33 lines in 3 files changed: 23 ins; 8 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26000/head:pull/26000 PR: https://git.openjdk.org/jdk/pull/26000 From epeter at openjdk.org Tue Aug 19 13:08:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 13:08:58 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v15] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > TODO work that arose during review process / recent merges with master: > > - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peak performance is identical but compilation time a bit higher. > - Vladimir: consider disabling `UseAutoVectorizationSpeculativeAliasingChecks` if neither predicate nor multiversioning are available. > - Test failure with multiversioning: `/home/empeter/Documents/oracle/jtreg/bin/jtreg -va -s -jdk:/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64-debug/jdk -javaoptions:"-Djdk.test.lib.random.seed=-9045761078153722515" -J-Djavatest.maxOutputSize=10000000 /home/empeter/Documents/oracle/jdk-fork6/open/test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java` > > --------------- > > This is a big patch, but about 3.5k lines are tests.
And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and, in some cases, whether we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.j...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix tests after master integration of JDK-8342692 and JDK-8356176 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/67c6dd74..4fb1bc11 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=13-14 Stats: 196 lines in 4 files changed: 124 ins; 68 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From mdoerr at openjdk.org Tue Aug 19 14:04:02 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 19 Aug 2025 14:04:02 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v8] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: On Wed, 13 Aug 2025 09:35:08 GMT, Saranya Natarajan wrote: >> **Issue** >> Extreme values for the BciProfileWidth flag, such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version`, result in the assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. This is observed on an x86 machine. >> >> **Analysis** >> On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above-mentioned assert from failing for large values of BciProfileWidth. >> >> **Proposal** >> Considering the fact that a larger BciProfileWidth results in slower profiling, I have proposed a range of 0 to 5000 to restrict the value of BciProfileWidth for x86 machines.
This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in the ret profile. >> >> **Issue in AArch64** >> Additionally running the command `java -XX:BciProfileWidth= 10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine. This is an issue with the maximum offset for `ldr/str` on AArch64, which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on an AArch64 machine, I had to modify 3 `ldr` and 1 `str` instructions (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line numbers 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for a maximum of 5000, after which it crashes with `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. Without this fix, `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested restricting the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. >> >> **Question to reviewers** >> Do you think this is a reasonable fix? For AArch64, do you suggest fixing it using `form_address`? If yes, do I fix it under this PR or create another one? >> >> **Request to port maintainers** >> @dafedafe suggested that we keep the upper boun... > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > additions for linux-riscv64 Can you add this patch, please?
diff --git a/src/hotspot/cpu/ppc/interp_masm_ppc.hpp b/src/hotspot/cpu/ppc/interp_masm_ppc.hpp index d3969427db3..ac3825d152f 100644 --- a/src/hotspot/cpu/ppc/interp_masm_ppc.hpp +++ b/src/hotspot/cpu/ppc/interp_masm_ppc.hpp @@ -228,7 +228,7 @@ class InterpreterMacroAssembler: public MacroAssembler { // Interpreter profiling operations void set_method_data_pointer_for_bcp(); - void test_method_data_pointer(Label& zero_continue); + void test_method_data_pointer(Label& zero_continue, bool may_be_far = false); void verify_method_data_pointer(); void set_mdp_data_at(int constant, Register value); diff --git a/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp b/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp index 29fb54250c2..7557709653a 100644 --- a/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp +++ b/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp @@ -1249,10 +1249,14 @@ void InterpreterMacroAssembler::set_method_data_pointer_for_bcp() { } // Test ImethodDataPtr. If it is null, continue at the specified label. -void InterpreterMacroAssembler::test_method_data_pointer(Label& zero_continue) { +void InterpreterMacroAssembler::test_method_data_pointer(Label& zero_continue, bool may_be_far) { assert(ProfileInterpreter, "must be profiling interpreter"); cmpdi(CR0, R28_mdx, 0); - beq(CR0, zero_continue); + if (may_be_far) { + bc_far_optimized(Assembler::bcondCRbiIs1, bi0(CR0, Assembler::equal), zero_continue); + } else { + beq(CR0, zero_continue); + } } void InterpreterMacroAssembler::verify_method_data_pointer() { @@ -1555,7 +1559,7 @@ void InterpreterMacroAssembler::profile_ret(TosState state, Register return_bci, uint row; // If no method data exists, go to profile_continue. - test_method_data_pointer(profile_continue); + test_method_data_pointer(profile_continue, true); // Update the total ret count. 
increment_mdp_data_at(in_bytes(CounterData::count_offset()), scratch1, scratch2 ); ------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3200895865 From epeter at openjdk.org Tue Aug 19 14:05:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 14:05:45 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v10] In-Reply-To: References: Message-ID: On Tue, 19 Aug 2025 01:34:03 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Remove redundant `@require` in IR test Looks like a good patch, thanks for the work and patience with the review - it's been a bit slow over summer with vacation/travel. src/hotspot/share/opto/countbitsnode.cpp line 47: > 45: if (x >> 30 == 0) { n += 2; x <<= 2; } > 46: n -= x >> 31; > 47: return TypeInt::make(n); Is there already a test that covers all the cases that constant fold here? Just to make sure we do not get regressions here. 
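As a Java-level sketch of the boundary inputs such constant-fold coverage would include (illustrative only, not the actual IR test):

```java
public class ClzFoldCases {
    static int clz(int x) { return Integer.numberOfLeadingZeros(x); }

    // Boundary inputs for 32-bit count-leading-zeros: smallest positive,
    // largest positive, sign bit only, all bits set, and a half-word value.
    static boolean coversBoundaries() {
        return clz(1) == 31
            && clz(Integer.MAX_VALUE) == 1  // 0x7fffffff
            && clz(Integer.MIN_VALUE) == 0  // 0x80000000
            && clz(-1) == 0                 // 0xffffffff
            && clz(0x0000FFFF) == 16;
    }
}
```

Constants like these are exactly the inputs where the `Value()` code quoted above should fold to a singleton type, so a test pinning them down guards against regressions.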
src/hotspot/share/opto/countbitsnode.cpp line 57: > 55: const TypeInt* ti = t->is_int(); > 56: return TypeInt::make(count_leading_zeros_int(~ti->_bits._zeros), > 57: count_leading_zeros_int(ti->_bits._ones), I think this is correct, but I would like to see a short comment why it is correct. test/hotspot/jtreg/compiler/c2/gvn/TestCountBitsRange.java line 164: > 162: return Long.numberOfTrailingZeros(l) / 8; > 163: } > 164: } Nice examples! Could you please add a short description to most of them, explaining what you are testing with each? It would help me as a reviewer to see if you cover enough cases. I'm also missing some cases where you have non-trivial input ranges. And then verification that the output range is correct. You could look at this example: https://github.com/openjdk/jdk/pull/25254/files#diff-0e3d89ac8cf0548b69d9bdb0859380bc31de0a772fa7ff211f446a4a5abd4197R220-R248 ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25928#pullrequestreview-3132415938 PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2285354628 PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2285342030 PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2285373835 From epeter at openjdk.org Tue Aug 19 14:05:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 14:05:46 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3] In-Reply-To: References: <2vGPKe7ESZqYjemMvDjFxb4QTk3VjybE0lk59Vqj1Ts=.e6a555a5-407b-4389-8db5-aa02a7de9960@github.com> Message-ID: On Tue, 24 Jun 2025 07:34:32 GMT, Qizheng Xing wrote: >> This is because our implementation does not accept 0 as an input. 
I suggest doing this at `count_leading_zeros`; it makes more sense and also aligns our behaviour with the well-known [`countr_zero`](https://en.cppreference.com/w/cpp/numeric/countr_zero.html) and [`countl_zero`](https://en.cppreference.com/w/cpp/numeric/countl_zero.html) > >> Can you explain why you need this? Why are `count_trailing_zeros` and `count_leading_zeros` not enough, when you cast at the use-site? > > @eme64 The explanation of @merykitty is right, the implementations of `count_leading_zeros` and `count_trailing_zeros` reject zero as input. > > Perhaps we could open another PR to add zero support for these functions, since it's less relevant to this node type change and might require other changes to the code that calls them. In `src/hotspot/share/utilities/count_leading_zeros.hpp`, it says that 0 behavior is undefined. Ok... but why do we do that? Is that a performance optimization? If yes, is it really worth it? If there is no good reason not to handle 0, we should just handle it. We have some tests in `test/hotspot/gtest/utilities/test_count_leading_zeros.cpp`. It would be interesting to quickly check if any use of these methods could ever encounter zero, and then hit the assert. I would not be surprised if we found a bug here. I think this would be a worthwhile cleanup task. I would prefer if we clean things up now, and don't just let more special handling code get integrated.
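For contrast with the C++ utility's undefined zero behavior discussed here, the Java-level counting methods do define a zero result (a sketch of the documented API behavior):

```java
public class ZeroBits {
    // Java defines the zero-input results as the full bit width, so C2's
    // node types must cover them even where the C++ helper rejects zero.
    static int[] zeroCases() {
        return new int[] {
            Integer.numberOfLeadingZeros(0),  // 32
            Integer.numberOfTrailingZeros(0), // 32
            Long.numberOfLeadingZeros(0L)     // 64
        };
    }
}
```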
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2285322703 From epeter at openjdk.org Tue Aug 19 14:05:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 14:05:47 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v10] In-Reply-To: References: Message-ID: <8cq6Lhw9sc_Fd7adnL0t1F10UowOHDr8eEgZSD9MFUc=.d6b189a1-ac3e-4175-8e15-5e16691b6422@github.com> On Tue, 19 Aug 2025 13:50:28 GMT, Emanuel Peter wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove redundant `@require` in IR test > > src/hotspot/share/opto/countbitsnode.cpp line 57: > >> 55: const TypeInt* ti = t->is_int(); >> 56: return TypeInt::make(count_leading_zeros_int(~ti->_bits._zeros), >> 57: count_leading_zeros_int(ti->_bits._ones), > > I think this is correct, but I would like to see a short comment why it is correct. Same in other cases below ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2285345674 From aturbanov at openjdk.org Tue Aug 19 14:18:38 2025 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Tue, 19 Aug 2025 14:18:38 GMT Subject: RFR: 8286865: vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH/Test.java fails with Out of space in CodeCache In-Reply-To: References: Message-ID: On Tue, 19 Aug 2025 10:02:05 GMT, Ramkumar Sunderbabu wrote: > MethodHandle invocations with Xcomp are filling up the CodeCache quickly in the test, especially on machines with a high number of processors. > It is possible to measure code cache consumption per invocation, estimate overall consumption and bail out before the CodeCache runs out of memory. > But it is much simpler to exclude the test for the Xcomp flag. > > Additional Change: MethodHandles.lookup was unnecessarily invoked for all iterations. Replaced it with a single invocation.
> > PS: This issue is not seen in JDK 20 and above, possibly due to JDK-8290025, but the exclusion guards against vagaries of CodeCache management. test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH/Test.java line 79: > 77: "calledFromNative", > 78: MT_calledFromNative); > 79: } catch(Exception ex) { Suggestion: } catch (Exception ex) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26840#discussion_r2285421999 From mhaessig at openjdk.org Tue Aug 19 14:36:05 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 19 Aug 2025 14:36:05 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v11] In-Reply-To: References: Message-ID: > This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. > > The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. > > Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. 
> > Testing: > - [x] GitHub Actions > - [x] tier1, tier2 on all platforms > - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug > - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: - Print timeout properly - Use static buffer for method name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26023/files - new: https://git.openjdk.org/jdk/pull/26023/files/80ddb0ad..9926971f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=09-10 Stats: 7 lines in 2 files changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26023/head:pull/26023 PR: https://git.openjdk.org/jdk/pull/26023 From mhaessig at openjdk.org Tue Aug 19 14:36:05 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 19 Aug 2025 14:36:05 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v8] In-Reply-To: References: <6-poNTHw7LVDOcv91ZprJQFTb0nAJbAtxxMwp8vtPTg=.0a80771c-c23d-4f99-ab2e-c6392798d328@github.com> Message-ID: On Mon, 18 Aug 2025 21:41:11 GMT, Dean Long wrote: >> Maybe @dean-long can shed some light on this? > I'm not an expert on resource areas, but using them in a signal handler seems questionable to me. See also JDK-8349578. I don't even think malloc allocations are safe in a signal handler. But since we are crashing, it probably doesn't matter. In this case, I would either add the ResourceMark (to avoid a "missing ResourceMark" assert), or use an alternative method for printing the name and signature.
Note that the hs_err log file should contain the name of the method being compiled, so having it here in this assert is useful but not critical. I just realized that `name_and_sig_as_C_string()` can also take a buffer and size as arguments, so I will use that with a static buffer instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2285474664 From epeter at openjdk.org Tue Aug 19 14:51:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 14:51:17 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v16] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <_LyK5DdYZZpj2eAefHfnd6zbKVLWHf13WePGFDvdlHs=.f6285625-0fd7-4cbc-9001-825797c3b998@github.com> > TODO work that arose during review process / recent merges with master: > > - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peak performance is identical but compilation time a bit higher. > - Vladimir: consider disabling `UseAutoVectorizationSpeculativeAliasingChecks` if neither predicate nor multiversioning are available. > - Test failure with multiversioning: `/home/empeter/Documents/oracle/jtreg/bin/jtreg -va -s -jdk:/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64-debug/jdk -javaoptions:"-Djdk.test.lib.random.seed=-9045761078153722515" -J-Djavatest.maxOutputSize=10000000 /home/empeter/Documents/oracle/jdk-fork6/open/test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java` > - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. > > --------------- > > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord).
We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and, in some cases, whether we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advance...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: improve benchmark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/4fb1bc11..41e45bf3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=14-15 Stats: 67 lines in 1 file changed: 67 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Tue Aug 19 14:55:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 14:55:53 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v10] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <8oydcWWCxrLGTk74NqbUS5X97E6g-ZkU1El70fhClf4=.92d3f267-3e86-45b3-94b4-4020d05d5c7c@github.com> Message-ID: <2FOwq21WXrvbB5FVS8puNOc0_uRqCl0fJDuWJPnJIyQ=.387b9f91-fdca-40ef-b285-90021512b5a2@github.com> On Mon, 18 Aug 2025 14:50:20 GMT, Vladimir Kozlov wrote: >>> @eme64 did you measure how much C2 compilation time changed with these changes (all optimizations enabled)? >> >> I did not. I don't think it would take much extra time in almost all cases. The extra analysis is not that costly compared to unrolling that we do in all cases already. What might cost more: if we deopt because of the runtime check, and recompile with multiversioning. That could essentially double C2 compile time for those cases. >> >> Do you think it is worth it to benchmark now, or should we just rely on @robcasloz 's occasional benchmarking and address the issues if they come up?
>> >> If you want me to do C2 time benchmarking: should I just show a few specific micro-benchmarks, or do you want to have statistics collected on larger benchmark suites? > >> Do you think it is worth it to benchmark now, or should we just rely on @robcasloz 's occasional benchmarking and address the issues if they come up? > > I am fine with using Roberto's benchmarking later. Just keep an eye on it. @vnkozlov I ran some more benchmarks: [image: benchmark results table] Columns: - `not_profitable` - `-XX:AutoVectorizationOverrideProfitability=0`. Serves as baseline scalar performance. Unrolling is the same as if we vectorized. - `no_sw` - `-XX:-UseSuperWord`. Can mess with the unrolling factor, and thus gets worse performance. - `patch` - no flags. Overall best performance - except for `bench_copy_array_B_differentIndex_alias` and `bench_copy_array_I_differentIndex_alias` - need to investigate? - `no_predicate` - `-XX:-UseAutoVectorizationPredicate`. Same performance as `patch`, we just always use multiversioning immediately. In a separate benchmark, I can show that this requires more C2 compile time and produces larger code - so less desirable. - `no_multiversioning` - `-XX:-LoopMultiversioning`: struggles with mixed cases. As soon as it encounters an aliasing case, the predicate leads to deopt, and then we recompile without the predicate, and so do not vectorize any more - you get scalar performance. - `no_rt_check` - `-XX:-UseAutoVectorizationSpeculativeAliasingChecks`: behavior as before the patch - no vectorization where a runtime check would be required.
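A minimal Java sketch (hypothetical, not one of the actual benchmark kernels) of the aliasing pattern these configurations distinguish:

```java
public class AliasingCopy {
    // Scalar copy loop. Vectorizing it is only legal if the ranges
    // src[sOff, sOff + len) and dst[dOff, dOff + len) do not overlap such
    // that a later iteration's read observes an earlier iteration's write.
    static void copy(int[] src, int sOff, int[] dst, int dOff, int len) {
        for (int i = 0; i < len; i++) {
            dst[dOff + i] = src[sOff + i];
        }
    }
}
```

With `src == dst` and `dOff == sOff + 1`, the first element propagates through the whole range under scalar semantics; a blind vector block copy would not reproduce that, which is exactly what the runtime aliasing check must rule out before taking the fast loop.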
------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3201092650 From rsunderbabu at openjdk.org Tue Aug 19 15:02:55 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Tue, 19 Aug 2025 15:02:55 GMT Subject: RFR: 8286865: vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH/Test.java fails with Out of space in CodeCache [v2] In-Reply-To: References: Message-ID: > MethodHandle invocations with Xcomp are filling up the CodeCache quickly in the test, especially on machines with a high number of processors. > It is possible to measure code cache consumption per invocation, estimate overall consumption and bail out before the CodeCache runs out of memory. > But it is much simpler to exclude the test for the Xcomp flag. > > Additional Change: MethodHandles.lookup was unnecessarily invoked for all iterations. Replaced it with a single invocation. > > PS: This issue is not seen in JDK 20 and above, possibly due to JDK-8290025, but the exclusion guards against vagaries of CodeCache management.
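The lookup-hoisting change mentioned above can be sketched as follows (hypothetical names, not the test's actual code): resolve the `MethodHandle` once and reuse it across iterations instead of performing the lookup per iteration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class HoistedLookup {
    static int addOne(int x) { return x + 1; }

    // Resolve the handle once, outside the loop, then reuse it.
    static int sumViaHandle(int n) throws Throwable {
        MethodHandle mh = MethodHandles.lookup().findStatic(
                HoistedLookup.class, "addOne",
                MethodType.methodType(int.class, int.class));
        int acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (int) mh.invokeExact(i);
        }
        return acc;
    }
}
```

Under `-Xcomp`, repeating the lookup and invocation setup per iteration compiles far more code, which is one way the test pressured the CodeCache.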
Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: addressed review comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26840/files - new: https://git.openjdk.org/jdk/pull/26840/files/b4d8af71..30cb217c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26840&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26840&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26840.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26840/head:pull/26840 PR: https://git.openjdk.org/jdk/pull/26840 From duke at openjdk.org Tue Aug 19 15:06:45 2025 From: duke at openjdk.org (Samuel Chee) Date: Tue, 19 Aug 2025 15:06:45 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v2] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> <9wl7zcDsUD5im2gwdm-jtmLrgDl8oxxj3obx5VtDw90=.a7ca2423-d87f-4db2-9d0d-523a0d58c90f@github.com> Message-ID: On Tue, 19 Aug 2025 12:52:39 GMT, Samuel Chee wrote: >>> My proposal is: >>> >>> 1. For `cmpxchg`, we add a trailingDMB option, and emit if `!useLSE && trailingDMB`, moving the dmbs from outside to inside the method. Have default value for trailingDMB be false so other call sites won't emit this dmb hence won't be affected. >> >> I think it would be better to refactor things so that the intent is clear. better have `cmpxchg_barrier` and use that for C1. >> >>> 2. In a separate ticket, `cmpxchgptr` and `cmpxchgw` already have DMBs inside their method definitions, so add extra trailingDMB parameter defaulted to true. And emit dmb if true. >> >> Likewise. >> >>> 3. In a separate ticket, apply same logic to `atomic_##NAME` to move DMB inside function and default trailingDMB to false to not affect other call sites. >> >> Likewise. 
> > Have just addressed this for this PR; should have one for cmpxchgptr up shortly in a separate PR. Related: https://github.com/openjdk/jdk/pull/26845 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2285560233 From kvn at openjdk.org Tue Aug 19 15:11:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 19 Aug 2025 15:11:53 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Tue, 19 Aug 2025 06:05:03 GMT, Emanuel Peter wrote: >> You did benchmarking for `LoopMultiversioningOptimizeSlowLoop`. >> >>> use multiversioning rather than the predicate approach >> >> This one. >> >> Does the alias analysis runtime check require both multiversioning and predicates, or can it work with only one? >> If both are enabled, which one do you select for alias analysis? > I described it at the top of the PR: >> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >> >> - Use the auto-vectorization predicate when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >> - If the predicate is not available, we use multiversioning, i.e. we have a fast_loop where there is no aliasing, and hence vectorization.
I expect the peek performance to be identical, but compilation time will be slightly higher because we also always compile the slow-loop even if not needed. I mean, it should be in code's comments. And, yes please run benchmarks with different configuration. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2285573823 From kvn at openjdk.org Tue Aug 19 15:11:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 19 Aug 2025 15:11:54 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Tue, 19 Aug 2025 06:06:40 GMT, Emanuel Peter wrote: >> It would like to add speculative checks, but cannot because neither multiversioning nor predicate is available. Currently, there is no error, nor is there any logic that changes the value of the flag. >> >> Would you like that to be an error? >> The downside is that I would have to add special logic in some tests to avoid such errors/crashes, where I now randomly enable and disable these tests. >> >> Or would you like me to check the values of the flags, and then possibly disable `UseAutoVectorizationSpeculativeAliasingChecks` automatically in the VM if neither multiversioning nor the predicate are available? > > If I'm doing something, then probably automatically disable `UseAutoVectorizationSpeculativeAliasingChecks`. Yes, you can do that in `CompilerConfig::ergo_initialize()` as we do for other compiler's flags. That is what I am asking for. I don't think you need to do that in `check_args_consistency()` because flags don't conflict. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2285567851 From kvn at openjdk.org Tue Aug 19 16:05:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 19 Aug 2025 16:05:50 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v10] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <8oydcWWCxrLGTk74NqbUS5X97E6g-ZkU1El70fhClf4=.92d3f267-3e86-45b3-94b4-4020d05d5c7c@github.com> Message-ID: On Mon, 18 Aug 2025 14:50:20 GMT, Vladimir Kozlov wrote: >>> @eme64 did you measure how much C2 compilation time changed with these changes (all optimizations enabled)? >> >> I did not. I don't think it would take much extra time in almost all cases. The extra analysis is not that costly compared to unrolling that we do in all cases already. What might cost more: if we deopt because of the runtime check, and recompile with multiversioning. That could essentially double C2 compile time for those cases. >> >> Do you think it is worth it to benchmark now, or should we just rely on @robcasloz 's occasional benchmarking and address the issues if they come up? >> >> If you want me to do C2 time benchmarking: should I just show a few specific micro-benchmarks, or do you want to have statistics collected on larger benchmark suites? > >> Do you think it is worth it to benchmark now, or should we just rely on @robcasloz 's occasional benchmarking and address the issues if they come up? > > I am fine with using Roberto's benchmarking later. Just keep an eye on it. > @vnkozlov I ran some more benchmarks: Thank you for running benchmarks. Which one do you check first for aliasing code: multiversioning or predicates? From these experiments, I think the best sequence would be (when both predicates and multiversioning are enabled): - use predicates for aliasing (fast compilation, small code) - if it is deoptimized, recompile with multiversioning Is this how it works now?
------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3201340811 From dlong at openjdk.org Tue Aug 19 16:37:39 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 19 Aug 2025 16:37:39 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v11] In-Reply-To: References: Message-ID: On Tue, 19 Aug 2025 14:36:05 GMT, Manuel H?ssig wrote: >> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. >> >> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. >> >> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. >> >> Testing: >> - [x] Github Actions >> - [x] tier1, tier2 on all platforms >> - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug >> - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) > > Manuel H?ssig has updated the pull request incrementally with two additional commits since the last revision: > > - Print timeout properly > - Use static buffer for method name src/hotspot/share/utilities/globalDefinitions.hpp line 154: > 152: > 153: // Format pointers and padded integral values which change size between 32- and 64-bit. > 154: #define INTX_FORMAT "%" PRIdPTR Do we really need to add this back? 
This was removed by JDK-8346990. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2285785156 From dlong at openjdk.org Tue Aug 19 16:40:41 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 19 Aug 2025 16:40:41 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v11] In-Reply-To: References: Message-ID: On Tue, 19 Aug 2025 14:36:05 GMT, Manuel H?ssig wrote: >> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. >> >> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. >> >> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. 
>> >> Testing: >> - [x] Github Actions >> - [x] tier1, tier2 on all platforms >> - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug >> - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) > > Manuel H?ssig has updated the pull request incrementally with two additional commits since the last revision: > > - Print timeout properly > - Use static buffer for method name src/hotspot/os/linux/compilerThreadTimeout_linux.cpp line 47: > 45: char method_name_buf[SIZE]; > 46: task->method()->name_and_sig_as_C_string(method_name_buf, SIZE); > 47: assert(false, "compile task %d (%s) timed out after " INTX_FORMAT " ms", Can we use %zd here? INTX_FORMAT was removed in JDK-8346990. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2285791418 From epeter at openjdk.org Tue Aug 19 16:45:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 16:45:48 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Tue, 19 Aug 2025 15:09:26 GMT, Vladimir Kozlov wrote: >> I described it at the top of the PR: >>> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >>> >>> - Use the auto-vectorization predicate when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >>> - If the predicate is not available, we use multiversioning, i.e. we have a fast_loop where there is no aliasing, and hence vectorization. 
And a slow_loop if the check fails, with no vectorization. >> >> So if only one of them is available, we expect vectorization - at least as long as the check never fails. If we only have the predicate and not multiversioning, and the predicate fails, then we will never get a slow-loop. >> >> But sure, I can run some sanity benchmarking with only multiversioning and the predicate disabled. I expect the peak performance to be identical, but compilation time will be slightly higher because we also always compile the slow-loop even if not needed. > > I mean, it should be in the code's comments. And yes, please run benchmarks with different configurations. The patch already adds these comments: - `predicates.hpp` - https://github.com/openjdk/jdk/pull/24278/files#diff-d3883ecef2a7ed7fecf2f7b3b7d60c898b97d4199717552ecd52c3973e298a68R88-R102 - `VTransform::apply_speculative_aliasing_runtime_checks` uses pre-existing `add_speculative_check` Before the patch, we already have: - `add_speculative_check`: I think it reads quite clearly, but it does not have any good descriptions. It calls: - `create_new_if_for_predicate`: no mention about multiversioning... but predicates apply to non auto-vec uses, so it should probably not be placed there. - `create_new_if_for_multiversion`: does not mention much. I'll add a link to `maybe_multiversion_for_auto_vectorization_runtime_checks` where there is more documentation. - `PhaseIdealLoop::maybe_multiversion_for_auto_vectorization_runtime_checks` mentions that we only multiversion if there is no predicate. I'm adding some more documentation and cross-links.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2285801543 From epeter at openjdk.org Tue Aug 19 16:50:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 16:50:50 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: <_CK_FCIS50G0zdQJ23gx9Vl1EEsA3WVNR4Up95Oei9A=.d2c45700-64ff-4cd9-a0a9-7bcc2ebdd19c@github.com> On Tue, 19 Aug 2025 16:42:58 GMT, Emanuel Peter wrote: >> I mean, it should be in the code's comments. And yes, please run benchmarks with different configurations. > The patch already adds these comments: > > - `predicates.hpp` > - https://github.com/openjdk/jdk/pull/24278/files#diff-d3883ecef2a7ed7fecf2f7b3b7d60c898b97d4199717552ecd52c3973e298a68R88-R102 > - `VTransform::apply_speculative_aliasing_runtime_checks` uses pre-existing `add_speculative_check` > > Before the patch, we already have: > - `add_speculative_check`: I think it reads quite clearly, but it does not have any good descriptions. It calls: > - `create_new_if_for_predicate`: no mention about multiversioning... but predicates apply to non auto-vec uses, so it should probably not be placed there. > - `create_new_if_for_multiversion`: does not mention much. I'll add a link to `maybe_multiversion_for_auto_vectorization_runtime_checks` where there is more documentation. > - `PhaseIdealLoop::maybe_multiversion_for_auto_vectorization_runtime_checks` mentions that we only multiversion if there is no predicate. > > I'm adding some more documentation and cross-links. At some point, we need more high-level documentation in `superword.hpp`.
Currently, there is some documentation in `SuperWord::SLP_extract`, but that is not very easy to find, and I think there are also some inaccuracies there. But I'll look at that in a future RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2285810654 From epeter at openjdk.org Tue Aug 19 16:58:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Aug 2025 16:58:55 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v17] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > TODO work that arose during review process / recent merges with master: > > - Vladimir asked for a benchmark where the predicate is disabled, only multiversioning. Show that peak performance is identical but compilation time a bit higher. > - Vladimir: consider disabling `UseAutoVectorizationSpeculativeAliasingChecks` if neither predicate nor multiversioning is available. > - Test failure with multiversioning: `/home/empeter/Documents/oracle/jtreg/bin/jtreg -va -s -jdk:/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64-debug/jdk -javaoptions:"-Djdk.test.lib.random.seed=-9045761078153722515" -J-Djavatest.maxOutputSize=10000000 /home/empeter/Documents/oracle/jdk-fork6/open/test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java` > - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. > > --------------- > > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord).
We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advance...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more documentation for Vladimir ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/41e45bf3..f84ec341 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=15-16 Stats: 30 lines in 4 files changed: 30 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From mhaessig at openjdk.org Tue Aug 19 17:02:41 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 19 Aug 2025 17:02:41 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v11] In-Reply-To: References: Message-ID: On Tue, 19 Aug 2025 16:34:59 GMT, Dean Long wrote: >> Manuel H?ssig has updated the pull request incrementally with two additional commits since the last revision: >> >> - Print timeout properly >> - Use static buffer for method name > > src/hotspot/share/utilities/globalDefinitions.hpp line 154: > >> 152: >> 153: // Format pointers and padded integral values which change size between 32- and 64-bit. >> 154: #define INTX_FORMAT "%" PRIdPTR > > Do we really need to add this back? This was removed by JDK-8346990. I was not aware of that. I will remove it and use `%zd`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2285834958 From mhaessig at openjdk.org Tue Aug 19 17:31:58 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 19 Aug 2025 17:31:58 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v12] In-Reply-To: References: Message-ID: > This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. 
The goal of this is initially to be able to find and investigate long-running compilations. > > The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. > > Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. > > Testing: > - [x] Github Actions > - [x] tier1, tier2 on all platforms > - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug > - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: Print with %zd ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26023/files - new: https://git.openjdk.org/jdk/pull/26023/files/9926971f..64738e25 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=10-11 Stats: 2 lines in 2 files changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26023/head:pull/26023 PR: https://git.openjdk.org/jdk/pull/26023 From mhaessig at openjdk.org Tue Aug 19 17:31:58 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 19 Aug 2025 17:31:58 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v11] In-Reply-To: References: Message-ID: On Tue, 19 Aug 2025 
16:38:00 GMT, Dean Long wrote: >> Manuel H?ssig has updated the pull request incrementally with two additional commits since the last revision: >> >> - Print timeout properly >> - Use static buffer for method name > > src/hotspot/os/linux/compilerThreadTimeout_linux.cpp line 47: > >> 45: char method_name_buf[SIZE]; >> 46: task->method()->name_and_sig_as_C_string(method_name_buf, SIZE); >> 47: assert(false, "compile task %d (%s) timed out after " INTX_FORMAT " ms", > > Can we use %zd here? INTX_FORMAT was removed in JDK-8346990. Yes, 64738e2 fixes this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2285892747 From kvn at openjdk.org Tue Aug 19 17:42:48 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 19 Aug 2025 17:42:48 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: <_CK_FCIS50G0zdQJ23gx9Vl1EEsA3WVNR4Up95Oei9A=.d2c45700-64ff-4cd9-a0a9-7bcc2ebdd19c@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> <_CK_FCIS50G0zdQJ23gx9Vl1EEsA3WVNR4Up95Oei9A=.d2c45700-64ff-4cd9-a0a9-7bcc2ebdd19c@github.com> Message-ID: On Tue, 19 Aug 2025 16:47:41 GMT, Emanuel Peter wrote: >> The patch already adds these comments: >> >> - `predicates.hpp` >> - https://github.com/openjdk/jdk/pull/24278/files#diff-d3883ecef2a7ed7fecf2f7b3b7d60c898b97d4199717552ecd52c3973e298a68R88-R102 >> - `VTransform::apply_speculative_aliasing_runtime_checks` uses pre-existing `add_speculative_check` >> >> Before the patch, we already have: >> - `add_speculative_check`: I think it reads quite clearly, but it does not have any good descriptions. It calls: >> - `create_new_if_for_predicate`: no mention about multiversioning... but predicates apply to non auto-vec uses, so it should probably not be placed there. 
>> - `create_new_if_for_multiversion`: does not mention much. I'll add a link to `maybe_multiversion_for_auto_vectorization_runtime_checks` where there is more documentation. >> - `PhaseIdealLoop::maybe_multiversion_for_auto_vectorization_runtime_checks` mentions that we only multiversion if there is no predicate. >> >> I'm adding some more documentation and cross-links. > At some point, we need more high-level documentation in `superword.hpp`. Currently, there is some documentation in `SuperWord::SLP_extract`, but that is not very easy to find, and I think there are also some inaccuracies there. But I'll look at that in a future RFE. I think you can do the major documentation update in a separate RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2285910935 From kvn at openjdk.org Tue Aug 19 17:42:49 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 19 Aug 2025 17:42:49 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v17] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Tue, 19 Aug 2025 16:58:55 GMT, Emanuel Peter wrote: >> TODO work that arose during review process / recent merges with master: >> >> - Vladimir asked for a benchmark where the predicate is disabled, only multiversioning. Show that peak performance is identical but compilation time a bit higher. >> - Vladimir: consider disabling `UseAutoVectorizationSpeculativeAliasingChecks` if neither predicate nor multiversioning is available.
>> - Test failure with multiversioning: `/home/empeter/Documents/oracle/jtreg/bin/jtreg -va -s -jdk:/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64-debug/jdk -javaoptions:"-Djdk.test.lib.random.seed=-9045761078153722515" -J-Djavatest.maxOutputSize=10000000 /home/empeter/Documents/oracle/jdk-fork6/open/test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java` >> - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. >> >> --------------- >> >> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. >> >> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. >> >> -------------------------- >> >> **Where to start reviewing** >> >> - `src/hotspot/share/opto/mempointer.hpp`: >> - Read the class comment for `MemPointerRawSummand`. >> - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks. >> >> - `src/hotspot/share/opto/vectorization.cpp`: >> - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. >> >> - `src/hotspot/share/opto/vtransform.hpp`: >> - Understand the difference between weak and strong edges. >> >> If you need to see some examples, then look at the tests: >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning.
>> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. >> - `test/hotspot/jtreg/compiler... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more documentation for Vladimir src/hotspot/share/opto/vtransform.cpp line 402: > 400: // Runtime Checks: > 401: // Some required properties cannot be proven statically, and require a > 402: // runtime check: Good comment! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2285916216 From dlong at openjdk.org Tue Aug 19 22:30:38 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 19 Aug 2025 22:30:38 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v12] In-Reply-To: References: Message-ID: On Tue, 19 Aug 2025 17:31:58 GMT, Manuel Hässig wrote: >> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. >> >> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. >> >> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers.
>> >> Testing: >> - [x] Github Actions >> - [x] tier1, tier2 on all platforms >> - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug >> - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > Print with %zd Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26023#pullrequestreview-3134033980 From duke at openjdk.org Tue Aug 19 23:51:46 2025 From: duke at openjdk.org (Samuel Chee) Date: Tue, 19 Aug 2025 23:51:46 GMT Subject: Integrated: 8361890: Aarch64: Removal of redundant dmb from C1 AtomicLong methods In-Reply-To: <60YMRP6cNslwEeVX2TWmnMYdO872xGaeShKMEj0dWGY=.2f4f504f-93d1-4bab-b721-e5c964f4c465@github.com> References: <60YMRP6cNslwEeVX2TWmnMYdO872xGaeShKMEj0dWGY=.2f4f504f-93d1-4bab-b721-e5c964f4c465@github.com> Message-ID: On Thu, 10 Jul 2025 15:49:40 GMT, Samuel Chee wrote: > The current C1 implementation of AtomicLong methods > which either add or exchange (such as getAndAdd) > emits one of ldaddal and swpal respectively when using > LSE, as well as an immediately following dmb. Since > ldaddal/swpal have both acquire and release semantics, > this provides similar ordering guarantees to a dmb.full, > so the dmb here is redundant and can be removed. > > This is due to both clause 7 and clause 11 of the > definition of Barrier-ordered-before in B2.3.7 of the > DDI0487 L.a Arm Architecture Reference Manual for A-profile > architecture being satisfied by the existence of a > ldaddal/swpal which ensures such memory ordering guarantees. This pull request has now been integrated.
Changeset: 95577ca9 Author: Samuel Chee Committer: Dean Long URL: https://git.openjdk.org/jdk/commit/95577ca97f82a5a83e86ed932c7c42b644d32cca Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8361890: Aarch64: Removal of redundant dmb from C1 AtomicLong methods Reviewed-by: aph, dlong ------------- PR: https://git.openjdk.org/jdk/pull/26245 From duke at openjdk.org Wed Aug 20 00:08:46 2025 From: duke at openjdk.org (duke) Date: Wed, 20 Aug 2025 00:08:46 GMT Subject: Withdrawn: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) In-Reply-To: References: Message-ID: On Thu, 22 May 2025 08:35:18 GMT, Roland Westrelin wrote: > The test case has an out of loop `Store` with an `AddP` address > expression that has other uses and is in the loop body. Schematically, > only showing the address subgraph and the bases for the `AddP`s: > > > Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 > -> CastPP#110 > > > Both `AddP`s have the same base, a `CastPP` that's also in the loop > body. > > That loop is a counted loop and only has 3 iterations so is fully > unrolled. First, one iteration is peeled: > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > The `AddP`s and `CastPP` are cloned (because in the loop body). As > part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is > called. It finds the test that guards `CastPP#283` in the peeled > iteration dominates and replaces the test that guards `CastPP#110` > (the test in the peeled iteration is the clone of the test in the > loop). That causes `CastPP#110`'s control to be updated to that of the > test in the peeled iteration and to be yanked from the loop. So now > `CastPP#283` and `CastPP#110` have the same inputs. 
> > Next unrolling happens: > > > /-> CastPP#110 > /-> AddP#400 -> AddP#401 -> CastPP#110 > Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 > \ -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > `AddP`s are cloned once more but not the `CastPP`s because they are > both in the peeled iteration now. A new `Phi` is added. > > Next igvn runs. It's going to push the `AddP`s through the `Phi`s. > > Through `Phi#477`: > > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 > \ -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > Through `Phi#360`: > > > /-> AddP#134 -> CastPP#110 > /-> Phi#509 -> AddP#401 -> CastPP#110 > Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 > -> Phi#514 -> CastPP#283 > -> CastP#110 > > > Then `Phi#514` which has 2 `CastPP`s as input with identical inputs is > transformed into anot... This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/25386 From duke at openjdk.org Wed Aug 20 01:21:04 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Wed, 20 Aug 2025 01:21:04 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' Message-ID: - `opto/phasetype.hpp` defines `static const char* phase_names[]` - `compiler/compilerEvent.cpp` defines `static GrowableArray* phase_names` This is not a problem when the two files are compiled as different translation units, but it causes a build failure if any of them is pulled in by a precompiled header: /jdk/src/hotspot/share/compiler/compilerEvent.cpp:59:36: error: redefinition of 'phase_names' with a different type: 'GrowableArray *' vs 'const char *[100]' 59 | static GrowableArray* phase_names = nullptr; | ^ /jdk/src/hotspot/share/opto/phasetype.hpp:147:20: note: previous definition is here 147 | static const char* phase_names[] = { | ^ /jdk/src/hotspot/share/compiler/compilerEvent.cpp:67:39: error: member reference base type 'const char *' is not a structure or union 67 | const u4 nof_entries = phase_names->length(); | ~~~~~~~~~~~^ ~~~~~~ /jdk/src/hotspot/share/compiler/compilerEvent.cpp:71:31: error: member reference base type 'const char *' is not a structure or union 71 | writer.write(phase_names->at(i)); | ~~~~~~~~~~~^ ~~ /jdk/src/hotspot/share/compiler/compilerEvent.cpp:77:34: error: member reference base type 'const char *' is not a structure or union 77 | for (int i = 0; i < phase_names->length(); i++) { | ~~~~~~~~~~~^ ~~~~~~ /jdk/src/hotspot/share/compiler/compilerEvent.cpp:78:35: error: member reference base type 'const char *' is not a structure or union 78 | const char* name = phase_names->at(i); | ~~~~~~~~~~~^ ~~ /jdk/src/hotspot/share/compiler/compilerEvent.cpp:91:9: error: comparison of array 'phase_names' equal to a null pointer is always false [-Werror,-Wtautological-pointer-compare] 91 | if (phase_names == nullptr) { | ^~~~~~~~~~~ ~~~~~~~ 
/jdk/src/hotspot/share/compiler/compilerEvent.cpp:92:19: error: array type 'const char *[100]' is not assignable 92 | phase_names = new (mtInternal) GrowableArray(100, mtCompiler); | ~~~~~~~~~~~ ^ /jdk/src/hotspot/share/compiler/compilerEvent.cpp:103:24: error: member reference base type 'const char *' is not a structure or union 103 | index = phase_names->length(); | ~~~~~~~~~~~^ ~~~~~~ /jdk/src/hotspot/share/compiler/compilerEvent.cpp:104:16: error: member reference base type 'const char *' is not a structure or union 104 | phase_names->append(use_strdup ? os::strdup(phase_name) : phase_name); | ~~~~~~~~~~~^ ~~~~~~ 9 errors generated. Passes `tier1`. ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/26851/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26851&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365829 Stats: 8 lines in 2 files changed: 2 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/26851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26851/head:pull/26851 PR: https://git.openjdk.org/jdk/pull/26851 From kbarrett at openjdk.org Wed Aug 20 02:55:36 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 20 Aug 2025 02:55:36 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 01:12:36 GMT, Francesco Andreuzzi wrote: > - `opto/phasetype.hpp` defines `static const char* phase_names[]` > - `compiler/compilerEvent.cpp` defines `static GrowableArray* phase_names` > > This is not a problem when the two files are compiled as different translation units, but it causes a build failure if any of them is pulled in by a precompiled header: > > > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:59:36: error: redefinition of 'phase_names' with a different type: 'GrowableArray *' vs 'const char *[100]' > 59 | static GrowableArray* phase_names = nullptr; > | ^ > /jdk/src/hotspot/share/opto/phasetype.hpp:147:20: note: previous 
definition is here > 147 | static const char* phase_names[] = { > | ^ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:67:39: error: member reference base type 'const char *' is not a structure or union > 67 | const u4 nof_entries = phase_names->length(); > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:71:31: error: member reference base type 'const char *' is not a structure or union > 71 | writer.write(phase_names->at(i)); > | ~~~~~~~~~~~^ ~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:77:34: error: member reference base type 'const char *' is not a structure or union > 77 | for (int i = 0; i < phase_names->length(); i++) { > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:78:35: error: member reference base type 'const char *' is not a structure or union > 78 | const char* name = phase_names->at(i); > | ~~~~~~~~~~~^ ~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:91:9: error: comparison of array 'phase_names' equal to a null pointer is always false [-Werror,-Wtautological-pointer-compare] > 91 | if (phase_names == nullptr) { > | ^~~~~~~~~~~ ~~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:92:19: error: array type 'const char *[100]' is not assignable > 92 | phase_names = new (mtInternal) GrowableArray(100, mtCompiler); > | ~~~~~~~~~~~ ^ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:103:24: error: member reference base type 'const char *' is not a structure or union > 103 | index = phase_names->length(); > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:104:16: error: member reference base type 'const char *' is not a structure or union > 104 | phase_names->append(use_strdup ? os::strdup(phase_name) : phase_name); > | ~~~~~~~~~~~^ ~~~~~~ > 9 errors generated. > > > Passes `tier1`. Changes requested by kbarrett (Reviewer). 
src/hotspot/share/compiler/compilerEvent.cpp line 61: > 59: namespace { > 60: GrowableArray* phase_names = nullptr; > 61: } Don't use anonymous namespaces. See Style Guide. src/hotspot/share/opto/phasetype.hpp line 141: > 139: #undef table_entry > 140: > 141: static constexpr const char* compiler_phase_descriptions[] = { A simpler and better solution would be to make `phase_descriptions` and `phase_names` static data members of `CompilerPhaseTypeHelper`, with just a declaration in the header, and the definition in a new .cpp file. (Note that with C++17 they could be declared `inline` and the .cpp file isn't needed.) I'm not sure why the change from `const` to `constexpr` is being made here. Doesn't this have a problem that each translation unit including this header gets its own private copy of these arrays? And doesn't that introduce an ODR violation for the referring code in CompilerPhaseTypeHelper? (Maybe the `constexpr` change has something to do with that? But I'm not sure how.) ------------- PR Review: https://git.openjdk.org/jdk/pull/26851#pullrequestreview-3134539820 PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2286794938 PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2286851986 From vlivanov at openjdk.org Wed Aug 20 04:32:38 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 20 Aug 2025 04:32:38 GMT Subject: RFR: 8360304: Redundant condition in LibraryCallKit::inline_vector_nary_operation In-Reply-To: References: Message-ID: <9wFFnykOrv9duLAVXEf8qytqcwOt9PGYp_uHohYbKzA=.7ee40a8e-906a-4f18-a736-8ad39f01ebec@github.com> On Sat, 2 Aug 2025 15:44:22 GMT, Francesco Andreuzzi wrote: > The check for `sopc != 0` is not needed after JDK-8353786, the function would exit at L374 otherwise. > > Passes tier1. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26606#pullrequestreview-3134744012 From duke at openjdk.org Wed Aug 20 06:50:38 2025 From: duke at openjdk.org (duke) Date: Wed, 20 Aug 2025 06:50:38 GMT Subject: RFR: 8360304: Redundant condition in LibraryCallKit::inline_vector_nary_operation In-Reply-To: References: Message-ID: On Sat, 2 Aug 2025 15:44:22 GMT, Francesco Andreuzzi wrote: > The check for `sopc != 0` is not needed after JDK-8353786, the function would exit at L374 otherwise. > > Passes tier1. @fandreuz Your change (at version 86366e18cd507c860a7acb27af7ab8c35c2f87ec) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26606#issuecomment-3204418323 From epeter at openjdk.org Wed Aug 20 06:55:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Aug 2025 06:55:44 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3] In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: <59dW-P8qExfEfXqud1rOPax4qGcubqi9RQxM4tJLQoQ=.dd1a3fb3-8ded-4e2d-bc25-49456e7ab46f@github.com> On Tue, 5 Aug 2025 11:39:43 GMT, Galder Zamarreño wrote: >> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. >> >> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes.
On darwin/aarch64 it shows: >> >> >> Benchmark (seed) (size) Mode Cnt Base Patch Units Diff >> VectorBitConversion.doubleToLongBits 0 2048 thrpt 8 1168.782 1157.717 ops/ms -1% >> VectorBitConversion.doubleToRawLongBits 0 2048 thrpt 8 3999.387 7353.936 ops/ms +83% >> VectorBitConversion.floatToIntBits 0 2048 thrpt 8 1200.338 1188.206 ops/ms -1% >> VectorBitConversion.floatToRawIntBits 0 2048 thrpt 8 4058.248 14792.474 ops/ms +264% >> VectorBitConversion.intBitsToFloat 0 2048 thrpt 8 3050.313 14984.246 ops/ms +391% >> VectorBitConversion.longBitsToDouble 0 2048 thrpt 8 3022.691 7379.360 ops/ms +144% >> >> >> The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control. >> >> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions. > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Check at the very least that auto vectorization is supported Had a quick look again and found a few more suggestions in the tests/benchmarks. But I think the VM changes are solid :) test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java line 407: > 405: > 406: @Test > 407: @IR(counts = {IRNode.STORE_VECTOR, "> 0"}, Since you are already fixing up some things here, and we want to be really sure that the vectorization generates correct results, can you please do the following: - Create IR rule counts for not just the store, but also load and the MoveX2Y. For negative rules it is ok to only check for store, but for positive rules we should try to list all vectors we expect. - Replace the `Random` usage with `Generators`. This ensures we cover NaN's and other special values more often.
test/micro/org/openjdk/bench/vm/compiler/VectorBitConversion.java line 90: > 88: > 89: @Benchmark > 90: public long[] doubleToLongBits() { I wonder if we should not just extend this benchmark, that has `convertI2F` etc: `test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java` Just a suggestion, we can also keep them separately. Maybe one day we should clean up the benchmarks, and put them all in some `autovectorization` subdirectory, and organize the files and benchmarks a little better. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26457#pullrequestreview-3135001659 PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2287146098 PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2287159346 From dzhang at openjdk.org Wed Aug 20 07:09:08 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 20 Aug 2025 07:09:08 GMT Subject: RFR: 8365841: RISC-V: Several IR verification tests fail after JDK-8350960 without Zvfh Message-ID: Hi, Can you help to review this patch? Thanks! The error in both cases is caused by the same reason: the target IR, MulReductionVI, is not matched. This is because the match_rule_supported_vector in riscv_v.ad is missing a break. If the if condition in `case MulReductionVI` evaluates to false, the loop will not exit until the `return UseZvfh`. 
Failed IR tests: compiler/loopopts/superword/ProdRed_Int.java compiler/loopopts/superword/RedTest_int.java ### Test (fastdebug) - [x] Run compiler/loopopts/superword/ProdRed_Int.java on k1 and k230 - [x] Run compiler/loopopts/superword/RedTest_int.java on k1 and k230 ------------- Commit messages: - 8365841: RISC-V: Several IR verification tests fail after JDK-8350960 without Zvfh Changes: https://git.openjdk.org/jdk/pull/26854/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26854&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365841 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26854.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26854/head:pull/26854 PR: https://git.openjdk.org/jdk/pull/26854 From fyang at openjdk.org Wed Aug 20 07:22:36 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 20 Aug 2025 07:22:36 GMT Subject: RFR: 8365841: RISC-V: Several IR verification tests fail after JDK-8350960 without Zvfh In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 07:01:59 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > The error in both cases is caused by the same reason: the target IR, MulReductionVI, is not matched. > This is because the match_rule_supported_vector in riscv_v.ad is missing a break. If the if condition in `case MulReductionVI` evaluates to false, the loop will not exit until the `return UseZvfh`. > > Failed IR tests: > compiler/loopopts/superword/ProdRed_Int.java > compiler/loopopts/superword/RedTest_int.java > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/ProdRed_Int.java on k1 and k230 > - [x] Run compiler/loopopts/superword/RedTest_int.java on k1 and k230 Good catch! Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26854#pullrequestreview-3135122539 From fjiang at openjdk.org Wed Aug 20 07:39:39 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 20 Aug 2025 07:39:39 GMT Subject: RFR: 8365841: RISC-V: Several IR verification tests fail after JDK-8350960 without Zvfh In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 07:01:59 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > The error in both cases is caused by the same reason: the target IR, MulReductionVI, is not matched. > This is because the match_rule_supported_vector in riscv_v.ad is missing a break. If the if condition in `case MulReductionVI` evaluates to false, the loop will not exit until the `return UseZvfh`. > > Failed IR tests: > compiler/loopopts/superword/ProdRed_Int.java > compiler/loopopts/superword/RedTest_int.java > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/ProdRed_Int.java on k1 and k230 > - [x] Run compiler/loopopts/superword/RedTest_int.java on k1 and k230 Thanks! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/26854#pullrequestreview-3135198257 From dzhang at openjdk.org Wed Aug 20 07:53:36 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 20 Aug 2025 07:53:36 GMT Subject: RFR: 8365841: RISC-V: Several IR verification tests fail after JDK-8350960 without Zvfh In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 07:01:59 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > The error in both cases is caused by the same reason: the target IR, MulReductionVI, is not matched. > This is because the match_rule_supported_vector in riscv_v.ad is missing a break. If the if condition in `case MulReductionVI` evaluates to false, the loop will not exit until the `return UseZvfh`. 
> > Failed IR tests: > compiler/loopopts/superword/ProdRed_Int.java > compiler/loopopts/superword/RedTest_int.java > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/ProdRed_Int.java on k1 and k230 > - [x] Run compiler/loopopts/superword/RedTest_int.java on k1 and k230 Hi @Hamlin-Li , could you help to review this patch? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26854#issuecomment-3204692231 From wenanjian at openjdk.org Wed Aug 20 07:59:04 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 20 Aug 2025 07:59:04 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v2] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: delete useless Label, change L_judge_used to L_slow_loop ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/7e16d2b0..d7ddad6e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From dzhang at openjdk.org Wed Aug 20 08:02:20 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 20 Aug 2025 08:02:20 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV Message-ID: Hi, Can you help to review this patch? Thanks! We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. 
The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount. ### Test (fastdebug) - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 ------------- Commit messages: - 8365844: RISC-V: TestBadFormat.java fails when running without RVV Changes: https://git.openjdk.org/jdk/pull/26855/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26855&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365844 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26855.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26855/head:pull/26855 PR: https://git.openjdk.org/jdk/pull/26855 From bkilambi at openjdk.org Wed Aug 20 08:04:50 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 20 Aug 2025 08:04:50 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 13:08:06 GMT, Andrew Haley wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments > > There's something that I still do not understand. > > In your tests I see this: > > > // For vectorizable loops containing FP16 operations with an FP16 constant as one of the inputs, the IR > // node `(dst (Replicate con))` is generated to broadcast the constant into all lanes of an SVE register. > // On SVE-capable hardware with vector length > 16B, if the FP16 immediate is a signed value within the > // range [-128, 127] or a signed multiple of 256 in the range [-32768, 32512] for element widths of > // 16 bits or higher then the backend should generate the "replicateHF_imm_gt128b" machnode. > > > Why is this restricted to special constants? You should be able to do this with any value by generating > `mov rtemp, #n; dup zn.h, rtemp`. 
There's no need to generate `mov rtemp, #n; fmov stemp, rtemp; dup zn.h, stemp` Hi @theRealAph if you got a chance to take a look at my response, was it clear enough or do you think this patch needs more changes? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3204737087 From duke at openjdk.org Wed Aug 20 08:21:57 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Wed, 20 Aug 2025 08:21:57 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v2] In-Reply-To: References: Message-ID: > - `opto/phasetype.hpp` defines `static const char* phase_names[]` > - `compiler/compilerEvent.cpp` defines `static GrowableArray* phase_names` > > This is not a problem when the two files are compiled as different translation units, but it causes a build failure if any of them is pulled in by a precompiled header: > > > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:59:36: error: redefinition of 'phase_names' with a different type: 'GrowableArray *' vs 'const char *[100]' > 59 | static GrowableArray* phase_names = nullptr; > | ^ > /jdk/src/hotspot/share/opto/phasetype.hpp:147:20: note: previous definition is here > 147 | static const char* phase_names[] = { > | ^ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:67:39: error: member reference base type 'const char *' is not a structure or union > 67 | const u4 nof_entries = phase_names->length(); > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:71:31: error: member reference base type 'const char *' is not a structure or union > 71 | writer.write(phase_names->at(i)); > | ~~~~~~~~~~~^ ~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:77:34: error: member reference base type 'const char *' is not a structure or union > 77 | for (int i = 0; i < phase_names->length(); i++) { > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:78:35: error: member reference base type 'const char *' is not a structure or union > 78 
| const char* name = phase_names->at(i); > | ~~~~~~~~~~~^ ~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:91:9: error: comparison of array 'phase_names' equal to a null pointer is always false [-Werror,-Wtautological-pointer-compare] > 91 | if (phase_names == nullptr) { > | ^~~~~~~~~~~ ~~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:92:19: error: array type 'const char *[100]' is not assignable > 92 | phase_names = new (mtInternal) GrowableArray(100, mtCompiler); > | ~~~~~~~~~~~ ^ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:103:24: error: member reference base type 'const char *' is not a structure or union > 103 | index = phase_names->length(); > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:104:16: error: member reference base type 'const char *' is not a structure or union > 104 | phase_names->append(use_strdup ? os::strdup(phase_name) : phase_name); > | ~~~~~~~~~~~^ ~~~~~~ > 9 errors generated. > > > Passes `tier1`. Francesco Andreuzzi has updated the pull request incrementally with seven additional commits since the last revision: - static - nn - indent - review sugg - revert - Merge branch 'resolved-default-cctor' into JDK-8365829 - use copy ctor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26851/files - new: https://git.openjdk.org/jdk/pull/26851/files/dc4fa3ac..a2183a2c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26851&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26851&range=00-01 Stats: 58 lines in 3 files changed: 40 ins; 14 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26851/head:pull/26851 PR: https://git.openjdk.org/jdk/pull/26851 From duke at openjdk.org Wed Aug 20 08:21:58 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Wed, 20 Aug 2025 08:21:58 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v2] In-Reply-To: References: Message-ID: On Wed, 20 Aug 
2025 01:55:53 GMT, Kim Barrett wrote: >> Francesco Andreuzzi has updated the pull request incrementally with seven additional commits since the last revision: >> >> - static >> - nn >> - indent >> - review sugg >> - revert >> - Merge branch 'resolved-default-cctor' into JDK-8365829 >> - use copy ctor > > src/hotspot/share/compiler/compilerEvent.cpp line 61: > >> 59: namespace { >> 60: GrowableArray* phase_names = nullptr; >> 61: } > > Don't use anonymous namespaces. See Style Guide. Reverted in e7122cd679682d4550c5c1b18949a1d072e440df ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2287371820 From aph at openjdk.org Wed Aug 20 08:51:45 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 20 Aug 2025 08:51:45 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: On Fri, 15 Aug 2025 11:54:59 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-128 << 8, 127 << 8] with a multiple of 256 for 16-bit signed immediates.
>> This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). >> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26589#pullrequestreview-3135462027 From duke at openjdk.org Wed Aug 20 08:52:43 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Wed, 20 Aug 2025 08:52:43 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v2] In-Reply-To: References: Message-ID: <1aZWjalVNwN7LqWckIzakJ6wMyZ2M8BUUEHpGcWlQAE=.80c71701-4278-45c5-928a-29b642c33faa@github.com> On Wed, 20 Aug 2025 02:52:50 GMT, Kim Barrett wrote: > A simpler and better solution would be to make phase_descriptions and phase_names static data members of CompilerPhaseTypeHelper, with just a declaration in the header, and the definition in a new .cpp file. (Note that with C++17 they could be declared inline and the .cpp file isn't needed.) Got it, applied this suggestion in 44aabc0a8c3115cf5f0559ee2ecc6ca1d42b2464 > I'm not sure why the change from const to constexpr is being made here. I saw that both arrays can be constructed at compile time and thought it might be desirable to mark `phase_names` and `phase_descriptions` `constexpr`, since `constexpr` is accepted in the style guide.
It's not relevant for this fix though, I reverted the change. > Doesn't this have a problem that each translation unit including this header gets its own private copy of these arrays? And doesn't that introduce an ODR violation for the referring code in CompilerPhaseTypeHelper? (Maybe the constexpr change has something to do with that? But I'm not sure how.) Yes, it seems there's an ODR violation here: `const` (which is implied by `constexpr`, so no difference there) gives the symbol [internal linkage](https://en.cppreference.com/w/cpp/language/storage_duration.html). It looks to me that the `static` qualifier was redundant in the first place. The ODR violation should be fixed by your solution. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2287454983 From aph at openjdk.org Wed Aug 20 08:58:50 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 20 Aug 2025 08:58:50 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v8] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: <1u3X9cabv-ybTzQXq7TAWha_8_97OBHH5-icvDiBqUk=.aba537c3-36bf-46d2-9d35-d755a29aca97@github.com> On Wed, 13 Aug 2025 09:35:08 GMT, Saranya Natarajan wrote: >> **Issue** >> Extreme values for the BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version ` result in the assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. This is observed on an x86 machine. >> >> **Analysis** >> On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above-mentioned assert from failing for large values of BciProfileWidth.
>> >> **Proposal** >> Considering the fact that larger BciProfileWidth results in slower profiling, I have proposed a range between 0 to 5000 to restrict the value for BciProfileWidth for x86 machines. This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in ret profile. >> >> **Issue in AArch64** >> Additionally running the command `java -XX:BciProfileWidth= 10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine.This is an issue of maximum offset for `ldr/str` in AArch64 which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on AArch64 machine. I had to modify 3 `ldr` and 1 `str` instruction (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line number 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for maximum of 5000 after which it crashes with`assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. Without this fix `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested to restrict the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. >> >> **Question to reviewers** >> Do you think this is a reasonable fix ? For AArch64 do you suggest fixing using `form_address` ? If yes, do I fix it under this PR or create another one ? >> >> **Request to port maintainers** >> @dafedafe suggested that we keep the upper boun... 
> > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > additions for linux-riscv64 I don't think we want different limits in BciProfileWidth for AArch64 and x86. Please use `form_address` as necessary. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3204917395 From bkilambi at openjdk.org Wed Aug 20 09:00:45 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 20 Aug 2025 09:00:45 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4] In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 09:01:51 GMT, Aleksey Shipilev wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments and modified some comments > > Can we/should we plug this problem in encoding first, without going too much into optimizing the non-broken case? As it stands now, real FP16-using code can run into matcher errors in JDK 25. I would like to fix that first. Hi @shipilev thanks for being patient :) If you feel we are good to go, could I ask for an approval from you as well? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3204931634 From mli at openjdk.org Wed Aug 20 09:15:36 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 20 Aug 2025 09:15:36 GMT Subject: RFR: 8365841: RISC-V: Several IR verification tests fail after JDK-8350960 without Zvfh In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 07:01:59 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > Both failures have the same cause: the target IR node, MulReductionVI, is not matched. > This is because `match_rule_supported_vector` in riscv_v.ad is missing a `break`. If the if-condition in `case MulReductionVI` evaluates to false, control falls through to the `return UseZvfh`.
> > Failed IR tests: > compiler/loopopts/superword/ProdRed_Int.java > compiler/loopopts/superword/RedTest_int.java > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/ProdRed_Int.java on k1 and k230 > - [x] Run compiler/loopopts/superword/RedTest_int.java on k1 and k230 Looks good, thanks for fixing this! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26854#pullrequestreview-3135568340 From amitkumar at openjdk.org Wed Aug 20 09:42:08 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 20 Aug 2025 09:42:08 GMT Subject: RFR: 8358756: [s390x] Test StartupOutput.java crash due to CodeCache size [v3] In-Reply-To: References: Message-ID: > There isn't enough initial code cache to let interpreter mode run freely. So even before we reach the compiler phase and can try to bail out when there isn't enough space left for stub compilation, the JVM crashes. The idea is to increase the initial code cache size and make it enough to at least run in interpreter mode.
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: adds comment for larger size requirement ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25741/files - new: https://git.openjdk.org/jdk/pull/25741/files/12b60494..2672c360 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25741&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25741&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25741.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25741/head:pull/25741 PR: https://git.openjdk.org/jdk/pull/25741 From jbhateja at openjdk.org Wed Aug 20 10:11:47 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 20 Aug 2025 10:11:47 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v8] In-Reply-To: References: Message-ID: > Patch optimizes the Vector.slice operation with constant index using the x86 ALIGNR instruction. > It also adds a new hybrid call generator to facilitate lazy intrinsification, or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing Vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. > > The idea here is to add infrastructure support to enable intrinsification of the fast path for selected vector APIs, and otherwise enable inlining of the fallback implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander from handling slow paths, which can easily be implemented library-side (Java).
> > Vector API jtreg tests pass at AVX level 2, remaining validation in progress. > > Performance numbers: > > > System : 13th Gen Intel(R) Core(TM) i3-1315U > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms > VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms > VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms > VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms > VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms > VectorSliceBenchmark.shortVectorSliceWithVariableIndex 1024 ... 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Update callGenerator.hpp copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24104/files - new: https://git.openjdk.org/jdk/pull/24104/files/70c22932..340f1849 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24104/head:pull/24104 PR: https://git.openjdk.org/jdk/pull/24104 From snatarajan at openjdk.org Wed Aug 20 10:31:22 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 20 Aug 2025 10:31:22 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v9] In-Reply-To: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: <-SMEjFj6yBqm_pNlCxGL9l5sBegIXLyz28nPhdcW-6U=.06a121a1-ce40-4bc0-89c4-bb1dbe90c489@github.com> > **Issue** > Extreme values for BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version `results in assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. This is observed in a x86 machine. > > **Analysis** > On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above mentioned assert from failing for large values of BciProfileWidth. 
> > **Proposal** > Considering the fact that larger BciProfileWidth results in slower profiling, I have proposed a range between 0 to 5000 to restrict the value for BciProfileWidth for x86 machines. This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in ret profile. > > **Issue in AArch64** > Additionally running the command `java -XX:BciProfileWidth= 10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine.This is an issue of maximum offset for `ldr/str` in AArch64 which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on AArch64 machine. I had to modify 3 `ldr` and 1 `str` instruction (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line number 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for maximum of 5000 after which it crashes with`assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. Without this fix `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested to restrict the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. > > **Question to reviewers** > Do you think this is a reasonable fix ? For AArch64 do you suggest fixing using `form_address` ? If yes, do I fix it under this PR or create another one ? > > **Request to port maintainers** > @dafedafe suggested that we keep the upper bound of `BciProfileWidth` to 1000 pro... 
Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: fix for PPC64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26139/files - new: https://git.openjdk.org/jdk/pull/26139/files/2f511dbf..4cd5f6c3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26139&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26139&range=07-08 Stats: 9 lines in 2 files changed: 5 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26139/head:pull/26139 PR: https://git.openjdk.org/jdk/pull/26139 From snatarajan at openjdk.org Wed Aug 20 10:31:23 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 20 Aug 2025 10:31:23 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v8] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: On Tue, 19 Aug 2025 14:01:17 GMT, Martin Doerr wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> additions for linux-riscv64 > > Can you add this patch, please? 
> > diff --git a/src/hotspot/cpu/ppc/interp_masm_ppc.hpp b/src/hotspot/cpu/ppc/interp_masm_ppc.hpp > index d3969427db3..ac3825d152f 100644 > --- a/src/hotspot/cpu/ppc/interp_masm_ppc.hpp > +++ b/src/hotspot/cpu/ppc/interp_masm_ppc.hpp > @@ -228,7 +228,7 @@ class InterpreterMacroAssembler: public MacroAssembler { > > // Interpreter profiling operations > void set_method_data_pointer_for_bcp(); > - void test_method_data_pointer(Label& zero_continue); > + void test_method_data_pointer(Label& zero_continue, bool may_be_far = false); > void verify_method_data_pointer(); > > void set_mdp_data_at(int constant, Register value); > diff --git a/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp b/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp > index 29fb54250c2..7557709653a 100644 > --- a/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp > +++ b/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp > @@ -1249,10 +1249,14 @@ void InterpreterMacroAssembler::set_method_data_pointer_for_bcp() { > } > > // Test ImethodDataPtr. If it is null, continue at the specified label. > -void InterpreterMacroAssembler::test_method_data_pointer(Label& zero_continue) { > +void InterpreterMacroAssembler::test_method_data_pointer(Label& zero_continue, bool may_be_far) { > assert(ProfileInterpreter, "must be profiling interpreter"); > cmpdi(CR0, R28_mdx, 0); > - beq(CR0, zero_continue); > + if (may_be_far) { > + bc_far_optimized(Assembler::bcondCRbiIs1, bi0(CR0, Assembler::equal), zero_continue); > + } else { > + beq(CR0, zero_continue); > + } > } > > void InterpreterMacroAssembler::verify_method_data_pointer() { > @@ -1555,7 +1559,7 @@ void InterpreterMacroAssembler::profile_ret(TosState state, Register return_bci, > uint row; > > // If no method data exists, go to profile_continue. > - test_method_data_pointer(profile_continue); > + test_method_data_pointer(profile_continue, true); > > // Update the total ret count. 
> increment_mdp_data_at(in_bytes(CounterData::count_offset()), scratch1, scratch2 ); @TheRealMDoerr : Thank you for the patch. I have added the changes suggested by you. Could you review if it looks good ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3205435266 From snatarajan at openjdk.org Wed Aug 20 10:34:40 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 20 Aug 2025 10:34:40 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v8] In-Reply-To: <1u3X9cabv-ybTzQXq7TAWha_8_97OBHH5-icvDiBqUk=.aba537c3-36bf-46d2-9d35-d755a29aca97@github.com> References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> <1u3X9cabv-ybTzQXq7TAWha_8_97OBHH5-icvDiBqUk=.aba537c3-36bf-46d2-9d35-d755a29aca97@github.com> Message-ID: On Wed, 20 Aug 2025 08:56:21 GMT, Andrew Haley wrote: > I don't think we want different limits in BciProfileWidth for AArch64 and x86. Please use `form_address` as necessary. @theRealAph :Thank you for the review. Based on @dafedafe's comment, we have decided to keep the upper bound for BciProfileWidth at 1000, which is the same for both AArch64 and x86. With this upper bound, there is no need for `form_address`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3205465390 From eliu at openjdk.org Wed Aug 20 10:43:37 2025 From: eliu at openjdk.org (Eric Liu) Date: Wed, 20 Aug 2025 10:43:37 GMT Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI expand operation In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 09:02:01 GMT, erifan wrote: > Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified: > 1. **Subword types** on SVE2-capable hardware. > 2. **All types** on NEON and SVE1 environments. 
> > As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments. > > Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example: > > To compute: dst = src.expand(mask) > Data direction: high <== low > Input: > src = p o n m l k j i h g f e d c b a > mask = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 > Expected result: > dst = 0 0 h g 0 0 f e 0 0 d c 0 0 b a > > Step 1: calculate the index input of the TBL instruction. > > // Set tmp1 as all 0 vector. > tmp1 = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > // Move the mask bits from the predicate register to a vector register. > // **1-bit** mask lane of P register to **8-bit** mask lane of V register. > tmp2 = mask = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 > > // Shift the entire register. Prefix sum algorithm. > dst = tmp2 << 8 = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 > tmp2 += dst = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1 > > dst = tmp2 << 16 = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0 > tmp2 += dst = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 > > dst = tmp2 << 32 = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0 > tmp2 += dst = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1 > > dst = tmp2 << 64 = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0 > tmp2 += dst = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1 > > // Clear inactive elements. > dst = sel(mask, tmp2, tmp1) = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1 > > // Set the inactive lane value to -1 and set the active lane to the target index. > dst -= 1 = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0 > > Step 2: shuffle the source vector elements to the target vector > > tbl(dst, src, dst) = 0 0 h g 0 0 f e 0 0 d c 0 0 b a > > > The same algorithm is used for NEON and SVE1, but with different instructions where appropriate. 
> > The following benchmarks are from panama-... LGTM. ------------- Marked as reviewed by eliu (Committer). PR Review: https://git.openjdk.org/jdk/pull/26740#pullrequestreview-3135951554 From aph at openjdk.org Wed Aug 20 11:30:37 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 20 Aug 2025 11:30:37 GMT Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI expand operation In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 09:02:01 GMT, erifan wrote: > Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified: > 1. **Subword types** on SVE2-capable hardware. > 2. **All types** on NEON and SVE1 environments. > > As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments. > > Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example: > > To compute: dst = src.expand(mask) > Data direction: high <== low > Input: > src = p o n m l k j i h g f e d c b a > mask = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 > Expected result: > dst = 0 0 h g 0 0 f e 0 0 d c 0 0 b a > > Step 1: calculate the index input of the TBL instruction. > > // Set tmp1 as all 0 vector. > tmp1 = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > // Move the mask bits from the predicate register to a vector register. > // **1-bit** mask lane of P register to **8-bit** mask lane of V register. > tmp2 = mask = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 > > // Shift the entire register. Prefix sum algorithm.
> dst = tmp2 << 8 = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 > tmp2 += dst = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1 > > dst = tmp2 << 16 = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0 > tmp2 += dst = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 > > dst = tmp2 << 32 = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0 > tmp2 += dst = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1 > > dst = tmp2 << 64 = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0 > tmp2 += dst = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1 > > // Clear inactive elements. > dst = sel(mask, tmp2, tmp1) = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1 > > // Set the inactive lane value to -1 and set the active lane to the target index. > dst -= 1 = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0 > > Step 2: shuffle the source vector elements to the target vector > > tbl(dst, src, dst) = 0 0 h g 0 0 f e 0 0 d c 0 0 b a > > > The same algorithm is used for NEON and SVE1, but with different instructions where appropriate. > > The following benchmarks are from panama-... The algorithm description here is great. Please paste all of it from "Since there are" to "but with different instructions where appropriate." into this PR, before the vector expand implementation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26740#issuecomment-3205780702 From fjiang at openjdk.org Wed Aug 20 11:55:42 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 20 Aug 2025 11:55:42 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 07:56:19 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. > The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount. 
> > ### Test (fastdebug) > - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 Marked as reviewed by fjiang (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26855#pullrequestreview-3136246332 From galder at openjdk.org Wed Aug 20 12:01:38 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Wed, 20 Aug 2025 12:01:38 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3] In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> <4mfHZiUcDJ3W0p2WCzgtUwp-FWSBX6eXOg1zLfcs_H0=.f0909507-d6fe-4d2a-a543-db6445e7f605@github.com> Message-ID: On Tue, 12 Aug 2025 08:22:11 GMT, Bhavana Kilambi wrote: >> Btw, I've noticed that `TestFloat16ScalarOperations` does not have `package` definition. Is that an oversight? It runs fine in spite of not having it > > Hi, as you mostly touched the auto-vectorization part of c2, could you please run these float16 tests as well (most of these enable auto-vectorization for Float16) - > > `compiler/vectorization/TestFloat16VectorOperations.java` > `compiler/vectorization/TestFloatConversionsVectorNaN.java` > `compiler/vectorization/TestFloatConversionsVector.java` > `compiler/vectorization/TestFloat16ToFloatConv.java` > `compiler/vectorization/TestFloat16VectorConvChain.java` > `compiler/intrinsics/float16/*` @Bhavana-Kilambi I've run these tests: "test/hotspot/jtreg/compiler/c2/irTests/ConvF2HFIdealizationTests.java" "test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java" "test/hotspot/jtreg/compiler/intrinsics/float16/*" "test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java" "test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java" "test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java" "test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java" 
"test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVectorNaN.java" On x86: Test results: passed: 11; did not meet platform requirements: 1 (TestFloatConversionsVectorNaN is for riscv) On graviton 3 aarch64: Test results: passed: 10; failed: 1; did not meet platform requirements: 1 The failure on aarch64 is already existing issue [JDK-8361582](https://bugs.openjdk.org/browse/JDK-8361582) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2287939342 From epeter at openjdk.org Wed Aug 20 12:12:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Aug 2025 12:12:39 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3] In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> <4mfHZiUcDJ3W0p2WCzgtUwp-FWSBX6eXOg1zLfcs_H0=.f0909507-d6fe-4d2a-a543-db6445e7f605@github.com> Message-ID: On Wed, 20 Aug 2025 11:58:25 GMT, Galder Zamarreño wrote: >> Hi, as you mostly touched the auto-vectorization part of c2, could you please run these float16 tests as well (most of these enable auto-vectorization for Float16) - >> >> `compiler/vectorization/TestFloat16VectorOperations.java` >> `compiler/vectorization/TestFloatConversionsVectorNaN.java` >> `compiler/vectorization/TestFloatConversionsVector.java` >> `compiler/vectorization/TestFloat16ToFloatConv.java` >> `compiler/vectorization/TestFloat16VectorConvChain.java` >> `compiler/intrinsics/float16/*` > > @Bhavana-Kilambi I've run these tests: > > > "test/hotspot/jtreg/compiler/c2/irTests/ConvF2HFIdealizationTests.java" > "test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java" > "test/hotspot/jtreg/compiler/intrinsics/float16/*" > "test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java" > "test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java" > "test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java" >
"test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java" > "test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVectorNaN.java" > > > On x86: > > Test results: passed: 11; did not meet platform requirements: 1 > (TestFloatConversionsVectorNaN is for riscv) > > > On graviton 3 aarch64: > > Test results: passed: 10; failed: 1; did not meet platform requirements: 1 > > > The failure on aarch64 is already existing issue [JDK-8361582](https://bugs.openjdk.org/browse/JDK-8361582) @galderz Excellent, that's great :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2287967911 From mdoerr at openjdk.org Wed Aug 20 12:21:45 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 20 Aug 2025 12:21:45 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v9] In-Reply-To: <-SMEjFj6yBqm_pNlCxGL9l5sBegIXLyz28nPhdcW-6U=.06a121a1-ce40-4bc0-89c4-bb1dbe90c489@github.com> References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> <-SMEjFj6yBqm_pNlCxGL9l5sBegIXLyz28nPhdcW-6U=.06a121a1-ce40-4bc0-89c4-bb1dbe90c489@github.com> Message-ID: On Wed, 20 Aug 2025 10:31:22 GMT, Saranya Natarajan wrote: >> **Issue** >> Extreme values for BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version `results in assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. This is observed in a x86 machine. >> >> **Analysis** >> On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above mentioned assert from failing for large values of BciProfileWidth. 
>> >> **Proposal** >> Considering the fact that larger BciProfileWidth results in slower profiling, I have proposed a range between 0 to 5000 to restrict the value for BciProfileWidth for x86 machines. This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in ret profile. >> >> **Issue in AArch64** >> Additionally running the command `java -XX:BciProfileWidth= 10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine.This is an issue of maximum offset for `ldr/str` in AArch64 which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on AArch64 machine. I had to modify 3 `ldr` and 1 `str` instruction (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line number 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for maximum of 5000 after which it crashes with`assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. Without this fix `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested to restrict the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. >> >> **Question to reviewers** >> Do you think this is a reasonable fix ? For AArch64 do you suggest fixing using `form_address` ? If yes, do I fix it under this PR or create another one ? >> >> **Request to port maintainers** >> @dafedafe suggested that we keep the upper boun... 
> > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > fix for PPC64 Thanks. This looks basically good and the test passes on PPC64. Only minor nits. src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp line 1252: > 1250: > 1251: // Test ImethodDataPtr. If it is null, continue at the specified label. > 1252: void InterpreterMacroAssembler::test_method_data_pointer(Label& zero_continue, bool may_be_far) { Spaces at the beginning should be removed. src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp line 1260: > 1258: beq(CR0, zero_continue); > 1259: } > 1260: No need for an empty line. test/lib-test/jdk/test/whitebox/vm_flags/IntxTest.java line 42: > 40: private static final long COMPILE_THRESHOLD = VmFlagTest.WHITE_BOX.getIntxVMFlag("CompileThreshold"); > 41: private static final Long[] TESTS = {0L, 100L, (long)(Integer.MAX_VALUE>>3)*100L}; > 42: This empty line should not be removed. ------------- PR Review: https://git.openjdk.org/jdk/pull/26139#pullrequestreview-3136336983 PR Review Comment: https://git.openjdk.org/jdk/pull/26139#discussion_r2287984667 PR Review Comment: https://git.openjdk.org/jdk/pull/26139#discussion_r2287989584 PR Review Comment: https://git.openjdk.org/jdk/pull/26139#discussion_r2287985495 From dholmes at openjdk.org Wed Aug 20 12:28:38 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 20 Aug 2025 12:28:38 GMT Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() In-Reply-To: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> Message-ID: On Fri, 15 Aug 2025 11:58:48 GMT, Artem Semenov wrote: > The defect has been detected and confirmed in the function ArchDesc::declareClasses() located in the file src/hotspot/share/adlc/output_h.cpp with static code analysis. 
This defect can potentially lead to a null pointer dereference. > > The pointer instr->_matrule is dereferenced in line 1952 without checking for nullptr, although earlier in line 1858 the same pointer is checked for nullptr, which indicates that it can be null. > > According to [this](https://github.com/openjdk/jdk/pull/26002#issuecomment-3023050372) comment, this PR contains fixes for similar cases in other places. Some alignment nits where you have added additional condition clauses. Some of these are difficult to evaluate in isolation and will need review from the specific component areas. src/hotspot/share/adlc/output_h.cpp line 1952: > 1950: }*/ > 1951: else if( instr->is_ideal_copy() && > 1952: (instr->_matrule != nullptr && instr->_matrule->_rChild != nullptr) && Suggestion: (instr->_matrule != nullptr && instr->_matrule->_rChild != nullptr) && src/hotspot/share/c1/c1_LinearScan.cpp line 4422: > 4420: > 4421: if ((cur != nullptr) && > 4422: (cur->from() < split_pos)) { Suggestion: (cur->from() < split_pos)) { src/hotspot/share/nmt/mallocSiteTable.cpp line 172: > 170: index < pos_idx && head != nullptr; > 171: index++, head = ((MallocSiteHashtableEntry*)head->next() == nullptr) ? head : > 172: (MallocSiteHashtableEntry*)head->next()) {} This doesn't look right to me. We check `head != nullptr` in the loop condition so we cannot reach the assignment if it is null. src/hotspot/share/opto/vectorIntrinsics.cpp line 1319: > 1317: log_if_needed(" ** not supported: arity=%d op=%s vlen=%d etype=%s atype=%s ismask=no", > 1318: is_scatter, is_scatter ? "scatter" : "gather", > 1319: num_elem, type2name(elem_bt), type2name(arr_type->elem()->array_element_basic_type())); There is a bug here but I'm not sure it is what you think it is. ------------- Changes requested by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26798#pullrequestreview-3136325292 PR Review Comment: https://git.openjdk.org/jdk/pull/26798#discussion_r2287976814 PR Review Comment: https://git.openjdk.org/jdk/pull/26798#discussion_r2287984002 PR Review Comment: https://git.openjdk.org/jdk/pull/26798#discussion_r2287993050 PR Review Comment: https://git.openjdk.org/jdk/pull/26798#discussion_r2287996530 From epeter at openjdk.org Wed Aug 20 12:31:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Aug 2025 12:31:11 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v18] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > TODO work that arose during review process / recent merges with master: > > - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peek performance is identical but compilation time a bit higher. Investigation ongoing. > - Test failure with multiversioning: `/home/empeter/Documents/oracle/jtreg/bin/jtreg -va -s -jdk:/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64-debug/jdk -javaoptions:"-Djdk.test.lib.random.seed=-9045761078153722515" -J-Djavatest.maxOutputSize=10000000 /home/empeter/Documents/oracle/jdk-fork6/open/test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java` > - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. > > --------------- > > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). 
We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases whether we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and...
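The fast-loop/slow-loop scheme described above boils down to a runtime disjointness test on the regions the loop touches. A standalone sketch of the idea in plain Java (illustrative names and plain array indices, not the actual VPointer byte-offset machinery):

```java
// Illustrative multiversioning shape: take the vectorizable fast loop only
// when the accessed regions [aStart, aEnd) and [bStart, bEnd) are disjoint.
final class AliasCheckSketch {
    static boolean noOverlap(long aStart, long aEnd, long bStart, long bEnd) {
        // Disjoint iff one region ends before the other begins.
        return aEnd <= bStart || bEnd <= aStart;
    }

    static void addOne(int[] dst, int dstOff, int[] src, int srcOff, int len) {
        if (dst != src || noOverlap(dstOff, dstOff + len, srcOff, srcOff + len)) {
            for (int i = 0; i < len; i++) {   // "fast_loop": provably no aliasing
                dst[dstOff + i] = src[srcOff + i] + 1;
            }
        } else {
            for (int i = 0; i < len; i++) {   // "slow_loop": may alias, stays scalar
                dst[dstOff + i] = src[srcOff + i] + 1;
            }
        }
    }
}
```

In the predicate variant the same condition guards a trap instead of a second loop body: a failed check deoptimizes and recompiles without the predicate.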
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: disable flag if not possible ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/f84ec341..8480d814 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=16-17 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Wed Aug 20 12:31:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Aug 2025 12:31:11 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v10] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <8oydcWWCxrLGTk74NqbUS5X97E6g-ZkU1El70fhClf4=.92d3f267-3e86-45b3-94b4-4020d05d5c7c@github.com> Message-ID: On Tue, 19 Aug 2025 16:02:48 GMT, Vladimir Kozlov wrote: >>> Do you think it is worth it to benchmark now, or should we just rely on @robcasloz 's occasional benchmarking and address the issues if they come up? >> >> I am fine with using Roberto's benchmarking later. Just keep an eye on it. > >> @vnkozlov I ran some more benchmarks: > > Thank you for running benchmarks. Which one do you check first for aliasing code: multiversioning or predicates? > > From these experiments I think the best sequence would be (when both predicates and multiversioning are enabled): > - use predicates for aliasing (fast compilation, small code) > - if it is deoptimized recompile with multiversioning > > Is this how it works now? @vnkozlov I now automatically disable the flag if the others are both off. I've also investigated the performance issue with the aliasing case that uses multiversioning.
And so far I could not figure out the cause of the 10% performance regression; see the detailed analysis attempt at https://github.com/openjdk/jdk/pull/24278#issuecomment-3201092650 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3206104402 From epeter at openjdk.org Wed Aug 20 12:31:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Aug 2025 12:31:11 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v11] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <5v60TdDVOqPMSGCSYH1G_y8FFARfBnsfrVWlgCkWSSY=.f76182c0-cad8-4660-8dca-fcf5b9127135@github.com> Message-ID: On Tue, 19 Aug 2025 15:07:01 GMT, Vladimir Kozlov wrote: >> If I'm doing something, then probably automatically disable `UseAutoVectorizationSpeculativeAliasingChecks`. > > Yes, you can do that in `CompilerConfig::ergo_initialize()` as we do for other compiler's flags. That is what I am asking for. I don't think you need to do that in `check_args_consistency()` because flags don't conflict. Ok, I disable the flag as you suggested. [empeter at emanuel bin]$ ./java -XX:-LoopMultiversioning -XX:-UseAutoVectorizationPredicate --version Java HotSpot(TM) 64-Bit Server VM warning: Disabling UseAutoVectorizationSpeculativeAliasingChecks, because neither of the following is enabled: LoopMultiversioning UseAutoVectorizationPredicate java 26-internal 2026-03-17 Java(TM) SE Runtime Environment (fastdebug build 26-internal-2025-08-19-0807278.empeter...)
Java HotSpot(TM) 64-Bit Server VM (fastdebug build 26-internal-2025-08-19-0807278.empeter..., mixed mode) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2288001195 From dholmes at openjdk.org Wed Aug 20 12:32:42 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 20 Aug 2025 12:32:42 GMT Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() In-Reply-To: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> Message-ID: On Fri, 15 Aug 2025 11:58:48 GMT, Artem Semenov wrote: > The defect has been detected and confirmed in the function ArchDesc::declareClasses() located in the file src/hotspot/share/adlc/output_h.cpp with static code analysis. This defect can potentially lead to a null pointer dereference. > > The pointer instr->_matrule is dereferenced in line 1952 without checking for nullptr, although earlier in line 1858 the same pointer is checked for nullptr, which indicates that it can be null. > > According to [this](https://github.com/openjdk/jdk/pull/26002#issuecomment-3023050372) comment, this PR contains fixes for similar cases in other places. I've added some additional mailing lists to ensure better coverage here. Also I think you need to update the JBS (and PR) title to reflect the broader scope of the changes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26798#issuecomment-3206112684 From epeter at openjdk.org Wed Aug 20 12:55:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Aug 2025 12:55:39 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 16:34:52 GMT, Jasmine Karthikeyan wrote: > Hi all, > This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! src/hotspot/share/opto/superword.cpp line 2576: > 2574: > 2575: // Vector nodes and casts should not truncate. > 2576: if (type->isa_vect() != nullptr || type->isa_vectmask() != nullptr || in->is_Reduction() || in->is_ConstraintCast()) { Why should we not truncate a CastII? What can go wrong? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2288072033 From epeter at openjdk.org Wed Aug 20 13:05:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Aug 2025 13:05:41 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV In-Reply-To: References: Message-ID: <9zCz8rLDzNQwtZhSzcirzzUwAN6sOmGrzPaMx6ZAlXc=.70335351-7665-4e52-9430-f81b7bd07255@github.com> On Wed, 20 Aug 2025 07:56:19 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. > The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount. 
> > ### Test (fastdebug) > - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java line 43: > 41: * @test > 42: * @requires vm.debug == true & vm.compiler2.enabled & vm.flagless > 43: * @requires (os.arch != "riscv64" | (os.arch == "riscv64" & vm.cpu.features ~= ".*rvv.*")) Generally, it would be preferable to adjust the IR rules. But I'm not sure if that is preferable here. So I think that this is the right solution. @chhagedorn This test may fail on other platforms as well that don't have all the required optimizations, such as vectors and others. Should we accept this solution? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26855#discussion_r2288098967 From epeter at openjdk.org Wed Aug 20 13:15:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Aug 2025 13:15:42 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: On Fri, 15 Aug 2025 11:54:59 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of the sve_dup instruction for every 16-bit immediate, while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates.
>> >> This patch allows the generation of the sve_dup instruction only for those 16-bit values which are within the limits specified above; for values which are out of range, the immediate half-float value is loaded from the constant pool into a register (the "loadConH" mach node), which is then replicated or broadcast to an SVE register (the "replicateHF" mach node). >> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on a 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk (tier 1-3) pass on the same machine. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 30: > 28: * @modules jdk.incubator.vector > 29: * @library /test/lib / > 30: * @run main/othervm compiler.c2.aarch64.TestFloat16Replicate I would prefer if this test was also run on other platforms, and not just aarch64. There are other platforms that have Float16 backend instructions.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2288123574 PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2288130339 From epeter at openjdk.org Wed Aug 20 13:15:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Aug 2025 13:15:42 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 13:10:05 GMT, Emanuel Peter wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments > > test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 30: > >> 28: * @modules jdk.incubator.vector >> 29: * @library /test/lib / >> 30: * @run main/othervm compiler.c2.aarch64.TestFloat16Replicate > > I would prefer if this test was also run on other platforms, and not just aarch64. There are other platforms that have Float16 backend instructions. It's fine if the IR rule is only for SVE, but other platforms could at least encounter the code shape and be tested for correctness. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2288132968 From epeter at openjdk.org Wed Aug 20 13:15:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Aug 2025 13:15:44 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4] In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 10:20:29 GMT, Bhavana Kilambi wrote: >> test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 45: >> >>> 43: public class TestFloat16Replicate { >>> 44: private static short[] input; >>> 45: private static short[] output; >> >> This might give things even more chance to vectorize? Not sure, feel free to ignore. >> >> Suggestion: >> >> private static final short[] INPUTE; >> private static final short[] OUTPUT; > > I hope it's ok to not add these changes to the code. 
The loops are getting vectorized fine and the tests do pass on aarch64 and x86. I will consider this if there's any issue with auto vectorization in the future. Thanks. I don't think that making the arrays final will make any difference for vectorization. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2288127135 From bulasevich at openjdk.org Wed Aug 20 13:16:50 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 20 Aug 2025 13:16:50 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v8] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> <1u3X9cabv-ybTzQXq7TAWha_8_97OBHH5-icvDiBqUk=.aba537c3-36bf-46d2-9d35-d755a29aca97@github.com> Message-ID: <2oDcygh0RS8lLG8LMKfWoUDGnpV2VaTRn0hDWJGDr6g=.4971a7ff-32d7-410e-a60c-587e741c431d@github.com> On Wed, 20 Aug 2025 10:31:45 GMT, Saranya Natarajan wrote: > I don't think we want different limits in BciProfileWidth for AArch64 and x86. Please use `form_address` as necessary. @theRealAph It was decided in that discussion to cap the maximum value of BciProfileWidth (a debug option) at 1000 across all platforms. Even this limit is already excessive, since methods with thousands of ret bytecodes are extremely uncommon. Unless there is another legitimate way to enlarge the MDO, I don't think AArch64 code should be adjusted just to accommodate this artificial case.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3206295005 From kbarrett at openjdk.org Wed Aug 20 13:21:40 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 20 Aug 2025 13:21:40 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v2] In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 08:21:57 GMT, Francesco Andreuzzi wrote: >> - `opto/phasetype.hpp` defines `static const char* phase_names[]` >> - `compiler/compilerEvent.cpp` defines `static GrowableArray* phase_names` >> >> This is not a problem when the two files are compiled as different translation units, but it causes a build failure if any of them is pulled in by a precompiled header: >> >> >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:59:36: error: redefinition of 'phase_names' with a different type: 'GrowableArray *' vs 'const char *[100]' >> 59 | static GrowableArray* phase_names = nullptr; >> | ^ >> /jdk/src/hotspot/share/opto/phasetype.hpp:147:20: note: previous definition is here >> 147 | static const char* phase_names[] = { >> | ^ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:67:39: error: member reference base type 'const char *' is not a structure or union >> 67 | const u4 nof_entries = phase_names->length(); >> | ~~~~~~~~~~~^ ~~~~~~ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:71:31: error: member reference base type 'const char *' is not a structure or union >> 71 | writer.write(phase_names->at(i)); >> | ~~~~~~~~~~~^ ~~ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:77:34: error: member reference base type 'const char *' is not a structure or union >> 77 | for (int i = 0; i < phase_names->length(); i++) { >> | ~~~~~~~~~~~^ ~~~~~~ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:78:35: error: member reference base type 'const char *' is not a structure or union >> 78 | const char* name = phase_names->at(i); >> | ~~~~~~~~~~~^ ~~ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:91:9: error: 
comparison of array 'phase_names' equal to a null pointer is always false [-Werror,-Wtautological-pointer-compare] >> 91 | if (phase_names == nullptr) { >> | ^~~~~~~~~~~ ~~~~~~~ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:92:19: error: array type 'const char *[100]' is not assignable >> 92 | phase_names = new (mtInternal) GrowableArray(100, mtCompiler); >> | ~~~~~~~~~~~ ^ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:103:24: error: member reference base type 'const char *' is not a structure or union >> 103 | index = phase_names->length(); >> | ~~~~~~~~~~~^ ~~~~~~ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:104:16: error: member reference base type 'const char *' is not a structure or union >> 104 | phase_names->append(use_strdup ? os::strdup(phase_name) : phase_name... > > Francesco Andreuzzi has updated the pull request incrementally with seven additional commits since the last revision: > > - static > - nn > - indent > - review sugg > - revert > - Merge branch 'resolved-default-cctor' into JDK-8365829 > - use copy ctor Changes requested by kbarrett (Reviewer). src/hotspot/share/opto/phasetype.cpp line 2: > 1: /* > 2: * Copyright (c) 2017, 2025, Oracle and/or its affiliates. All rights reserved. This is a new file, so I think "2017," shouldn't be here. src/hotspot/share/opto/phasetype.hpp line 143: > 141: class CompilerPhaseTypeHelper { > 142: public: > 143: static const char* phase_descriptions[]; Make these `static const char* const` ? Both here and for the definitions of course. I don't think there's any reason for the arrays to be mutable. src/hotspot/share/opto/phasetype.hpp line 144: > 142: public: > 143: static const char* phase_descriptions[]; > 144: static const char* phase_names[]; Is there a reason for these to be public rather than private? Also, we "always" prefix data member names with a leading underscore; see Style Guide. 
------------- PR Review: https://git.openjdk.org/jdk/pull/26851#pullrequestreview-3136564495 PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2288143012 PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2288148994 PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2288140835 From epeter at openjdk.org Wed Aug 20 13:29:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Aug 2025 13:29:38 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v2] In-Reply-To: <4Jk_phoDCxNS3QzYjbUnlmgpuZmnTI_f9j5_ORDlrOU=.d66384f5-a0cb-4f6a-8ccf-533fd6eca0e3@github.com> References: <4Jk_phoDCxNS3QzYjbUnlmgpuZmnTI_f9j5_ORDlrOU=.d66384f5-a0cb-4f6a-8ccf-533fd6eca0e3@github.com> Message-ID: On Mon, 18 Aug 2025 08:12:02 GMT, Manuel Hässig wrote: >> This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of the combination of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality. >> >> Testing: >> - [x] Github Actions >> - [x] tier1,tier2 plus some internal testing on Oracle supported platforms > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Apply Benoît's suggestion > > Co-authored-by: Benoît Maillard test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 75: > 73: Set.of("-XX:+UseNewCode", "-XX:-UseNewCode")); > 74: t3.start(); > 75: Asserts.fail("Should have thrown exception"); Can I also do a Power-Set? We could do that with an empty string or null. Can an entry also be multiple flags? Sometimes you need them in pairs.
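The cross product discussed here can be sketched independently of the IR framework. The helper below builds every flag combination from the given sets; the names are illustrative, not the framework's actual implementation (each set entry is one flag string, so a multi-flag pair would simply be one entry):

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the flag cross product behind a method like
// addCrossProductScenarios: each inner list is one scenario's flags.
final class FlagCrossProduct {
    @SafeVarargs
    static List<List<String>> crossProduct(List<String>... flagSets) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>());           // start from one empty combination
        for (List<String> set : flagSets) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> combo : result) {
                for (String flag : set) {
                    List<String> next = new ArrayList<>(combo);
                    next.add(flag);
                    extended.add(next);
                }
            }
            result = extended;                   // multiply by this set's size
        }
        return result;
    }
}
```

Two sets of two flags yield four combinations, which is why a run over them produces scenarios #0 through #3.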
Would be nice to have tests for that ;) test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 77: > 75: Asserts.fail("Should have thrown exception"); > 76: } catch (TestRunException e) { > 77: if (!e.getMessage().contains("The following scenarios have failed: #0, #1, #2, #3")) { What if the string continued with `, #4`? Can we ensure this does not happen? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2288166282 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2288172600 From snatarajan at openjdk.org Wed Aug 20 13:35:35 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 20 Aug 2025 13:35:35 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v10] In-Reply-To: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: > **Issue** > Extreme values for the BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version` result in the assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. This is observed on an x86 machine. > > **Analysis** > On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above-mentioned assert from failing for large values of BciProfileWidth. > > **Proposal** > Considering the fact that a larger BciProfileWidth results in slower profiling, I have proposed a range from 0 to 5000 to restrict the value of BciProfileWidth for x86 machines.
This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in the ret profile. > > **Issue in AArch64** > Additionally, running the command `java -XX:BciProfileWidth=10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine. This is an issue of the maximum offset for `ldr/str` on AArch64, which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on an AArch64 machine, I had to modify 3 `ldr` and 1 `str` instruction (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line numbers 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for a maximum of 5000, after which it crashes with `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. Without this fix, `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested restricting the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. > > **Question to reviewers** > Do you think this is a reasonable fix? For AArch64, do you suggest fixing it using `form_address`? If yes, do I fix it under this PR or create another one? > > **Request to port maintainers** > @dafedafe suggested that we keep the upper bound of `BciProfileWidth` at 1000 pro...
Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: - addressing review : nit comments - review : change to test case to test values outside the range ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26139/files - new: https://git.openjdk.org/jdk/pull/26139/files/4cd5f6c3..4f6ad175 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26139&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26139&range=08-09 Stats: 28 lines in 3 files changed: 19 ins; 3 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/26139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26139/head:pull/26139 PR: https://git.openjdk.org/jdk/pull/26139 From snatarajan at openjdk.org Wed Aug 20 13:35:35 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 20 Aug 2025 13:35:35 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v6] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> <1QbX5WHkEdjP-unAFJ1vYaoIc9bV8zz8dA-vKZCkYn8=.8e3704ae-9490-4471-9e5c-dae44004d46f@github.com> Message-ID: On Wed, 13 Aug 2025 09:45:21 GMT, Saranya Natarajan wrote: >> Shouldn't we check that the vm doesn't crash with `BciProfileWidth=-1` and `BciProfileWidth=100000` (or another very high value)? > > @dafedafe : I am working on this and will upload the changes soon. 
I have uploaded a test that checks two values of `BciProfileWidth` that are outside the range ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26139#discussion_r2288185425 From bkilambi at openjdk.org Wed Aug 20 13:42:40 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 20 Aug 2025 13:42:40 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 13:11:56 GMT, Emanuel Peter wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments > > test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 56: > >> 54: >> 55: public static void main(String args[]) { >> 56: TestFramework.runWithFlags("--add-modules=jdk.incubator.vector", "-XX:-TieredCompilation"); > > What about a run that runs with TieredCompilation? Would be nice to test other modes as well.
As my main motivation was to test the generated backend mach nodes from c2 compilation, I run it specifically with `-XX:-TieredCompilation.` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2288208385 From bkilambi at openjdk.org Wed Aug 20 13:47:39 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 20 Aug 2025 13:47:39 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: <3m4I-aY-PTZsQa_SjoRayIbE2FC15xafQi3C8D9XqZs=.60c17714-5ec4-4a3e-96d6-687d81f3b275@github.com> On Wed, 20 Aug 2025 13:12:45 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 30: >> >>> 28: * @modules jdk.incubator.vector >>> 29: * @library /test/lib / >>> 30: * @run main/othervm compiler.c2.aarch64.TestFloat16Replicate >> >> I would prefer if this test was also run on other platforms, and not just aarch64. There are other platforms that have Float16 backend instructions. > > It's fine if the IR rule is only for SVE, but other platforms could at least encounter the code shape and be tested for correctness. This type of pattern/code shape where one of the inputs is a constant is already being tested in https://github.com/openjdk/jdk/blob/e912977a6687917ed45520c4d8558ebe630e3f52/test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java#L335 I have created this one specifically for aarch64 to ensure both the backend mach nodes are correctly being generated. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2288224592 From epeter at openjdk.org Wed Aug 20 13:51:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Aug 2025 13:51:42 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: <04_IpSYiBu9iLViEV2V5opYFqN7OzNewgUEOLSs_Cwc=.a8c693cd-900d-4602-9b88-76dd55f9a844@github.com> On Wed, 20 Aug 2025 13:39:55 GMT, Bhavana Kilambi wrote: >> test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 56: >> >>> 54: >>> 55: public static void main(String args[]) { >>> 56: TestFramework.runWithFlags("--add-modules=jdk.incubator.vector", "-XX:-TieredCompilation"); >> >> What about a run that runs with TieredCompilation? Would be nice to test other modes as well. > > Hi, thanks for your review. I tried but it doesn't trigger c2 compilation (maybe I need to increase `Warmup`?). As my main motivation was to test the generated backend mach nodes from c2 compilation, I run it specifically with `-XX:-TieredCompilation.` Strange, because the IR/Test framework always triggers C2 compilation... How exactly did it fail to compile with C2? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2288237330 From stefank at openjdk.org Wed Aug 20 14:08:44 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 20 Aug 2025 14:08:44 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v2] In-Reply-To: References: Message-ID: <9S82TxAcAKMLy1UjvRIXlSBGLVY5FV3jelaKHUYlkXQ=.40c47c60-d784-469a-b747-911333cd9f72@github.com> On Wed, 20 Aug 2025 13:16:07 GMT, Kim Barrett wrote: >> Francesco Andreuzzi has updated the pull request incrementally with seven additional commits since the last revision: >> >> - static >> - nn >> - indent >> - review sugg >> - revert >> - Merge branch 'resolved-default-cctor' into JDK-8365829 >> - use copy ctor > > src/hotspot/share/opto/phasetype.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2017, 2025, Oracle and/or its affiliates. All rights reserved. > > This is a new file, so I think "2017," shouldn't be here. The guidelines that we follow is that if you copy code from another file to a new file you should use the old copyright years. So, using `2017, 2025,` seems correct to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2288293870 From stefank at openjdk.org Wed Aug 20 14:08:44 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 20 Aug 2025 14:08:44 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v2] In-Reply-To: <9S82TxAcAKMLy1UjvRIXlSBGLVY5FV3jelaKHUYlkXQ=.40c47c60-d784-469a-b747-911333cd9f72@github.com> References: <9S82TxAcAKMLy1UjvRIXlSBGLVY5FV3jelaKHUYlkXQ=.40c47c60-d784-469a-b747-911333cd9f72@github.com> Message-ID: <5N-VzF9eV_7pHB3NkfPkquPtPcz6wFBiHxEYvnPF0Zk=.af595b98-3577-4117-8b1d-7938cdb1bf61@github.com> On Wed, 20 Aug 2025 14:03:52 GMT, Stefan Karlsson wrote: >> src/hotspot/share/opto/phasetype.cpp line 2: >> >>> 1: /* >>> 2: * Copyright (c) 2017, 2025, Oracle and/or its affiliates. All rights reserved. 
>> >> This is a new file, so I think "2017," shouldn't be here. > > The guidelines that we follow is that if you copy code from another file to a new file you should use the old copyright years. So, using `2017, 2025,` seems correct to me. https://openjdk.org/guide/#copyright-headers > If you move code from an existing file to a new file, bring the entire copyright + license header over to the new file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2288300728 From bkilambi at openjdk.org Wed Aug 20 14:10:42 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 20 Aug 2025 14:10:42 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: <04_IpSYiBu9iLViEV2V5opYFqN7OzNewgUEOLSs_Cwc=.a8c693cd-900d-4602-9b88-76dd55f9a844@github.com> References: <04_IpSYiBu9iLViEV2V5opYFqN7OzNewgUEOLSs_Cwc=.a8c693cd-900d-4602-9b88-76dd55f9a844@github.com> Message-ID: On Wed, 20 Aug 2025 13:49:13 GMT, Emanuel Peter wrote: >> Hi, thanks for your review. I tried but it doesn't trigger c2 compilation (maybe I need to increase `Warmup`?). As my main motivation was to test the generated backend mach nodes from c2 compilation, I run it specifically with `-XX:-TieredCompilation.` > > Strange, because the IR/Test framework always triggers C2 compilation... How exactly did it fail to compile with C2? It fails to match the IR nodes. I think it happened when I used a smaller `Warmup`. With the `Warmup` I am using, it seems to be working fine. I will add that case as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2288309378 From bkilambi at openjdk.org Wed Aug 20 14:13:46 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 20 Aug 2025 14:13:46 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: <3m4I-aY-PTZsQa_SjoRayIbE2FC15xafQi3C8D9XqZs=.60c17714-5ec4-4a3e-96d6-687d81f3b275@github.com> References: <3m4I-aY-PTZsQa_SjoRayIbE2FC15xafQi3C8D9XqZs=.60c17714-5ec4-4a3e-96d6-687d81f3b275@github.com> Message-ID: On Wed, 20 Aug 2025 13:44:52 GMT, Bhavana Kilambi wrote: >> It's fine if the IR rule is only for SVE, but other platforms could at least encounter the code shape and be tested for correctness. > > This type of pattern/code shape where one of the inputs is a constant is already being tested in https://github.com/openjdk/jdk/blob/e912977a6687917ed45520c4d8558ebe630e3f52/test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java#L335 > > I have created this one specifically for aarch64 to ensure both the backend mach nodes are correctly being generated. I can test this on x86 but do you think this test is required to be placed out of `aarch64` folder and make it available for all architectures when the same pattern is already being tested in the above testcase for all architectures? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2288318608 From adinn at openjdk.org Wed Aug 20 14:22:37 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 20 Aug 2025 14:22:37 GMT Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() In-Reply-To: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> Message-ID: On Fri, 15 Aug 2025 11:58:48 GMT, Artem Semenov wrote: > The defect has been detected and confirmed in the function ArchDesc::declareClasses() located in the file src/hotspot/share/adlc/output_h.cpp with static code analysis. This defect can potentially lead to a null pointer dereference. > > The pointer instr->_matrule is dereferenced in line 1952 without checking for nullptr, although earlier in line 1858 the same pointer is checked for nullptr, which indicates that it can be null. > > According to [this](https://github.com/openjdk/jdk/pull/26002#issuecomment-3023050372) comment, this PR contains fixes for similar cases in other places. I'm not clear that the original issue is necessarily a bug that needs fixing with a skip to the next else if case. The justification for adding `instr->_matrule != nullptr && instr->_matrule->_rChild != nullptr` to the if branch test is that earlier code allows for the possibility that `instr->_matrule` might be null. However, that check is performed in an unconditional context for any value of `instr` whereas this specific else branch limits the circumstance to the case where `instr->is_ideal_copy()` is found to be true. So, the prior test offers no guarantee that in this restricted case a null pointer should or should not be possible. 
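The two options being weighed in this review (defensively skipping the branch versus asserting the invariant) can be reduced to a small sketch. The types and names below are illustrative stand-ins, not adlc's real ones:

```cpp
#include <cassert>
#include <cstddef>

// Hedged sketch of the two approaches: either treat a possibly-null
// `_matrule` as a reason to fall through to the next case, or treat
// non-null as a hard invariant of ideal copies and assert it, so the code
// that produced the bad state is found instead of silently skipped.
struct MatchRuleSketch { const char* rchild; };
struct InstructSketch  { const MatchRuleSketch* matrule; bool is_ideal_copy; };

// Option (a): defensive skip, as in the proposed patch.
inline bool take_ideal_copy_branch_defensive(const InstructSketch& in) {
  return in.is_ideal_copy && in.matrule != nullptr && in.matrule->rchild != nullptr;
}

// Option (b): assert the invariant, as the review suggests considering.
inline bool take_ideal_copy_branch_assert(const InstructSketch& in) {
  if (!in.is_ideal_copy) return false;
  assert(in.matrule != nullptr && "ideal copy expected to carry a match rule");
  return in.matrule->rchild != nullptr;
}
```

Option (a) keeps release builds safe but can hide the producer of the inconsistent state; option (b) surfaces it in debug builds, which is the trade-off the comment above is raising.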
The original design may assume that a successful test for `instr->is_ideal_copy()` ought to guarantee that both `instr->_matrule` and `instr->_matrule->_rChild` are non-null. That cannot be determined by the evidence offered. It can only be determined by looking at how instr is constructed. So, rather than just skip to the next case we might need to handle this with an assert and fix whatever code is producing an ideal copy with null fields. Given the level of analysis offered for this case I am suspicious as to whether the other cases tacked onto this issue ought to be accepted at face value without some justification as to why a null check and skip to the next case is correct. I'm also wondering how and why all these cases and associated amendments were arrived at? Is this perhaps based on output from a code analysis tool (perhaps even a genAI tool). If so then I think 1) we ought to be told and 2) we ought to treat its recommendations with a very healthy dose of skepticism. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26798#issuecomment-3206613521 From duke at openjdk.org Wed Aug 20 14:27:25 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Wed, 20 Aug 2025 14:27:25 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v3] In-Reply-To: References: Message-ID: <4qbeYQzhcpER7658NxoX92cJhGRn-Z-bK74MV5X6zt0=.0777c117-5566-4732-9b02-1c026b963162@github.com> > - `opto/phasetype.hpp` defines `static const char* phase_names[]` > - `compiler/compilerEvent.cpp` defines `static GrowableArray* phase_names` > > This is not a problem when the two files are compiled as different translation units, but it causes a build failure if any of them is pulled in by a precompiled header: > > > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:59:36: error: redefinition of 'phase_names' with a different type: 'GrowableArray *' vs 'const char *[100]' > 59 | static GrowableArray* phase_names = nullptr; > | ^ > 
/jdk/src/hotspot/share/opto/phasetype.hpp:147:20: note: previous definition is here > 147 | static const char* phase_names[] = { > | ^ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:67:39: error: member reference base type 'const char *' is not a structure or union > 67 | const u4 nof_entries = phase_names->length(); > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:71:31: error: member reference base type 'const char *' is not a structure or union > 71 | writer.write(phase_names->at(i)); > | ~~~~~~~~~~~^ ~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:77:34: error: member reference base type 'const char *' is not a structure or union > 77 | for (int i = 0; i < phase_names->length(); i++) { > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:78:35: error: member reference base type 'const char *' is not a structure or union > 78 | const char* name = phase_names->at(i); > | ~~~~~~~~~~~^ ~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:91:9: error: comparison of array 'phase_names' equal to a null pointer is always false [-Werror,-Wtautological-pointer-compare] > 91 | if (phase_names == nullptr) { > | ^~~~~~~~~~~ ~~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:92:19: error: array type 'const char *[100]' is not assignable > 92 | phase_names = new (mtInternal) GrowableArray(100, mtCompiler); > | ~~~~~~~~~~~ ^ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:103:24: error: member reference base type 'const char *' is not a structure or union > 103 | index = phase_names->length(); > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:104:16: error: member reference base type 'const char *' is not a structure or union > 104 | phase_names->append(use_strdup ? os::strdup(phase_name) : phase_name); > | ~~~~~~~~~~~^ ~~~~~~ > 9 errors generated. > > > Passes `tier1`. 
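The redefinition failure above can be reduced to a small sketch. The names here are illustrative stand-ins, not the actual HotSpot sources: a namespace-scope `static` array defined in a header is replicated into every translation unit that includes it, and once a precompiled header merges those units, the unqualified name collides with unrelated file-local statics. Declaring the array as a static class member scopes the name and leaves exactly one out-of-line definition:

```cpp
#include <cassert>
#include <cstring>

// --- header part (phasetype.hpp-style): declaration only, no definition ---
class PhaseNameHelperSketch {
 public:
  static const char* const _phase_names[];  // array of unknown bound
  static const char* phase_name(int i);
};

// --- exactly one .cpp provides the definition and completes the bound ---
const char* const PhaseNameHelperSketch::_phase_names[] = {
  "PHASE_BEFORE_STRINGOPTS",
  "PHASE_AFTER_STRINGOPTS",
};

const char* PhaseNameHelperSketch::phase_name(int i) {
  return _phase_names[i];
}

// A different translation unit can now reuse the bare name freely, which is
// what compilerEvent.cpp does with its GrowableArray pointer.
static int phase_names = 42;  // no redefinition: different scope

int sketch_phase_names_value() { return phase_names; }
```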
Francesco Andreuzzi has updated the pull request incrementally with two additional commits since the last revision: - 2012 - const ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26851/files - new: https://git.openjdk.org/jdk/pull/26851/files/a2183a2c..4db623ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26851&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26851&range=01-02 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/26851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26851/head:pull/26851 PR: https://git.openjdk.org/jdk/pull/26851 From duke at openjdk.org Wed Aug 20 14:27:25 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Wed, 20 Aug 2025 14:27:25 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v2] In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 13:18:17 GMT, Kim Barrett wrote: >> Francesco Andreuzzi has updated the pull request incrementally with seven additional commits since the last revision: >> >> - static >> - nn >> - indent >> - review sugg >> - revert >> - Merge branch 'resolved-default-cctor' into JDK-8365829 >> - use copy ctor > > src/hotspot/share/opto/phasetype.hpp line 143: > >> 141: class CompilerPhaseTypeHelper { >> 142: public: >> 143: static const char* phase_descriptions[]; > > Make these `static const char* const` ? Both here and for the definitions of course. I don't > think there's any reason for the arrays to be mutable. 
Sure: a66abc051fe6f640930cbcca9ab8240b6ca97aeb ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2288355536 From duke at openjdk.org Wed Aug 20 14:30:52 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Wed, 20 Aug 2025 14:30:52 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v2] In-Reply-To: <5N-VzF9eV_7pHB3NkfPkquPtPcz6wFBiHxEYvnPF0Zk=.af595b98-3577-4117-8b1d-7938cdb1bf61@github.com> References: <9S82TxAcAKMLy1UjvRIXlSBGLVY5FV3jelaKHUYlkXQ=.40c47c60-d784-469a-b747-911333cd9f72@github.com> <5N-VzF9eV_7pHB3NkfPkquPtPcz6wFBiHxEYvnPF0Zk=.af595b98-3577-4117-8b1d-7938cdb1bf61@github.com> Message-ID: <86V0TiHbVK6OmC59Ac4up-PZx-pbR3wW627MJUMu_O4=.c743594b-a4c8-472f-a86b-90d6bb1cc092@github.com> On Wed, 20 Aug 2025 14:05:34 GMT, Stefan Karlsson wrote: >> The guidelines that we follow is that if you copy code from another file to a new file you should use the old copyright years. So, using `2017, 2025,` seems correct to me. > > https://openjdk.org/guide/#copyright-headers > >> If you move code from an existing file to a new file, bring the entire copyright + license header over to the new file. 
Thanks, the start year for `phasetype.hpp` is 2012, so I fixed it in `.cpp`: 4db623eae0e6633889d1c9d571a06489258e39f5 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2288366668 From duke at openjdk.org Wed Aug 20 14:30:55 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Wed, 20 Aug 2025 14:30:55 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v2] In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 13:15:23 GMT, Kim Barrett wrote: >> Francesco Andreuzzi has updated the pull request incrementally with seven additional commits since the last revision: >> >> - static >> - nn >> - indent >> - review sugg >> - revert >> - Merge branch 'resolved-default-cctor' into JDK-8365829 >> - use copy ctor > > src/hotspot/share/opto/phasetype.hpp line 144: > >> 142: public: >> 143: static const char* phase_descriptions[]; >> 144: static const char* phase_names[]; > > Is there a reason for these to be public rather than private? > Also, we "always" prefix data member names with a leading underscore; see Style Guide. `phase_name` is accessed by `static CompilerPhaseType find_phase`, should we make it a `friend` of the `CompilerPhaseTypeHelper`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2288363887 From duke at openjdk.org Wed Aug 20 14:48:55 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Wed, 20 Aug 2025 14:48:55 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v4] In-Reply-To: References: Message-ID: > - `opto/phasetype.hpp` defines `static const char* phase_names[]` > - `compiler/compilerEvent.cpp` defines `static GrowableArray* phase_names` > > This is not a problem when the two files are compiled as different translation units, but it causes a build failure if any of them is pulled in by a precompiled header: > > > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:59:36: error: redefinition of 'phase_names' with a different type: 'GrowableArray *' vs 'const char *[100]' > 59 | static GrowableArray* phase_names = nullptr; > | ^ > /jdk/src/hotspot/share/opto/phasetype.hpp:147:20: note: previous definition is here > 147 | static const char* phase_names[] = { > | ^ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:67:39: error: member reference base type 'const char *' is not a structure or union > 67 | const u4 nof_entries = phase_names->length(); > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:71:31: error: member reference base type 'const char *' is not a structure or union > 71 | writer.write(phase_names->at(i)); > | ~~~~~~~~~~~^ ~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:77:34: error: member reference base type 'const char *' is not a structure or union > 77 | for (int i = 0; i < phase_names->length(); i++) { > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:78:35: error: member reference base type 'const char *' is not a structure or union > 78 | const char* name = phase_names->at(i); > | ~~~~~~~~~~~^ ~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:91:9: error: comparison of array 'phase_names' equal to a null pointer is always false 
[-Werror,-Wtautological-pointer-compare] > 91 | if (phase_names == nullptr) { > | ^~~~~~~~~~~ ~~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:92:19: error: array type 'const char *[100]' is not assignable > 92 | phase_names = new (mtInternal) GrowableArray(100, mtCompiler); > | ~~~~~~~~~~~ ^ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:103:24: error: member reference base type 'const char *' is not a structure or union > 103 | index = phase_names->length(); > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:104:16: error: member reference base type 'const char *' is not a structure or union > 104 | phase_names->append(use_strdup ? os::strdup(phase_name) : phase_name); > | ~~~~~~~~~~~^ ~~~~~~ > 9 errors generated. > > > Passes `tier1`. Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: underscore ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26851/files - new: https://git.openjdk.org/jdk/pull/26851/files/4db623ea..90e9c537 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26851&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26851&range=02-03 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/26851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26851/head:pull/26851 PR: https://git.openjdk.org/jdk/pull/26851 From duke at openjdk.org Wed Aug 20 14:48:56 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Wed, 20 Aug 2025 14:48:56 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v2] In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 14:26:53 GMT, Francesco Andreuzzi wrote: >> src/hotspot/share/opto/phasetype.hpp line 144: >> >>> 142: public: >>> 143: static const char* phase_descriptions[]; >>> 144: static const char* phase_names[]; >> >> Is there a reason for these to be public rather than private? 
>> Also, we "always" prefix data member names with a leading underscore; see Style Guide. > > `phase_name` is accessed by `static CompilerPhaseType find_phase`, should we make it a `friend` of the `CompilerPhaseTypeHelper`? Underscore: 90e9c537709ad4c384f7efd2ed18c63a4c21b51b ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2288423428 From kvn at openjdk.org Wed Aug 20 15:08:01 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 20 Aug 2025 15:08:01 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v10] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <8oydcWWCxrLGTk74NqbUS5X97E6g-ZkU1El70fhClf4=.92d3f267-3e86-45b3-94b4-4020d05d5c7c@github.com> Message-ID: On Wed, 20 Aug 2025 12:26:52 GMT, Emanuel Peter wrote: > I've also investigated the performance issue with the aliasing case that uses multiversioning. And I so far could not figure out the 10% performance regression, see detailed analysis attempt https://github.com/openjdk/jdk/pull/24278#issuecomment-3201092650 Is it possible it always go into slow path? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3206801026 From kvn at openjdk.org Wed Aug 20 15:08:01 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 20 Aug 2025 15:08:01 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v10] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> <8oydcWWCxrLGTk74NqbUS5X97E6g-ZkU1El70fhClf4=.92d3f267-3e86-45b3-94b4-4020d05d5c7c@github.com> Message-ID: On Tue, 19 Aug 2025 16:02:48 GMT, Vladimir Kozlov wrote: >>> Do you think it is worth it to benchmark now, or should be just rely on @robcasloz 's occasional benchmarking and address the issues if they come up? >> >> I am fine with using Roberto's benchmarking later. Just keep eye on it. 
> >> @vnkozlov I ran some more benchmarks: > > Thank you for running benchmarks. Which one you check first for aliasing code: multiversioning or predicates? > > From this experiments I think the best sequence would be (when both predicates and multiversioning are enabled): > - use predicates for aliasing (fast compilation, small code) > - if it is deoptimized recompile with multiversioning > > Is this how it works now? > @vnkozlov I now automatically disable the flag if the others are both off. Good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3206791820 From mdoerr at openjdk.org Wed Aug 20 15:32:47 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 20 Aug 2025 15:32:47 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v10] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> <-SMEjFj6yBqm_pNlCxGL9l5sBegIXLyz28nPhdcW-6U=.06a121a1-ce40-4bc0-89c4-bb1dbe90c489@github.com> Message-ID: <60sgN5hitLgZGw3LHfU7GoQ9NWQA31D1Q_RUjjMiOqI=.1110e1b5-8560-4cd8-a695-5b3aa43af4c9@github.com> On Wed, 20 Aug 2025 12:17:26 GMT, Martin Doerr wrote: >> Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: >> >> - addressing review : nit comments >> - review : change to test case to test values outside the range > > test/lib-test/jdk/test/whitebox/vm_flags/IntxTest.java line 42: > >> 40: private static final long COMPILE_THRESHOLD = VmFlagTest.WHITE_BOX.getIntxVMFlag("CompileThreshold"); >> 41: private static final Long[] TESTS = {0L, 100L, (long)(Integer.MAX_VALUE>>3)*100L}; >> 42: > > This empty line should not be removed. Thanks! But, the indentation of the next line looks odd, now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26139#discussion_r2288545780 From galder at openjdk.org Wed Aug 20 15:39:46 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Wed, 20 Aug 2025 15:39:46 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3] In-Reply-To: <59dW-P8qExfEfXqud1rOPax4qGcubqi9RQxM4tJLQoQ=.dd1a3fb3-8ded-4e2d-bc25-49456e7ab46f@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> <59dW-P8qExfEfXqud1rOPax4qGcubqi9RQxM4tJLQoQ=.dd1a3fb3-8ded-4e2d-bc25-49456e7ab46f@github.com> Message-ID: <_zWcUqG-geMU1r9bYHnTjT1KtvchzWhA4UHMtEvWljU=.590aa550-3306-42aa-91e2-b7360d2e2076@github.com> On Wed, 20 Aug 2025 06:49:41 GMT, Emanuel Peter wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Check at the very least that auto vectorization is supported > > test/micro/org/openjdk/bench/vm/compiler/VectorBitConversion.java line 90: > >> 88: >> 89: @Benchmark >> 90: public long[] doubleToLongBits() { > > I wonder if we should not just extend this benchmark, that has `convertI2F` etc: > `test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java` > > Just a suggestion, we can also keep them separately. Maybe one day we should clean up the benchmarks, and put them all in some `autovectorization` subdirectory, and organize the files and benchmarks a little better. I'll look into `TypeVectorOperations` and see what can be done there. I had missed it when I wrote the benchmark.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2288560996 From mhaessig at openjdk.org Wed Aug 20 16:34:39 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 20 Aug 2025 16:34:39 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v2] In-Reply-To: References: <4Jk_phoDCxNS3QzYjbUnlmgpuZmnTI_f9j5_ORDlrOU=.d66384f5-a0cb-4f6a-8ccf-533fd6eca0e3@github.com> Message-ID: <4b41P1TSGZZxpUbBY19DPltef8fpN7osxt6BNA4X3mk=.e115de46-d287-4fc5-9a00-aec2645dd256@github.com> On Wed, 20 Aug 2025 13:24:59 GMT, Emanuel Peter wrote: > Can I also do a Power-Set? No, this is a PR for a cross product ;) Also, I have not seen the need for a power set of arguments in any IR-test so far. Have you? > Can an entry also be multiple flags? Sometimes you need them in pairs. My testing suggests that a String can contain multiple arguments. But I'll have to look into it further. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2288705971 From mhaessig at openjdk.org Wed Aug 20 17:01:13 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 20 Aug 2025 17:01:13 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v2] In-Reply-To: <4Jk_phoDCxNS3QzYjbUnlmgpuZmnTI_f9j5_ORDlrOU=.d66384f5-a0cb-4f6a-8ccf-533fd6eca0e3@github.com> References: <4Jk_phoDCxNS3QzYjbUnlmgpuZmnTI_f9j5_ORDlrOU=.d66384f5-a0cb-4f6a-8ccf-533fd6eca0e3@github.com> Message-ID: On Mon, 18 Aug 2025 08:12:02 GMT, Manuel Hässig wrote: >> This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of the combination of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality.
>> >> Testing: >> - [x] Github Actions >> - [x] tier1,tier2 plus some internal testing on Oracle supported platforms > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > Apply Benoît's suggestion > Co-authored-by: Benoît Maillard Thank you for looking at this @eme64. I made the testing a bit more robust and added a case for a pair of arguments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26762#issuecomment-3207275001 From mhaessig at openjdk.org Wed Aug 20 17:01:13 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 20 Aug 2025 17:01:13 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v3] In-Reply-To: References: Message-ID: > This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of the combination of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality.
> > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus some internal testing on Oracle supported platforms Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8365262' of github.com:mhaessig/jdk into JDK-8365262 - Better testing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26762/files - new: https://git.openjdk.org/jdk/pull/26762/files/0bd8c6a7..9cf0f2a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=01-02 Stats: 21 lines in 1 file changed: 19 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26762.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26762/head:pull/26762 PR: https://git.openjdk.org/jdk/pull/26762 From mhaessig at openjdk.org Wed Aug 20 17:03:56 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 20 Aug 2025 17:03:56 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v4] In-Reply-To: References: Message-ID: <7QzCmv17rsIwVX0a4C_wTq4jhx6cob4juy454yuOof0=.fa045ee0-eaa4-43e1-853b-93880a0d44b3@github.com> > This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of the combination of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality.
> > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus some internal testing on Oracle supported platforms Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Make the test work ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26762/files - new: https://git.openjdk.org/jdk/pull/26762/files/9cf0f2a4..f59e9d9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=02-03 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26762.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26762/head:pull/26762 PR: https://git.openjdk.org/jdk/pull/26762 From duke at openjdk.org Wed Aug 20 17:16:58 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 20 Aug 2025 17:16:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v7] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 22:31:28 GMT, Erik Österlund wrote: >>> Hi @fisk, >>> >>> Thank you for the very valuable comment. It has points we have not thought about. >>> >>> > I am not fond of "special nmethods" that work subtly different to normal nmethods and have their own special life cycles. >>> >>> It's not clear to me what you mean by "special nmethods". IMO we don't introduce any special nmethods. From my point of view, a normal nmethod is an nmethod for an ordinary Java method. Nmethods for non-ordinary Java methods are special, e.g. native nmethods or method handle linkers (JDK-8263377). I think normal nmethods should be relocatable within CodeCache. >> >> I mean nmethods with a subtly different life cycle where usual invariants/expectations don't hold. Like method handle intrinsics and enter special intrinsics for example. Used to have a different life cycle for OSR nmethods too. >>> > You can't just copy oops. >>> >>> Yes, this is the main issue at the moment.
Can we do this at a safepoint? >> I don't think it solves much. You can't stash away a pointer to the nmethod, roll to a safepoint, and expect the nmethod to not be freed. Even if you did, you still can't copy the oops. >> If we are to do this, I think you want to apply nmethod entry barriers first. That stabilizes the oops. >>> > I'm worried about copying the nmethod epoch counters >>> >>> We should clear them. If not, it is a bug. >> I'd like to change copying from opt-out to opt-in instead; that would make me feel more comfortable. Then perhaps you can share initialization code that sets up the initial state of the nmethod exactly in the same way as normal nmethods. >> I didn't check but you need to take the Compile_lock and verify dependencies too if you didn't do that, I think, so you don't race with deoptimization. >>> > You don't check if the nmethod is_unloading() when cloning it. >>> >>> Should such nmethods be not entrant? We don't relocate not entrant nmethods. >> is_not_entrant doesn't imply is_unloading. >>> > What are the consequences of copying the deoptimization generation? >>> >>> What do you mean? >> I mean is it safe to racingly copy the deoptimization generation when there is concurrent deoptimization? This is why I'd prefer copying to be opt-in rather than opt-out so we don't have to stare at every single field and wonder what will happen when a new nmethod "inherits" state from a different nmethod in interesting races. I want it to work as much as possible as normal nmethod installation, starting with a state as close as po... > >> @fisk Thank you for the valuable feedback. Here is a more detailed response to the concerns you brought up > > Thanks, it's shaping up. > >> Instead of tracking the nmethod pointer which could become stale I updated the code to use method handles. I believe the method handle should ensure the method remains valid and we can then relocate its corresponding nmethod.
[Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/runtime/vmOperations.cpp#L106-L110) > The safepoint is still causing more trouble than it solves. It was introduced due to oop phobia. What the oops really needed to stabilize is to run the entry barrier which you do now. The safepoint merely destabilizes the oops again while introducing latency problems and fun class redefinition interactions. It should be removed as I can't see it serves any purpose. > >> The relocated nmethod is added as a dependent nmethod on all of the MethodHandles and InstanceKlass in its dependency scope. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1543-L1564) > > My concern was about something else - a table tracks all the nmethods that have old metadata in order to speed up a walk over the code cache that finds said nmethods. > > This should be dealt with by not relocating nmethods with evol dependencies/metadata and by not safepointing, which could introduce class redefinition which populates this table. > >> The source nmethod entry barrier is now called before copying. I believe this will disarm the barrier and reset the guard value for it to be safe to copy. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1530) > > Yes and fix the oops so you don't need a safepoint. > >> Copying this value was not intentional. It should be correctly set to the default value now. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1441) > > Good. > >> I added this check to ensure the nmethod is not unloading and removed the not entrant check as is unloading implies not entrant.
[Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1583-L1585) > That's not quite true. There are two separate mechanisms that guard the entry. When an nmethod becomes invalid due to for example a broken speculative assumpti... @fisk I wanted to follow up on our in-person discussion. As I understood it, your concern is that directly relocating nmethods adds complexity and risk, whereas asking the compiler to recompile the method would be safer. In that approach, a method would either (a) be recompiled immediately, or (b) naturally be placed correctly on its next recompilation. Below are some of my thoughts on the current approach: Timeliness - relocation guarantees the method is moved as soon as it's detected as hot, without relying on a future recompilation that may or may not happen. Efficiency - it avoids discarding an already optimized nmethod and spending cycles recompiling it, which can be expensive for hot methods. Truffle compatibility - while recompilation would work for C1/C2 functions, I don't think it is trivial to trigger recompilation of Truffle methods from within the JVM, so relocation may be a more practical option there. Flexibility - relocation lets us rearrange multiple nmethods in one pass to maintain an optimal code cache layout, something recompilation can't control. The trade-off is some implementation complexity, but I think the experimental flag provides enough safety and flexibility to improve or remove this feature in the future.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3207340738 From duke at openjdk.org Wed Aug 20 17:19:44 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Wed, 20 Aug 2025 17:19:44 GMT Subject: Integrated: 8360304: Redundant condition in LibraryCallKit::inline_vector_nary_operation In-Reply-To: References: Message-ID: On Sat, 2 Aug 2025 15:44:22 GMT, Francesco Andreuzzi wrote: > The check for `sopc != 0` is not needed after JDK-8353786; the function would exit at L374 otherwise. > > Passes tier1. This pull request has now been integrated. Changeset: ed7d5fe8 Author: Francesco Andreuzzi Committer: Volker Simonis URL: https://git.openjdk.org/jdk/commit/ed7d5fe840fed853b8a7db3347d6400f142ad154 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8360304: Redundant condition in LibraryCallKit::inline_vector_nary_operation Reviewed-by: shade, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/26606 From duke at openjdk.org Wed Aug 20 18:26:07 2025 From: duke at openjdk.org (Tobias Hotz) Date: Wed, 20 Aug 2025 18:26:07 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v3] In-Reply-To: References: Message-ID: > This PR improves the Value() of integer division nodes. > Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. > We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. > This also cleans up and unifies the code paths for DivINode and DivLNode. > I've added some tests to validate the optimization. Without the changes, some of these tests fail. 
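A hedged sketch of the "four corners" idea described in the PR text above (hypothetical names and a simplified `Range` type; this is not the actual `DivINode::Value()` code): with a divisor range that does not contain zero, the quotient's extrema appear among the four corner divisions, except that Java's `MIN_INT / -1` overflows and wraps back to `MIN_INT`, so that corner must be special-cased.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

// Hypothetical sketch, not HotSpot code: ranges of Java ints held in
// a wider type so that only the one genuine overflow case needs care.
struct Range { long lo, hi; };

static long corner_div(long a, long b) {
  if (a == INT_MIN && b == -1) {
    return INT_MIN; // Java int division wraps: MIN_INT / -1 == MIN_INT
  }
  return a / b;
}

// Caller must guarantee that 0 is not inside [y.lo, y.hi]; a divisor
// range crossing zero would first be split into its two halves.
static Range div_value(Range x, Range y) {
  long c[4] = { corner_div(x.lo, y.lo), corner_div(x.lo, y.hi),
                corner_div(x.hi, y.lo), corner_div(x.hi, y.hi) };
  return Range{ *std::min_element(c, c + 4), *std::max_element(c, c + 4) };
}
```

As the PR notes, a divisor or dividend range that crosses zero is handled by checking the negative and positive sub-ranges separately before taking the corners.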
Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: Add a simple path for non-special-case corner calculation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26143/files - new: https://git.openjdk.org/jdk/pull/26143/files/8dd1ff1b..eef20ae6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=01-02 Stats: 65 lines in 1 file changed: 32 ins; 18 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/26143.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26143/head:pull/26143 PR: https://git.openjdk.org/jdk/pull/26143 From kbarrett at openjdk.org Wed Aug 20 19:07:40 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 20 Aug 2025 19:07:40 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v2] In-Reply-To: <5N-VzF9eV_7pHB3NkfPkquPtPcz6wFBiHxEYvnPF0Zk=.af595b98-3577-4117-8b1d-7938cdb1bf61@github.com> References: <9S82TxAcAKMLy1UjvRIXlSBGLVY5FV3jelaKHUYlkXQ=.40c47c60-d784-469a-b747-911333cd9f72@github.com> <5N-VzF9eV_7pHB3NkfPkquPtPcz6wFBiHxEYvnPF0Zk=.af595b98-3577-4117-8b1d-7938cdb1bf61@github.com> Message-ID: On Wed, 20 Aug 2025 14:05:34 GMT, Stefan Karlsson wrote: >> The guidelines that we follow is that if you copy code from another file to a new file you should use the old copyright years. So, using `2017, 2025,` seems correct to me. > > https://openjdk.org/guide/#copyright-headers > >> If you move code from an existing file to a new file, bring the entire copyright + license header over to the new file. Thanks @stefank - I'd missed that part about copying from an old file. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2289041960 From kbarrett at openjdk.org Wed Aug 20 19:38:36 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 20 Aug 2025 19:38:36 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v2] In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 14:45:53 GMT, Francesco Andreuzzi wrote: >> `phase_name` is accessed by `static CompilerPhaseType find_phase`, should we make it a `friend` of the `CompilerPhaseTypeHelper`? > > Underscore: 90e9c537709ad4c384f7efd2ed18c63a4c21b51b Don't make it a friend, make it a static member function, and fix the one caller (later in this file). (The definition of find_phase could be moved to the new .cpp file.) I think the caller also has an ODR problem, with calls from different including TUs getting a different file-scoped find_phase. It looks like there might be a lot of file-scoped static declarations from header files in our code. I've made a note to look into this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2289113847 From dlong at openjdk.org Wed Aug 20 20:06:43 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 20 Aug 2025 20:06:43 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers [v2] In-Reply-To: <1ZGeH-R9goJByTfkQSiSKp1nD9oxNqOkeG50T5rnJuI=.4cb38ce6-eac2-42fc-ad4d-771758bd4d84@github.com> References: <1ZGeH-R9goJByTfkQSiSKp1nD9oxNqOkeG50T5rnJuI=.4cb38ce6-eac2-42fc-ad4d-771758bd4d84@github.com> Message-ID: On Mon, 18 Aug 2025 09:47:15 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. >> >> Performance testing shows no difference/performance improvements on DaCapo Linux x64. 
I don't think that these are actual improvements, but at least there are no clear regressions. >> >> Testing: GHA > > Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: > > - Good catch by Vladimir > - Vladimir's comments src/hotspot/share/code/relocInfo.hpp line 606: > 604: RelocIterator(CodeSection* cb, address begin = nullptr, address limit = nullptr); > 605: RelocIterator(CodeBlob* cb); > 606: RelocIterator(relocInfo& ri); How about making this new ctor private? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26569#discussion_r2289179909 From snatarajan at openjdk.org Wed Aug 20 20:21:08 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 20 Aug 2025 20:21:08 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v11] In-Reply-To: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: > **Issue** > Extreme values for the BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version` result in the assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. This is observed on an x86 machine. > > **Analysis** > On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above mentioned assert from failing for large values of BciProfileWidth. > > **Proposal** > Considering the fact that a larger BciProfileWidth results in slower profiling, I have proposed a range between 0 and 5000 to restrict the value of BciProfileWidth for x86 machines. 
This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in the ret profile. > > **Issue in AArch64** > Additionally, running the command `java -XX:BciProfileWidth=10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine. This is an issue of the maximum offset for `ldr/str` on AArch64, which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on an AArch64 machine, I had to modify 3 `ldr` and 1 `str` instruction (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line numbers 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for a maximum of 5000, after which it crashes with `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158`. Without this fix, `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested restricting the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. > > **Question to reviewers** > Do you think this is a reasonable fix? For AArch64, do you suggest fixing it using `form_address`? If yes, do I fix it under this PR or create another one? > > **Request to port maintainers** > @dafedafe suggested that we keep the upper bound of `BciProfileWidth` to 1000 pro... 
Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: addressing review - fixing indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26139/files - new: https://git.openjdk.org/jdk/pull/26139/files/4f6ad175..6cb9e98b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26139&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26139&range=09-10 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26139/head:pull/26139 PR: https://git.openjdk.org/jdk/pull/26139 From snatarajan at openjdk.org Wed Aug 20 20:23:45 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 20 Aug 2025 20:23:45 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v11] In-Reply-To: <60sgN5hitLgZGw3LHfU7GoQ9NWQA31D1Q_RUjjMiOqI=.1110e1b5-8560-4cd8-a695-5b3aa43af4c9@github.com> References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> <-SMEjFj6yBqm_pNlCxGL9l5sBegIXLyz28nPhdcW-6U=.06a121a1-ce40-4bc0-89c4-bb1dbe90c489@github.com> <60sgN5hitLgZGw3LHfU7GoQ9NWQA31D1Q_RUjjMiOqI=.1110e1b5-8560-4cd8-a695-5b3aa43af4c9@github.com> Message-ID: On Wed, 20 Aug 2025 15:30:19 GMT, Martin Doerr wrote: >> test/lib-test/jdk/test/whitebox/vm_flags/IntxTest.java line 42: >> >>> 40: private static final long COMPILE_THRESHOLD = VmFlagTest.WHITE_BOX.getIntxVMFlag("CompileThreshold"); >>> 41: private static final Long[] TESTS = {0L, 100L, (long)(Integer.MAX_VALUE>>3)*100L}; >>> 42: >> >> This empty line should not be removed. > > Thanks! But, the indentation of the next line looks odd, now. Sorry. I have fixed this now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26139#discussion_r2289211759 From lucy at openjdk.org Wed Aug 20 21:18:42 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 20 Aug 2025 21:18:42 GMT Subject: RFR: 8358756: [s390x] Test StartupOutput.java crash due to CodeCache size [v3] In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 09:42:08 GMT, Amit Kumar wrote: >> There isn't enough initial code cache present to let interpreter mode run freely. So the JVM crashes before we even reach the compiler phase, where we could bail out if there isn't enough space left for stub compilation. The idea is to increase the initial cache size to be at least enough to run in interpreter mode. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > adds comment for larger size requirement LGTM. Thanks for the explanatory comment. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25741#pullrequestreview-3138267384 From jkarthikeyan at openjdk.org Wed Aug 20 22:20:14 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 20 Aug 2025 22:20:14 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 12:53:12 GMT, Emanuel Peter wrote: >> Hi all, >> This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! > > src/hotspot/share/opto/superword.cpp line 2576: > >> 2574: >> 2575: // Vector nodes and casts should not truncate. >> 2576: if (type->isa_vect() != nullptr || type->isa_vectmask() != nullptr || in->is_Reduction() || in->is_ConstraintCast()) { > > Why should we not truncate a CastII? What can go wrong? 
My thinking was that since casts specifically change the type of a node, they may not be safe to truncate. In practice it might not matter because after the CastII pack is created, it's discarded because there is no backend implementation for vectorized CastII. I've opted to mark them as non-truncating to stay on the safer side. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2289422909 From dlong at openjdk.org Wed Aug 20 23:42:59 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 20 Aug 2025 23:42:59 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v11] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: On Wed, 20 Aug 2025 20:21:08 GMT, Saranya Natarajan wrote: >> **Issue** >> Extreme values for BciProfileWidth flag such as `java -XX:BciProfileWidth=-1 -version` and `java -XX:BciProfileWidth=100000 -version `results in assert failure `assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. This is observed in a x86 machine. >> >> **Analysis** >> On debugging the issue, I found that increasing the size of the interpreter using the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` prevented the above mentioned assert from failing for large values of BciProfileWidth. >> >> **Proposal** >> Considering the fact that larger BciProfileWidth results in slower profiling, I have proposed a range between 0 to 5000 to restrict the value for BciProfileWidth for x86 machines. This maximum value is based on modifying the `InterpreterCodeSize` variable in `src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp` using the smallest `InterpreterCodeSize` for all the architectures. As for the lower bound, a value of -1 would be the same as 0, as this simply means no return bci's will be recorded in ret profile. 
>> >> **Issue in AArch64** >> Additionally running the command `java -XX:BciProfileWidth= 10000 -version` (or larger values) results in a different failure `assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32768, 3` on an AArch64 machine.This is an issue of maximum offset for `ldr/str` in AArch64 which can be fixed using `form_address` as mentioned in [JDK-8342736](https://bugs.openjdk.org/browse/JDK-8342736). In my preliminary fix using `form_address` on AArch64 machine. I had to modify 3 `ldr` and 1 `str` instruction (in file `src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp` at line number 926, 983, and 997). With this fix using `form_address`, `BciProfileWidth` works for maximum of 5000 after which it crashes with`assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000772b63a7a3a0 <= 0x0000772b63b75159 <= 0x0000772b63b75158 `. Without this fix `BciProfileWidth` works for a maximum value of 1300. Currently, I have suggested to restrict the upper bound on AArch64 to 1000 instead of fixing it with `form_address`. >> >> **Question to reviewers** >> Do you think this is a reasonable fix ? For AArch64 do you suggest fixing using `form_address` ? If yes, do I fix it under this PR or create another one ? >> >> **Request to port maintainers** >> @dafedafe suggested that we keep the upper boun... > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review - fixing indentation If we wanted to decrease the code size, we could change the unrolled loop to a real loop. But I think first we should answer the question, why are we profiling "ret" instructions at all? As far as I can tell, the compilers are not using the profiling data for anything, so maybe we could just remove it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3208458542 From dlong at openjdk.org Thu Aug 21 00:08:51 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 21 Aug 2025 00:08:51 GMT Subject: RFR: 8286865: vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH/Test.java fails with Out of space in CodeCache [v2] In-Reply-To: References: Message-ID: <9sydfNb2vqZVOCh4mgVVElzsgQNXh8Hye8qr--gOyqs=.039d40f3-a90a-490a-8028-69cc68204c45@github.com> On Tue, 19 Aug 2025 15:02:55 GMT, Ramkumar Sunderbabu wrote: >> MethodHandle invocations with Xcomp are filling up CodeCache quickly in the test, especially in machines with high number of processors. >> It is possible to measure code cache consumption per invocation, estimate overall consumption and bail out before CodeCache runs out of memory. >> But it is much simpler to exclude the test for Xcomp flag. >> >> Additional Change: MethodHandles.lookup was done unnecessarily invoked for all iterations. Replaced it with single invocation. >> >> PS: This issue is not seen in JDK 20 and above, possibly due to JDK-8290025, but the exclusion guards against vagaries of CodeCache management. > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comment -Xcomp is a useful stress flag, and this test is meant to stress MHs, not the code cache, so can we increase the code cache size enough to let it pass with -Xcomp? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26840#issuecomment-3208502041 From dlong at openjdk.org Thu Aug 21 00:25:52 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 21 Aug 2025 00:25:52 GMT Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee == m) failed: repeated inline attempt with different callee In-Reply-To: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> Message-ID: On Wed, 23 Jul 2025 11:14:27 GMT, Damon Fenacci wrote: > # Issue > The CTW test `applications/ctw/modules/java_xml.java` crashes when trying to repeat late inlining of a virtual method (after IGVN passes through the method's call node again). The failure originates [here](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callGenerator.cpp#L473) because `_callee != m`. Apparently when running IGVN a second time after a first late inline failure and [setting the callee in the call generator](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callnode.cpp#L1240) we notice that the previous callee is not the same as the current one. > In this specific instance it seems that the issue happens when CTW is compiling Apache Xalan. > > # Cause > The root of the issue has to do with repeated late inlining, class hierarchy analysis and dynamic class loading. > > For this particular issue the two differing methods are `org.apache.xalan.xsltc.compiler.LocationPathPattern::translate` first and `org.apache.xalan.xsltc.compiler.AncestorPattern::translate` the second time. `LocationPathPattern` is an abstract class but has a concrete `translate` method. `AncestorPattern` is a concrete class that extends another abstract class `RelativePathPattern` that extends `LocationPathPattern`. 
`AncestorPattern` overrides the translate method. > What seems to be happening is the following: we compile a virtual call `RelativePathPattern::translate` and, at compile time, only the abstract classes `RelativePathPattern` <: `LocationPathPattern` are loaded. CHA then finds out that the call must always call `LocationPathPattern::translate` because the method is not overridden anywhere else. However, there is still no non-abstract class in the entire class hierarchy, i.e. as soon as `AncestorPattern` is loaded, this class is then the only non-abstract class in the class hierarchy and therefore the receiver type must be `AncestorPattern`. > > More generally, when late inlining is repeated and classes are loaded dynamically, it is possible that the resolved method between a late inlining attempt and the next one is not the same. > > # Fix > > This looks like a very rare edge case. If CHA is affected by class loading, the original recorded dependency becomes invalid. So, we change the assert to **check for invalid dependencies if the current callee and the previous one don't match**. > > # Testing > > This issue is very intermittent and depends on a number of factors. This ... src/hotspot/share/opto/callGenerator.cpp line 487: > 485: "repeated inline attempt with different callee"); > 486: } > 487: #endif I'm wondering if there might be other reasons that the callee might change, like JVMTI class redefinition. Also, it sounds like the CHA case is really rare, and we check dependencies at the end anyway, so the easiest fix for class redefinition and CHA would be to ignore the new callee and keep the old one here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26441#discussion_r2289579643 From dlong at openjdk.org Thu Aug 21 00:29:55 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 21 Aug 2025 00:29:55 GMT Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v2] In-Reply-To: <5CGrcWjFZ7Zqj_Tm0LO6Tqg9cUA-xxvcaa2J-yWW8BE=.af4dea7c-e39d-491d-b924-c89fa82e757a@github.com> References: <5CGrcWjFZ7Zqj_Tm0LO6Tqg9cUA-xxvcaa2J-yWW8BE=.af4dea7c-e39d-491d-b924-c89fa82e757a@github.com> Message-ID: On Thu, 14 Aug 2025 10:54:08 GMT, Damon Fenacci wrote: >> # Issue >> While compiling `java.util.zip.ZipFile` in C2 this assert is triggered >> https://github.com/openjdk/jdk/blob/a2e86ff3c56209a14c6e9730781eecd12c81d170/src/hotspot/share/opto/memnode.cpp#L4235 >> >> # Cause >> While compiling the constructor of java.util.zip.ZipFile$CleanableResource the following happens: >> * we insert a trailing `MemBarStoreStore` in the constructor >> before_folding >> >> * during IGVN we completely fold the memory subtree of the `MemBarStoreStore` node. The node still has a control output attached. >> after_folding >> >> * later during the same IGVN run the `MemBarStoreStore` node is handled and we try to remove it (because the `Allocate` node of the `MemBar` is not escaping the thread) https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4301-L4302 >> * the assert https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4235 >> triggers because the barrier has only 1 (control) output and is a `MemBarStoreStore` (not `Initialize`) barrier >> >> The issue happens only when `UseStoreStoreForCtor` is set (default as well), which makes C2 use `MemBarStoreStore` instead of `MemBarRelease` at the end of constructors. `MemBarStoreStore` are processed separately by EA and this happens after the IGVN pass that folds the memory subtree. 
`MemBarRelease` on the other hand are handled during the same IGVN pass before the memory subtree gets removed, and it's still got 2 outputs (assert skipped). >> >> # Fix >> Adapting the assert to accept that `MemBarStoreStore` can also have `!= 2` outputs (when `+UseStoreStoreForCtor` is used) seems to be an OK solution as this seems like a perfectly plausible situation. >> >> # Testing >> Unfortunately reproducing the issue with a simple regression test has proven very hard. The test seems to rely on very peculiar profiling and IGVN worklist sequence. JBS replay compilation passes. Running JCK's `api/java_util` 100 times triggers the assert a couple of times on average before the fix, none after. >> Tier 1-3+ tests passed. > > Damon Fenacci has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into JDK-8360031 > - JDK-8360031: update assert message > - Merge branch 'master' into JDK-8360031 > - JDK-8360031: remove unnecessary include > - JDK-8360031: remove UseNewCode > - JDK-8360031: compilation asserts in MemBarNode::remove This looks OK on the surface, but isn't handling MemBarStoreStore and MemBarRelease differently asking for trouble? Is there a reason why they need to be handled in different passes? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26556#issuecomment-3208536529 From dlong at openjdk.org Thu Aug 21 00:31:55 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 21 Aug 2025 00:31:55 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors [v3] In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 11:57:38 GMT, Daniel Skantz wrote: >> This PR addresses a bug in the stringopts phase. 
During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code, as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments(). >> >> In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there are 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary number of back-to-back stores in the generated code depending on the number of stacked concatenations. >> >> The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2]. >> >> [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303 >> >> [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806 >> >> Testing: T1-4. >> >> Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 arguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow pass. > > Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision: > > - comment > - changes src/hotspot/share/opto/stringopts.cpp line 318: > 316: // -- leading to high memory use, compilation time, and later, a large number of IR nodes > 317: // -- and bail out in that case. 
> 318: if (STACKED_CONCAT_UPPER_BOUND < arguments_appended) { Is it just me, or is it easier to read with the limit on the right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2289587584 From rsunderbabu at openjdk.org Thu Aug 21 01:18:51 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Thu, 21 Aug 2025 01:18:51 GMT Subject: RFR: 8286865: vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH/Test.java fails with Out of space in CodeCache [v2] In-Reply-To: <9sydfNb2vqZVOCh4mgVVElzsgQNXh8Hye8qr--gOyqs=.039d40f3-a90a-490a-8028-69cc68204c45@github.com> References: <9sydfNb2vqZVOCh4mgVVElzsgQNXh8Hye8qr--gOyqs=.039d40f3-a90a-490a-8028-69cc68204c45@github.com> Message-ID: <4bamZb6h560Men7bJZj5x3Qx4bUr89xLSHZrRc6Pc24=.0f46c2bb-6ea9-42e7-b338-a3f5198f6432@github.com> On Thu, 21 Aug 2025 00:05:58 GMT, Dean Long wrote: > -Xcomp is a useful stress flag, and this test is meant to stress MHs, not the code cache, so can we increase the code cache size enough to let it pass with -Xcomp? The number of iterations is proportional to the number of CPUs, so we cannot predict the required cache size. Another alternative is to cap the CPU-based scaling at some number. If the number is capped, someone might doubt whether we have stressed MHs enough. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26840#issuecomment-3208606364 From dzhang at openjdk.org Thu Aug 21 01:20:03 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 21 Aug 2025 01:20:03 GMT Subject: RFR: 8365841: RISC-V: Several IR verification tests fail after JDK-8350960 without Zvfh In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 07:01:59 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > The error in both cases is caused by the same reason: the target IR, MulReductionVI, is not matched. > This is because the match_rule_supported_vector in riscv_v.ad is missing a break. 
If the if condition in `case MulReductionVI` evaluates to false, control falls through the switch until the `return UseZvfh`. > > Failed IR tests: > compiler/loopopts/superword/ProdRed_Int.java > compiler/loopopts/superword/RedTest_int.java > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/ProdRed_Int.java on k1 and k230 > - [x] Run compiler/loopopts/superword/RedTest_int.java on k1 and k230 Thanks all for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26854#issuecomment-3208605126 From duke at openjdk.org Thu Aug 21 01:20:03 2025 From: duke at openjdk.org (duke) Date: Thu, 21 Aug 2025 01:20:03 GMT Subject: RFR: 8365841: RISC-V: Several IR verification tests fail after JDK-8350960 without Zvfh In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 07:01:59 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > The error in both cases is caused by the same reason: the target IR, MulReductionVI, is not matched. > This is because the match_rule_supported_vector in riscv_v.ad is missing a break. If the if condition in `case MulReductionVI` evaluates to false, control falls through the switch until the `return UseZvfh`. > > Failed IR tests: > compiler/loopopts/superword/ProdRed_Int.java > compiler/loopopts/superword/RedTest_int.java > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/ProdRed_Int.java on k1 and k230 > - [x] Run compiler/loopopts/superword/RedTest_int.java on k1 and k230 @DingliZhang Your change (at version 33dd53a9d94676a638fb1c02fe9107210c829267) is now ready to be sponsored by a Committer. 
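For readers following the thread above, here is a hedged, self-contained illustration of the switch fall-through pattern being discussed. The names and flags are hypothetical stand-ins, not the actual riscv_v.ad code: `has_rvv` plays the role of the vector-support check inside the case, and `use_zvfh` plays the role of the `UseZvfh` flag that the later `return` depends on.

```cpp
#include <cassert>

// Stand-ins for the real feature checks (assumptions for illustration).
static bool has_rvv = true;
static bool use_zvfh = false;

enum Opcode { Op_MulReductionVI, Op_SomeZvfhRule };

static bool match_rule_buggy(Opcode opcode) {
  switch (opcode) {
    case Op_MulReductionVI:
      if (!has_rvv) return false;
      // BUG: missing `break` -- a supported MulReductionVI falls through
      // into the next case and is reported as matched only when Zvfh is on.
    case Op_SomeZvfhRule:
      return use_zvfh;
  }
  return true;
}

static bool match_rule_fixed(Opcode opcode) {
  switch (opcode) {
    case Op_MulReductionVI:
      if (!has_rvv) return false;
      break; // the one-line fix: exit the switch instead of falling through
    case Op_SomeZvfhRule:
      return use_zvfh;
  }
  return true;
}
```

This reproduces the reported symptom in miniature: without the fix, the integer-reduction rule silently becomes gated on an unrelated FP16 feature flag.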
------------- PR Comment: https://git.openjdk.org/jdk/pull/26854#issuecomment-3208607200 From dzhang at openjdk.org Thu Aug 21 01:22:57 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 21 Aug 2025 01:22:57 GMT Subject: Integrated: 8365841: RISC-V: Several IR verification tests fail after JDK-8350960 without Zvfh In-Reply-To: References: Message-ID: <54zN2PtIcYjjIIGh95grxc4gBPwpqMk873NNmZVeRvs=.e9de57c8-4a10-4041-bc85-c4329d442dc5@github.com> On Wed, 20 Aug 2025 07:01:59 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > The error in both cases is caused by the same reason: the target IR, MulReductionVI, is not matched. > This is because the match_rule_supported_vector in riscv_v.ad is missing a break. If the if condition in `case MulReductionVI` evaluates to false, the loop will not exit until the `return UseZvfh`. > > Failed IR tests: > compiler/loopopts/superword/ProdRed_Int.java > compiler/loopopts/superword/RedTest_int.java > > ### Test (fastdebug) > - [x] Run compiler/loopopts/superword/ProdRed_Int.java on k1 and k230 > - [x] Run compiler/loopopts/superword/RedTest_int.java on k1 and k230 This pull request has now been integrated. Changeset: 2e06a917 Author: Dingli Zhang Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/2e06a917659d76fa1b4c63f38894564679209625 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8365841: RISC-V: Several IR verification tests fail after JDK-8350960 without Zvfh Reviewed-by: fyang, fjiang, mli ------------- PR: https://git.openjdk.org/jdk/pull/26854 From xgong at openjdk.org Thu Aug 21 01:35:56 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 21 Aug 2025 01:35:56 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v5] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 02:31:08 GMT, Xiaohong Gong wrote: >> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform. 
>> >> ### Background >> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register. >> >> ### Implementation >> >> #### Challenges >> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints. >> >> For a 512-bit SVE machine, loading a `byte` vector with different vector species require different approaches: >> - SPECIES_64: Single operation with mask (8 elements, 256-bit) >> - SPECIES_128: Single operation, full register (16 elements, 512-bit) >> - SPECIES_256: Two operations + merge (32 elements, 1024-bit) >> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit) >> >> Use `ByteVector.SPECIES_512` as an example: >> - It contains 64 elements. So the index vector size should be `64 * 32` bits, which is 4 times of the SVE vector register size. >> - It requires 4 times of vector gather-loads to finish the whole operation. >> >> >> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...] >> int[] idx = [0, 1, 2, 3, ..., 63, ...] >> >> 4 gather-load: >> idx_v1 = [15 14 13 ... 1 0] gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa] >> idx_v2 = [31 30 29 ... 17 16] gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb] >> idx_v3 = [47 46 45 ... 33 32] gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc] >> idx_v4 = [63 62 61 ... 49 48] gather_v4 = [... 
0000 0000 0000 0000 dddd dddd dddd dddd] >> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa] >> >> >> #### Solution >> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end. >> >> Here is the main changes: >> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher. >> - Added `VectorSliceNode` for result mer... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge 'jdk:master' into JDK-8351623-sve > - Address review comments > - Refine IR pattern and clean backend rules > - Fix indentation issue and move the helper matcher method to header files > - Merge branch jdk:master into JDK-8351623-sve > - 8351623: VectorAPI: Add SVE implementation of subword gather load operation ping~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3208634215 From kvn at openjdk.org Thu Aug 21 01:37:02 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 21 Aug 2025 01:37:02 GMT Subject: RFR: 8364501: Compiler shutdown crashes on access to deleted CompileTask [v2] In-Reply-To: References: <5MMF3mjz3V6DbYhKMyzJx2G8CcNsLGkJ9TkpXsDAICQ=.3badd2a0-1bed-441d-8d45-a05b4a411678@github.com> Message-ID: On Mon, 11 Aug 2025 16:22:56 GMT, Aleksey Shipilev wrote: >> See the bug for more investigation. >> >> In short, with recent changes to `delete` `CompileTask`-s, we end up in the rare situation where we can access tasks that have been already deleted. The major and obivous mistake I committed myself with [JDK-8361752](https://bugs.openjdk.org/browse/JDK-8361752) in `CompileQueue::delete_all`: the code first `delete`-s, then asks for `next` (facepalms). 
>> >> Another case is less trivial, and mostly a fix in an abundance of caution: in `wait_for_completion`, we can exit while the blocking task is still in the queue. The current code skips deletions only when the compiler is shut down for compilation, but I think the condition should be stronger: unless the task is completed, we should assume it might carry the queue-ing `next`/`prev` pointers that `delete_all` would need, and skip deletion. Realistically, it would "leak" only on compiler shutdown, like before. >> >> I have also put in some diagnostic code to catch the lifecycle issues like this more reliably, and cleaned up `next`, `prev` lifecycle to clearly disconnect the `CompileTasks` that are no longer in queue. >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, reproducer no longer fails >> - [x] Linux AArch64 server fastdebug, `compiler` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Chicken out of memset-ing the possibly vtable-bearing object There is an issue with a "stale" task. I filed https://bugs.openjdk.org/browse/JDK-8365891 ------------- PR Comment: https://git.openjdk.org/jdk/pull/26696#issuecomment-3208636096 From kvn at openjdk.org Thu Aug 21 01:55:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 21 Aug 2025 01:55:54 GMT Subject: RFR: 8365891: failed: Completed task should not be in the queue Message-ID: Added missing `task->set_next(nullptr)` for "stale" tasks.
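The `delete`-then-`next` mistake described in this thread is the classic singly-linked-list pitfall. A minimal sketch with toy types (not HotSpot's actual `CompileTask`/`CompileQueue` classes) shows the broken shape and the fix of saving the successor before freeing the node:

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for a CompileTask-style queue node; illustrative only.
struct Task {
  Task* next = nullptr;
};

struct Queue {
  Task* head = nullptr;

  void push(Task* t) { t->next = head; head = t; }

  // Broken shape of delete_all: `delete t` first, then `t->next` reads
  // freed memory. Kept as a comment, never executed:
  //   for (Task* t = head; t != nullptr; /* */) { delete t; t = t->next; }

  // Correct shape: capture the successor before freeing the node.
  int delete_all() {
    int freed = 0;
    Task* t = head;
    while (t != nullptr) {
      Task* next = t->next;  // read next BEFORE delete
      delete t;
      ++freed;
      t = next;
    }
    head = nullptr;
    return freed;
  }
};
```

The same reasoning motivates `task->set_next(nullptr)` when a task leaves the queue: a disconnected node can no longer leak a dangling successor into a later traversal.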
Testing: tier1-3,xcomp,stress ------------- Commit messages: - 8365891: failed: Completed task should not be in the queue Changes: https://git.openjdk.org/jdk/pull/26872/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26872&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365891 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26872/head:pull/26872 PR: https://git.openjdk.org/jdk/pull/26872 From dzhang at openjdk.org Thu Aug 21 01:58:55 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 21 Aug 2025 01:58:55 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV In-Reply-To: <9zCz8rLDzNQwtZhSzcirzzUwAN6sOmGrzPaMx6ZAlXc=.70335351-7665-4e52-9430-f81b7bd07255@github.com> References: <9zCz8rLDzNQwtZhSzcirzzUwAN6sOmGrzPaMx6ZAlXc=.70335351-7665-4e52-9430-f81b7bd07255@github.com> Message-ID: On Wed, 20 Aug 2025 13:03:05 GMT, Emanuel Peter wrote: >> Hi, >> Can you help to review this patch? Thanks! >> >> We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. >> The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount. >> >> ### Test (fastdebug) >> - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java line 43: > >> 41: * @test >> 42: * @requires vm.debug == true & vm.compiler2.enabled & vm.flagless >> 43: * @requires (os.arch != "riscv64" | (os.arch == "riscv64" & vm.cpu.features ~= ".*rvv.*")) > > Generally, it would be preferable to adjust the IR rules. But I'm not sure if that is preferable here. So I think that this is the right solution. 
> > @chhagedorn This test may fail on other platforms as well that don't have all the required optimizations, such as vectors and others. Should we accept this solution? @eme64 Thanks for the review! I have another method to change the IR matching rules for riscv64 so that other tests can be run without RVV: diff --git a/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java b/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java index ac8867f3985..2bf14bdfa5a 100644 --- a/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java +++ b/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java @@ -1124,9 +1124,21 @@ public void wrongCountString() {} @Test @FailCount(8) - @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}) - @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}) // valid - @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_ANY, "> 0"}) // valid + @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}, + applyIfPlatform = {"riscv64", "false"}) + @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}, + applyIfPlatform = {"riscv64", "true"}, + applyIfCPUFeature = {"rvv", "true"}) + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}, + applyIfPlatform = {"riscv64", "false"}) // valid + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}, + applyIfPlatform = {"riscv64", "true"}, + applyIfCPUFeature = {"rvv", "true"}) + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_ANY, "> 0"}, + applyIfPlatform = {"riscv64", "false"}) // valid + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_ANY, "> 0"}, + applyIfPlatform = {"riscv64", "true"}, + applyIfCPUFeature = {"rvv", "true"}) @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE + "", "> 0"}) @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE + "xxx", "> 0"}) @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE + "min()", "> 0"}) ------------- PR Review Comment: 
https://git.openjdk.org/jdk/pull/26855#discussion_r2288398891 From dzhang at openjdk.org Thu Aug 21 02:04:52 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 21 Aug 2025 02:04:52 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV In-Reply-To: <9zCz8rLDzNQwtZhSzcirzzUwAN6sOmGrzPaMx6ZAlXc=.70335351-7665-4e52-9430-f81b7bd07255@github.com> References: <9zCz8rLDzNQwtZhSzcirzzUwAN6sOmGrzPaMx6ZAlXc=.70335351-7665-4e52-9430-f81b7bd07255@github.com> Message-ID: <1ZPPraxTuyMlKXpkuZOrpPgFXFGLu3-C9GHL09Mc4Wg=.a2d5df13-9456-4060-ac81-3cf3fdce40d4@github.com> On Wed, 20 Aug 2025 13:03:05 GMT, Emanuel Peter wrote: >> Hi, >> Can you help to review this patch? Thanks! >> >> We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. >> The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount. >> >> ### Test (fastdebug) >> - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java line 43: > >> 41: * @test >> 42: * @requires vm.debug == true & vm.compiler2.enabled & vm.flagless >> 43: * @requires (os.arch != "riscv64" | (os.arch == "riscv64" & vm.cpu.features ~= ".*rvv.*")) > > Generally, it would be prefereable to adjust the IR rules. But I'm not sure if that is preferrable here. So I think that this is the right solution. > > @chhagedorn This test may fail on other platforms as well that don't have all the required optimizations, such as vectors and others. Should we accept this solution? > @eme64 Thanks for the review! 
I have another method to change the IR matching rules for riscv64 so that other tests can be run without RVV: > > ```diff > diff --git a/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java b/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java > index ac8867f3985..2bf14bdfa5a 100644 > --- a/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java > +++ b/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java > @@ -1124,9 +1124,21 @@ public void wrongCountString() {} > > @Test > @FailCount(8) > - @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}) > - @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}) // valid > - @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_ANY, "> 0"}) // valid > + @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}, > + applyIfPlatform = {"riscv64", "false"}) > + @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}, > + applyIfPlatform = {"riscv64", "true"}, > + applyIfCPUFeature = {"rvv", "true"}) > + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}, > + applyIfPlatform = {"riscv64", "false"}) // valid > + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}, > + applyIfPlatform = {"riscv64", "true"}, > + applyIfCPUFeature = {"rvv", "true"}) > + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_ANY, "> 0"}, > + applyIfPlatform = {"riscv64", "false"}) // valid > + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_ANY, "> 0"}, > + applyIfPlatform = {"riscv64", "true"}, > + applyIfCPUFeature = {"rvv", "true"}) > @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE + "", "> 0"}) > @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE + "xxx", "> 0"}) > @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE + "min()", "> 0"}) > ``` Hi @eme64 What do you think? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26855#discussion_r2289683141 From dholmes at openjdk.org Thu Aug 21 02:24:52 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 21 Aug 2025 02:24:52 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v2] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: <871DbNrXSrdwXxEynOq_fZjvQXMP30abzW17OxgIX4E=.30cb9ca0-914a-472e-aa38-34cb6c034e0e@github.com> On Wed, 20 Aug 2025 18:32:22 GMT, Igor Veresov wrote: >> This change fixes multiple issue with training data verification. While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown) it does problems in the leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call), and fixes the problems that prevent it from working reliably. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Fix minimal build Just a few drive-by comments as I'm not familiar with this "training" stuff. src/hotspot/share/compiler/compilationPolicy.cpp line 141: > 139: } > 140: > 141: void CompilationPolicy::flush_replay_training_at_init(TRAPS) { This method seems to be waiting for something to finish, not "flushing" anything itself. src/hotspot/share/compiler/compilationPolicy.cpp line 142: > 140: > 141: void CompilationPolicy::flush_replay_training_at_init(TRAPS) { > 142: MonitorLocker locker(THREAD, TrainingReplayQueue_lock); There is no exception processing here so this method should not be declared to take `TRAPS`. If you want to pass the current thread just declare a `JavaThread* current` parameter directly please. 
src/hotspot/share/oops/trainingData.hpp line 678: > 676: void dec_init_deps_left(KlassTrainingData* ktd); > 677: int init_deps_left() const { > 678: return Atomic::load_acquire(&_init_deps_left); Where is the `release_store` (or other ordered atomic op) that this pairs with? Also there is a convention to make acquire/release semantics clear in the API method names i.e. in this case `init_deps_left_acquire()`. src/hotspot/share/runtime/java.cpp line 522: > 520: if (AOTVerifyTrainingData) { > 521: EXCEPTION_MARK; > 522: CompilationPolicy::flush_replay_training_at_init(THREAD); Looks odd to have an `at_init` method executing during VM shutdown. ------------- PR Review: https://git.openjdk.org/jdk/pull/26866#pullrequestreview-3138788035 PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289684061 PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289682761 PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289698544 PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289701195 From dholmes at openjdk.org Thu Aug 21 02:24:53 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 21 Aug 2025 02:24:53 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v2] In-Reply-To: <871DbNrXSrdwXxEynOq_fZjvQXMP30abzW17OxgIX4E=.30cb9ca0-914a-472e-aa38-34cb6c034e0e@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <871DbNrXSrdwXxEynOq_fZjvQXMP30abzW17OxgIX4E=.30cb9ca0-914a-472e-aa38-34cb6c034e0e@github.com> Message-ID: On Thu, 21 Aug 2025 02:01:26 GMT, David Holmes wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix minimal build > > src/hotspot/share/compiler/compilationPolicy.cpp line 142: > >> 140: >> 141: void CompilationPolicy::flush_replay_training_at_init(TRAPS) { >> 142: MonitorLocker locker(THREAD, TrainingReplayQueue_lock); > > 
There is no exception processing here so this method should not be declared to take `TRAPS`. If you want to pass the current thread just declare a `JavaThread* current` parameter directly please. Hmmm I see this code is full of incorrect TRAPS usage! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289686080 From iveresov at openjdk.org Thu Aug 21 02:33:55 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 21 Aug 2025 02:33:55 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v2] In-Reply-To: <871DbNrXSrdwXxEynOq_fZjvQXMP30abzW17OxgIX4E=.30cb9ca0-914a-472e-aa38-34cb6c034e0e@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <871DbNrXSrdwXxEynOq_fZjvQXMP30abzW17OxgIX4E=.30cb9ca0-914a-472e-aa38-34cb6c034e0e@github.com> Message-ID: On Thu, 21 Aug 2025 02:02:56 GMT, David Holmes wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix minimal build > > src/hotspot/share/compiler/compilationPolicy.cpp line 141: > >> 139: } >> 140: >> 141: void CompilationPolicy::flush_replay_training_at_init(TRAPS) { > > This method seems to be waiting for something to finish, not "flushing" anything itself. It has a semantic effect of flushing the queue... What would you like it to be renamed to? > src/hotspot/share/oops/trainingData.hpp line 678: > >> 676: void dec_init_deps_left(KlassTrainingData* ktd); >> 677: int init_deps_left() const { >> 678: return Atomic::load_acquire(&_init_deps_left); > > Where is the `release_store` (or other ordered atomic op) that this pairs with? > > Also there is a convention to make acquire/release semantics clear in the API method names i.e. in this case `init_deps_left_acquire()`. There is an `Atomic::sub()` in `dec_init_deps_left()`. 
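The pairing question above is the crux of the review comment: an acquiring load only synchronizes with a releasing (or stronger) store on the writer side, and HotSpot's `Atomic::sub` defaults to conservative ordering, which is at least as strong as release. A generic `std::atomic` sketch of the intended pattern (not HotSpot's `Atomic::` API):

```cpp
#include <atomic>
#include <cassert>

// Generic sketch of acquire/release pairing, not HotSpot's Atomic:: API.
// The writer publishes `payload` and then decrements the counter with
// release semantics; a reader that observes the decrement with acquire
// semantics is guaranteed to also observe the payload write.
std::atomic<int> deps_left{1};
int payload = 0;  // plain data "published" by the writer

void writer() {
  payload = 42;                                       // happens-before...
  deps_left.fetch_sub(1, std::memory_order_release);  // ...this release
}

int reader() {
  // The acquire load pairs with the release decrement above.
  while (deps_left.load(std::memory_order_acquire) != 0) { /* spin */ }
  return payload;  // guaranteed to see 42
}
```

If the decrement used `memory_order_relaxed` instead, the acquire load would have no release operation to synchronize with, which is exactly the mismatch the reviewer is probing for.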
> src/hotspot/share/runtime/java.cpp line 522: > >> 520: if (AOTVerifyTrainingData) { >> 521: EXCEPTION_MARK; >> 522: CompilationPolicy::flush_replay_training_at_init(THREAD); > > Looks odd to have an `at_init` method executing during VM shutdown. `init` here means `class initialization`. We can rename it if you want, but that's the current convention everybody expects in the leyden project. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289712736 PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289708246 PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289710061 From iveresov at openjdk.org Thu Aug 21 02:33:56 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 21 Aug 2025 02:33:56 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v2] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <871DbNrXSrdwXxEynOq_fZjvQXMP30abzW17OxgIX4E=.30cb9ca0-914a-472e-aa38-34cb6c034e0e@github.com> Message-ID: On Thu, 21 Aug 2025 02:05:03 GMT, David Holmes wrote: >> src/hotspot/share/compiler/compilationPolicy.cpp line 142: >> >>> 140: >>> 141: void CompilationPolicy::flush_replay_training_at_init(TRAPS) { >>> 142: MonitorLocker locker(THREAD, TrainingReplayQueue_lock); >> >> There is no exception processing here so this method should not be declared to take `TRAPS`. If you want to pass the current thread just declare a `JavaThread* current` parameter directly please. > > Hmmm I see this code is full of incorrect TRAPS usage! Yeah, good point, I'll clean this up. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289712667 From iveresov at openjdk.org Thu Aug 21 02:45:08 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 21 Aug 2025 02:45:08 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v3] In-Reply-To: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: <35GmGxUo9cqdaEpS_CkmIy5upOwwZrA_vFysYoDIkAQ=.b523c75f-0920-4b21-aea9-4fbea14f073b@github.com> > This change fixes multiple issues with training data verification. While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown), it causes problems in the leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call), and fixes the problems that prevent it from working reliably. 
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Rephrase arguments of the flush() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26866/files - new: https://git.openjdk.org/jdk/pull/26866/files/b9dde139..3a4690fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26866&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26866&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26866.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26866/head:pull/26866 PR: https://git.openjdk.org/jdk/pull/26866 From iveresov at openjdk.org Thu Aug 21 03:00:11 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 21 Aug 2025 03:00:11 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v4] In-Reply-To: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: > This change fixes multiple issues with training data verification. While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown), it causes problems in the leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call), and fixes the problems that prevent it from working reliably. 
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: More cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26866/files - new: https://git.openjdk.org/jdk/pull/26866/files/3a4690fc..289fb74c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26866&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26866&range=02-03 Stats: 11 lines in 3 files changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/26866.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26866/head:pull/26866 PR: https://git.openjdk.org/jdk/pull/26866 From iveresov at openjdk.org Thu Aug 21 03:00:12 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 21 Aug 2025 03:00:12 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v2] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <871DbNrXSrdwXxEynOq_fZjvQXMP30abzW17OxgIX4E=.30cb9ca0-914a-472e-aa38-34cb6c034e0e@github.com> Message-ID: <4D_wqUeSNSk0d8oNpJqKiE6USu4CNEE6Sv5OvvqRSpI=.d2f863be-6a2f-46cc-a175-2902c7a7cb5b@github.com> On Thu, 21 Aug 2025 02:31:40 GMT, Igor Veresov wrote: >> src/hotspot/share/compiler/compilationPolicy.cpp line 141: >> >>> 139: } >>> 140: >>> 141: void CompilationPolicy::flush_replay_training_at_init(TRAPS) { >> >> This method seems to be waiting for something to finish, not "flushing" anything itself. > > It has a semantic effect of flushing the queue... What would you like it to be renamed to? I'll rename it to `wait`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289732989 From iveresov at openjdk.org Thu Aug 21 03:00:12 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 21 Aug 2025 03:00:12 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v2] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <871DbNrXSrdwXxEynOq_fZjvQXMP30abzW17OxgIX4E=.30cb9ca0-914a-472e-aa38-34cb6c034e0e@github.com> Message-ID: On Thu, 21 Aug 2025 02:31:36 GMT, Igor Veresov wrote: >> Hmmm I see this code is full of incorrect TRAPS usage! > > Yeah, good point, I'll clean this up. I cleaned it up around the init replay queue and its usage. There are more instances of this but it would probably need to be a separate PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289735389 From dholmes at openjdk.org Thu Aug 21 03:20:56 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 21 Aug 2025 03:20:56 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v2] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <871DbNrXSrdwXxEynOq_fZjvQXMP30abzW17OxgIX4E=.30cb9ca0-914a-472e-aa38-34cb6c034e0e@github.com> Message-ID: On Thu, 21 Aug 2025 02:55:37 GMT, Igor Veresov wrote: >> Yeah, good point, I'll clean this up. > > I cleaned it up around the init replay queue and its usage. There are more instances of this but it would probably need to be a separate PR. Yeah I noticed it is quite wide-spread and pinged the compiler team internally. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289753070 From dholmes at openjdk.org Thu Aug 21 03:20:57 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 21 Aug 2025 03:20:57 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v4] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: On Thu, 21 Aug 2025 03:00:11 GMT, Igor Veresov wrote: >> This change fixes multiple issue with training data verification. While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown) it does problems in the leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call), and fixes the problems that prevent it from working reliably. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > More cleanup src/hotspot/share/compiler/compilationPolicy.cpp line 173: > 171: } > 172: > 173: void CompilationPolicy::replay_training_at_init(InstanceKlass* klass, JavaThread* THREAD) { Please rename `THREAD` to `current`. `THREAD` is still inherently part of the `TRAPS` mechanism. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289757585 From dholmes at openjdk.org Thu Aug 21 03:20:59 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 21 Aug 2025 03:20:59 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v2] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <871DbNrXSrdwXxEynOq_fZjvQXMP30abzW17OxgIX4E=.30cb9ca0-914a-472e-aa38-34cb6c034e0e@github.com> Message-ID: On Thu, 21 Aug 2025 02:27:40 GMT, Igor Veresov wrote: >> src/hotspot/share/oops/trainingData.hpp line 678: >> >>> 676: void dec_init_deps_left(KlassTrainingData* ktd); >>> 677: int init_deps_left() const { >>> 678: return Atomic::load_acquire(&_init_deps_left); >> >> Where is the `release_store` (or other ordered atomic op) that this pairs with? >> >> Also there is a convention to make acquire/release semantics clear in the API method names i.e. in this case `init_deps_left_acquire()`. > > There is an `Atomic::sub()` in `dec_init_deps_left()`. Okay - not obvious we actually require acquire semantics when reading a simple count, but I'm not sure what the count may imply. But please consider renaming the method. >> src/hotspot/share/runtime/java.cpp line 522: >> >>> 520: if (AOTVerifyTrainingData) { >>> 521: EXCEPTION_MARK; >>> 522: CompilationPolicy::flush_replay_training_at_init(THREAD); >> >> Looks odd to have an `at_init` method executing during VM shutdown. > > `init` here means `class initialization`. We can rename it if you want, but that's the current convention everybody expects in the leyden project. "from_init" might be better if this represents training data from class initialization time. Though is there non-init training data? i.e. could it just be `flush_replay_training`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289754815 PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289756627 From dlong at openjdk.org Thu Aug 21 03:25:50 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 21 Aug 2025 03:25:50 GMT Subject: RFR: 8365891: failed: Completed task should not be in the queue In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 01:48:55 GMT, Vladimir Kozlov wrote: > Added missing `task->set_next(nullptr)` for "stale" tasks. > > Testing: tier1-3,xcomp,stress Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26872#pullrequestreview-3138907844 From amitkumar at openjdk.org Thu Aug 21 03:56:00 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 21 Aug 2025 03:56:00 GMT Subject: RFR: 8358756: [s390x] Test StartupOutput.java crash due to CodeCache size [v3] In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 09:42:08 GMT, Amit Kumar wrote: >> There isn't enough initial cache present which can let the interpreter mode run freely. So before even we reach to the compiler phase and try to bail out, in case there isn't enough space left for the stub compilation, JVM crashes. Idea is to increase the Initial cache size and make it enough to run interpreter mode at least. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > adds comment for larger size requirement GHA failures are due to infrastructure issue. Thanks for the comments and approval. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25741#issuecomment-3208903073 From amitkumar at openjdk.org Thu Aug 21 03:56:01 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 21 Aug 2025 03:56:01 GMT Subject: Integrated: 8358756: [s390x] Test StartupOutput.java crash due to CodeCache size In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 04:33:30 GMT, Amit Kumar wrote: > There isn't enough initial cache present which can let the interpreter mode run freely. So before even we reach to the compiler phase and try to bail out, in case there isn't enough space left for the stub compilation, JVM crashes. Idea is to increase the Initial cache size and make it enough to run interpreter mode at least. This pull request has now been integrated. Changeset: 78d50c02 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/78d50c02152d3d02953cc468d50c7c40c43c1527 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod 8358756: [s390x] Test StartupOutput.java crash due to CodeCache size Reviewed-by: lucy, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/25741 From iveresov at openjdk.org Thu Aug 21 04:21:52 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 21 Aug 2025 04:21:52 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v4] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: <2mpeNLuJywlDXNQZ13_2ffFsW_4pwOXr-vWqnwQW4jM=.0d50c86f-02dc-4431-bf4f-999a0d7da232@github.com> On Thu, 21 Aug 2025 03:18:38 GMT, David Holmes wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> More cleanup > > src/hotspot/share/compiler/compilationPolicy.cpp line 173: > >> 171: } >> 172: >> 173: void CompilationPolicy::replay_training_at_init(InstanceKlass* klass, JavaThread* THREAD) { > > Please rename `THREAD` to `current`. 
`THREAD` is still inherently part of the `TRAPS` mechanism. ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2289813352 From amitkumar at openjdk.org Thu Aug 21 05:05:31 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 21 Aug 2025 05:05:31 GMT Subject: RFR: 8361536: [s390x] Saving return_pc at wrong offset [v2] In-Reply-To: References: Message-ID: <7GOHAZFlyiKCrfQPQOp4dugrGW7dEDer4gqR8EhBEwQ=.2962ae08-bab7-4f8d-81b3-b25f2ba668ca@github.com> > Fixes the bug where the return pc was stored at a wrong offset, which causes issues with the Java ABI. > > Issue appeared in #26004, see the comment: https://github.com/openjdk/jdk/pull/26004#issuecomment-3017928879. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: re-adjust offset, 80 is free so we can start saving from there ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26209/files - new: https://git.openjdk.org/jdk/pull/26209/files/f01c0e4f..2a377c17 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26209&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26209&range=00-01 Stats: 16 lines in 1 file changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/26209.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26209/head:pull/26209 PR: https://git.openjdk.org/jdk/pull/26209 From amitkumar at openjdk.org Thu Aug 21 05:10:52 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 21 Aug 2025 05:10:52 GMT Subject: RFR: 8361536: [s390x] Saving return_pc at wrong offset [v2] In-Reply-To: References: Message-ID: <0J4lCh4caIFsyzZjfNGbTS1ZVbszQUxbz8Hgb-EsjVk=.6325b588-31b5-4f4e-86f5-299578fd5a1e@github.com> On Fri, 1 Aug 2025 21:07:29 GMT, Martin Doerr wrote: > I did not request reverting it. I only corrected the wrong description. You can use your favorite offsets :-) Hi Martin, I reverted because the release build was crashing (I added the crash log above). 
Mathematically, offset 80 should be fine to use. So I had to debug that. I found out that mistakenly `Z_F14` was saved at offset `127` instead of `128`, which corrupted the `Z_F13` register. That is fixed now. I am running the tests again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26209#issuecomment-3209017058 From chagedorn at openjdk.org Thu Aug 21 05:51:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 21 Aug 2025 05:51:55 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v12] In-Reply-To: References: Message-ID: On Tue, 19 Aug 2025 17:31:58 GMT, Manuel Hässig wrote: >> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. >> >> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. >> >> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. >> >> Testing: >> - [x] Github Actions >> - [x] tier1, tier2 on all platforms >> - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug >> - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Print with %zd Still good! 
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26023#pullrequestreview-3139108863 From chagedorn at openjdk.org Thu Aug 21 05:58:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 21 Aug 2025 05:58:52 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV In-Reply-To: <1ZPPraxTuyMlKXpkuZOrpPgFXFGLu3-C9GHL09Mc4Wg=.a2d5df13-9456-4060-ac81-3cf3fdce40d4@github.com> References: <9zCz8rLDzNQwtZhSzcirzzUwAN6sOmGrzPaMx6ZAlXc=.70335351-7665-4e52-9430-f81b7bd07255@github.com> <1ZPPraxTuyMlKXpkuZOrpPgFXFGLu3-C9GHL09Mc4Wg=.a2d5df13-9456-4060-ac81-3cf3fdce40d4@github.com> Message-ID: <8BCeQugR4I4OwcD7Yt6SwTymrigqXSfax4LyPjjeXuA=.330bcb99-6fdb-434e-8d7d-2e94dbbd3d3c@github.com> On Thu, 21 Aug 2025 02:01:55 GMT, Dingli Zhang wrote: >> test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java line 43: >> >>> 41: * @test >>> 42: * @requires vm.debug == true & vm.compiler2.enabled & vm.flagless >>> 43: * @requires (os.arch != "riscv64" | (os.arch == "riscv64" & vm.cpu.features ~= ".*rvv.*")) >> >> Generally, it would be preferable to adjust the IR rules. But I'm not sure if that is preferable here. So I think that this is the right solution. >> >> @chhagedorn This test may fail on other platforms as well that don't have all the required optimizations, such as vectors and others. Should we accept this solution? > >> @eme64 Thanks for the review! 
I have another method to change the IR matching rules for riscv64 so that other tests can be run without RVV: >> >> ```diff >> diff --git a/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java b/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java >> index ac8867f3985..2bf14bdfa5a 100644 >> --- a/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java >> +++ b/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java >> @@ -1124,9 +1124,21 @@ public void wrongCountString() {} >> >> @Test >> @FailCount(8) >> - @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}) >> - @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}) // valid >> - @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_ANY, "> 0"}) // valid >> + @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}, >> + applyIfPlatform = {"riscv64", "false"}) >> + @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}, >> + applyIfPlatform = {"riscv64", "true"}, >> + applyIfCPUFeature = {"rvv", "true"}) >> + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}, >> + applyIfPlatform = {"riscv64", "false"}) // valid >> + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}, >> + applyIfPlatform = {"riscv64", "true"}, >> + applyIfCPUFeature = {"rvv", "true"}) >> + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_ANY, "> 0"}, >> + applyIfPlatform = {"riscv64", "false"}) // valid >> + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_ANY, "> 0"}, >> + applyIfPlatform = {"riscv64", "true"}, >> + applyIfCPUFeature = {"rvv", "true"}) >> @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE + "", "> 0"}) >> @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE + "xxx", "> 0"}) >> @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE + "min()", "> 0"}) >> ``` > > Hi @eme64 What do you think? 
From the log you provided, it looks like only these two rules for `badVectorNodeSize()` @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}) @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}) result in a format violation, which was not expected on other platforms (they expect rules 1-3 to pass): - VMInfo: MaxVectorSize is not larger than zero for IR rule 1 at public int[] ir_framework.tests.BadIRAnnotationsAfterTestVM.badVectorNodeSize(). - VMInfo: MaxVectorSize is not larger than zero for IR rule 2 at public int[] ir_framework.tests.BadIRAnnotationsAfterTestVM.badVectorNodeSize(). It looks like `MaxVectorSize` is 0 when run without RVV on riscv. Can you try and just add applyIf = {"MaxVectorSize", ">0"} to both of these rules instead of the platform/CPU specific conditions? That would be less restrictive and also applies for other platforms that have `MaxVectorSize` set to 0 by default. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26855#discussion_r2289928891 From epeter at openjdk.org Thu Aug 21 06:01:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Aug 2025 06:01:55 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: <3m4I-aY-PTZsQa_SjoRayIbE2FC15xafQi3C8D9XqZs=.60c17714-5ec4-4a3e-96d6-687d81f3b275@github.com> Message-ID: On Wed, 20 Aug 2025 14:11:20 GMT, Bhavana Kilambi wrote: >> This type of pattern/code shape where one of the inputs is a constant is already being tested in https://github.com/openjdk/jdk/blob/e912977a6687917ed45520c4d8558ebe630e3f52/test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java#L335 >> >> I have created this one specifically for aarch64 to ensure both the backend mach nodes are correctly being generated. 
> > I can test this on x86 but do you think this test is required to be placed out of `aarch64` folder and make it available for all architectures when the same pattern is already being tested in the above testcase for all architectures? If it is just about checking backend mach nodes with an IR rule, then why not just add the IR rule to the existing test? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2289935826 From epeter at openjdk.org Thu Aug 21 06:05:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Aug 2025 06:05:58 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: <3m4I-aY-PTZsQa_SjoRayIbE2FC15xafQi3C8D9XqZs=.60c17714-5ec4-4a3e-96d6-687d81f3b275@github.com> Message-ID: On Thu, 21 Aug 2025 05:59:41 GMT, Emanuel Peter wrote: >> I can test this on x86 but do you think this test is required to be placed out of `aarch64` folder and make it available for all architectures when the same pattern is already being tested in the above testcase for all architectures? > > If it is just about checking backend mach nodes with an IR rule, then why not just add the IR rule to the existing test? After all it was with that test that we hit the assert, right? $ CONF=linux-aarch64-server-fastdebug make images test TEST=compiler/vectorization/TestFloat16VectorOperations.java # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:3756), pid=6237, tid=6259 # guarantee(false) failed: invalid immediate My opinion: - If it is exactly the same test -> keep it in the existing one. 
- If your tests have a different shape -> make it available to all other platforms, to check at least for correctness ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2289939268 From epeter at openjdk.org Thu Aug 21 06:06:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Aug 2025 06:06:00 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: <04_IpSYiBu9iLViEV2V5opYFqN7OzNewgUEOLSs_Cwc=.a8c693cd-900d-4602-9b88-76dd55f9a844@github.com> Message-ID: On Wed, 20 Aug 2025 14:08:06 GMT, Bhavana Kilambi wrote: >> Strange, because the IR/Test framework always triggers C2 compilation... How exactly did it fail to compile with C2? > > It fails to match the IR nodes. I think it happened when I used a smaller `Warmup`. With the `Warmup` I am using, it seems to be working fine. I will add that case as well. Ok, so then it is probably a profile issue. Thanks for adding both runs! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2289940713 From epeter at openjdk.org Thu Aug 21 06:08:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Aug 2025 06:08:05 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v18] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Wed, 20 Aug 2025 12:31:11 GMT, Emanuel Peter wrote: >> TODO work that arose during review process / recent merges with master: >> >> - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peak performance is identical but compilation time a bit higher. Investigation ongoing. >> - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE. >> >> --------------- >> >> This is a big patch, but about 3.5k lines are tests. 
And a large part of the VM changes is comments / proofs. >> >> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. >> >> -------------------------- >> >> **Where to start reviewing** >> >> - `src/hotspot/share/opto/mempointer.hpp`: >> - Read the class comment for `MemPointerRawSummand`. >> - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks. >> >> - `src/hotspot/share/opto/vectorization.cpp`: >> - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. >> >> - `src/hotspot/share/opto/vtransform.hpp`: >> - Understand the difference between weak and strong edges. >> >> If you need to see some examples, then look at the tests: >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. >> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). 
>> -------------------------- >> >> **Details** >> >> Most fundamentally: >> - I had to... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > disable flag if not possible > > I've also investigated the performance issue with the aliasing case that uses multiversioning. And so far I could not figure out the 10% performance regression, see detailed analysis attempt [#24278 (comment)](https://github.com/openjdk/jdk/pull/24278#issuecomment-3201092650) > > Is it possible it always goes into the slow path? Yes, the aliasing case would always take the slow path. But that should be as fast as the scalar performance before the patch, and the same performance as `not_profitable` where we do not vectorize. The strange thing is now that we enter the slow path, but somehow the performance is 10% lower than before. But as I showed, the scalar code is basically the same in the main loop that we execute. Something must be causing the 10% difference... ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3209120343 From dzhang at openjdk.org Thu Aug 21 06:13:52 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 21 Aug 2025 06:13:52 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV In-Reply-To: <8BCeQugR4I4OwcD7Yt6SwTymrigqXSfax4LyPjjeXuA=.330bcb99-6fdb-434e-8d7d-2e94dbbd3d3c@github.com> References: <9zCz8rLDzNQwtZhSzcirzzUwAN6sOmGrzPaMx6ZAlXc=.70335351-7665-4e52-9430-f81b7bd07255@github.com> <1ZPPraxTuyMlKXpkuZOrpPgFXFGLu3-C9GHL09Mc4Wg=.a2d5df13-9456-4060-ac81-3cf3fdce40d4@github.com> <8BCeQugR4I4OwcD7Yt6SwTymrigqXSfax4LyPjjeXuA=.330bcb99-6fdb-434e-8d7d-2e94dbbd3d3c@github.com> Message-ID: On Thu, 21 Aug 2025 05:54:59 GMT, Christian Hagedorn wrote: >>> @eme64 Thanks for the review! 
I have another method to change the IR matching rules for riscv64 so that other tests can be run without RVV: >>> >>> ```diff >>> diff --git a/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java b/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java >>> index ac8867f3985..2bf14bdfa5a 100644 >>> --- a/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java >>> +++ b/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java >>> @@ -1124,9 +1124,21 @@ public void wrongCountString() {} >>> >>> @Test >>> @FailCount(8) >>> - @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}) >>> - @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}) // valid >>> - @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_ANY, "> 0"}) // valid >>> + @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}, >>> + applyIfPlatform = {"riscv64", "false"}) >>> + @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}, >>> + applyIfPlatform = {"riscv64", "true"}, >>> + applyIfCPUFeature = {"rvv", "true"}) >>> + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}, >>> + applyIfPlatform = {"riscv64", "false"}) // valid >>> + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}, >>> + applyIfPlatform = {"riscv64", "true"}, >>> + applyIfCPUFeature = {"rvv", "true"}) >>> + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_ANY, "> 0"}, >>> + applyIfPlatform = {"riscv64", "false"}) // valid >>> + @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_ANY, "> 0"}, >>> + applyIfPlatform = {"riscv64", "true"}, >>> + applyIfCPUFeature = {"rvv", "true"}) >>> @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE + "", "> 0"}) >>> @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE + "xxx", "> 0"}) >>> @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE + "min()", "> 0"}) >>> ``` >> >> Hi @eme64 What do you think? 
> > From the log you provided, it looks like only these two rules for `badVectorNodeSize()` > > @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}) > @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}) > ``` > result in a format violation which was not unexpected on other platforms (they expect rule 1-3 to pass): > > - VMInfo: MaxVectorSize is not larger than zero for IR rule 1 at public int[] ir_framework.tests.BadIRAnnotationsAfterTestVM.badVectorNodeSize(). > - VMInfo: MaxVectorSize is not larger than zero for IR rule 2 at public int[] ir_framework.tests.BadIRAnnotationsAfterTestVM.badVectorNodeSize(). > > It looks like `MaxVectorSize` is 0 when run without RVV on riscv. Can you try and just add > > applyIf = {"MaxVectorSize", ">0"} > > to both of these rules instead of the platform/CPU specific conditions? That would be less restrictive and also applies for other platforms that have `MaxVectorSize` set to 0 by default. @chhagedorn Thanks for your review! This is indeed a better solution. I will try it and update later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26855#discussion_r2289953622 From duke at openjdk.org Thu Aug 21 06:17:58 2025 From: duke at openjdk.org (erifan) Date: Thu, 21 Aug 2025 06:17:58 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v11] In-Reply-To: References: Message-ID: On Wed, 9 Jul 2025 06:08:33 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. 
>> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 
1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request incrementally with one additional commit since the last revision: > > Update the code comment Hi, can anyone review this PR? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3209143930 From epeter at openjdk.org Thu Aug 21 06:18:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Aug 2025 06:18:56 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v4] In-Reply-To: <7QzCmv17rsIwVX0a4C_wTq4jhx6cob4juy454yuOof0=.fa045ee0-eaa4-43e1-853b-93880a0d44b3@github.com> References: <7QzCmv17rsIwVX0a4C_wTq4jhx6cob4juy454yuOof0=.fa045ee0-eaa4-43e1-853b-93880a0d44b3@github.com> Message-ID: On Wed, 20 Aug 2025 17:03:56 GMT, Manuel Hässig wrote: >> This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of the combination of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality. >> >> Testing: >> - [x] Github Actions >> - [x] tier1,tier2 plus some internal testing on Oracle supported platforms > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Make the test work Changes requested by epeter (Reviewer). 
test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 53: > 51: if (!e.getMessage().contains("The following scenarios have failed: #0, #1, #2")) { > 52: throw e; > 53: } Here we are still not addressing the "end of string" problem test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 65: > 63: } catch (TestRunException e) { > 64: if (!e.getMessage().contains("The following scenarios have failed: #0, #1, #2")|| > 65: e.getMessage().contains("The following scenarios have failed: #0, #1, #2, #3")) { Ok, but this does not really do the trick either, right? I'm not worried about `#3` particularly, but anything else after `#2`. test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 97: > 95: } > 96: } > 97: } Could we also have a case with empty strings? Maybe it is already possible. That would allow us to keep things "default". We don't always know what that is. Imagine we may want a run with the default value for `-XX:TLABRefillWasteFraction=...` but that value could be different on different platforms. Does that make sense? ------------- PR Review: https://git.openjdk.org/jdk/pull/26762#pullrequestreview-3139158225 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2289955973 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2289955505 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2289960552 From epeter at openjdk.org Thu Aug 21 06:21:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Aug 2025 06:21:55 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 22:16:36 GMT, Jasmine Karthikeyan wrote: >> src/hotspot/share/opto/superword.cpp line 2576: >> >>> 2574: >>> 2575: // Vector nodes and casts should not truncate. 
>>> 2576: if (type->isa_vect() != nullptr || type->isa_vectmask() != nullptr || in->is_Reduction() || in->is_ConstraintCast()) { >> >> Why should we not truncate a CastII? What can go wrong? > > My thinking was that since casts specifically change the type of a node, they may not be safe to truncate. In practice it might not matter because after the CastII pack is created, it's discarded because there is no backend implementation for vectorized CastII. I've opted to mark them as non-truncating to stay on the safer side. I see. Ok. Can you add a comment to the code for that? Because imagine we come along later and actually implement a backend vectorized version of CastII (no-op?). Maybe because we implement if-conversion. Then it would be nice to know if this was just a "to be on the safe side" check, or if it would run into issues when removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2289964681 From epeter at openjdk.org Thu Aug 21 06:21:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Aug 2025 06:21:55 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 06:18:12 GMT, Emanuel Peter wrote: >> My thinking was that since casts specifically change the type of a node, they may not be safe to truncate. In practice it might not matter because after the CastII pack is created, it's discarded because there is no backend implementation for vectorized CastII. I've opted to mark them as non-truncating to stay on the safer side. > > I see. Ok. Can you add a comment to the code for that? > Because imagine we come along later and actually implement a backend vectorized version of CastII (no-op?). Maybe because we implement if-conversion. Then it would be nice to know if this was just a "to be on the safe side" check, or if it would run into issues when removed. 
Because the current comment says "should not truncate". That sounds stronger than "to be on the safe side". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2289966071 From amitkumar at openjdk.org Thu Aug 21 06:28:57 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 21 Aug 2025 06:28:57 GMT Subject: RFR: 8361536: [s390x] Saving return_pc at wrong offset [v2] In-Reply-To: <0J4lCh4caIFsyzZjfNGbTS1ZVbszQUxbz8Hgb-EsjVk=.6325b588-31b5-4f4e-86f5-299578fd5a1e@github.com> References: <0J4lCh4caIFsyzZjfNGbTS1ZVbszQUxbz8Hgb-EsjVk=.6325b588-31b5-4f4e-86f5-299578fd5a1e@github.com> Message-ID: On Thu, 21 Aug 2025 05:08:23 GMT, Amit Kumar wrote: > I am running test again. 
> > ### Test (fastdebug) > - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Enable test without RVV and fix in BadIRAnnotationsAfterTestVM ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26855/files - new: https://git.openjdk.org/jdk/pull/26855/files/1b9f1698..d4468d61 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26855&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26855&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26855.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26855/head:pull/26855 PR: https://git.openjdk.org/jdk/pull/26855 From dzhang at openjdk.org Thu Aug 21 06:52:08 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 21 Aug 2025 06:52:08 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 07:56:19 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. > The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount. 
> > ### Test (fastdebug) > - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 Update: TestBadFormat.java passed on k1/k230/sg2042 with `applyIf = {"MaxVectorSize", ">0"}` ------------- PR Comment: https://git.openjdk.org/jdk/pull/26855#issuecomment-3209224547 From duke at openjdk.org Thu Aug 21 07:00:35 2025 From: duke at openjdk.org (erifan) Date: Thu, 21 Aug 2025 07:00:35 GMT Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI expand operation [v2] In-Reply-To: References: Message-ID: <_YDJIkwt0sdsOAMfNNn1fHTVwH0SHDpJv5NpQoxnfiA=.a0ddb5f3-00f1-47e2-93da-f47cb3f62288@github.com> > Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified: > 1. **Subword types** on SVE2-capable hardware. > 2. **All types** on NEON and SVE1 environments. > > As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments. > > Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example: > > To compute: dst = src.expand(mask) > Data direction: high <== low > Input: > src = p o n m l k j i h g f e d c b a > mask = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 > Expected result: > dst = 0 0 h g 0 0 f e 0 0 d c 0 0 b a > > Step 1: calculate the index input of the TBL instruction. > > // Set tmp1 as all 0 vector. > tmp1 = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > // Move the mask bits from the predicate register to a vector register. > // **1-bit** mask lane of P register to **8-bit** mask lane of V register. 
> tmp2 = mask = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 > > // Shift the entire register. Prefix sum algorithm. > dst = tmp2 << 8 = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 > tmp2 += dst = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1 > > dst = tmp2 << 16 = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0 > tmp2 += dst = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 > > dst = tmp2 << 32 = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0 > tmp2 += dst = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1 > > dst = tmp2 << 64 = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0 > tmp2 += dst = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1 > > // Clear inactive elements. > dst = sel(mask, tmp2, tmp1) = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1 > > // Set the inactive lane value to -1 and set the active lane to the target index. > dst -= 1 = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0 > > Step 2: shuffle the source vector elements to the target vector > > tbl(dst, src, dst) = 0 0 h g 0 0 f e 0 0 d c 0 0 b a > > > The same algorithm is used for NEON and SVE1, but with different instructions where appropriate. > > The following benchmarks are from panama-... erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Improve the comment of the vector expand implementation - Merge branch 'master' into JDK-8363989 - 8363989: AArch64: Add missing backend support of VectorAPI expand operation Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified: 1. **Subword types** on SVE2-capable hardware. 2. **All types** on NEON and SVE1 environments. As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments. 
Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example:

```
To compute: dst = src.expand(mask)
Data direction: high <== low
Input:
  src  = p o n m l k j i h g f e d c b a
  mask = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
Expected result:
  dst  = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
```

Step 1: calculate the index input of the TBL instruction.

```
// Set tmp1 as all 0 vector.
tmp1 = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

// Move the mask bits from the predicate register to a vector register.
// **1-bit** mask lane of P register to **8-bit** mask lane of V register.
tmp2 = mask = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1

// Shift the entire register. Prefix sum algorithm.
dst  = tmp2 << 8  = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
tmp2 += dst       = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1

dst  = tmp2 << 16 = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
tmp2 += dst       = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1

dst  = tmp2 << 32 = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
tmp2 += dst       = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1

dst  = tmp2 << 64 = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
tmp2 += dst       = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1

// Clear inactive elements.
dst = sel(mask, tmp2, tmp1) = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1

// Set the inactive lane value to -1 and set the active lane to the target index.
dst -= 1 = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
```

Step 2: shuffle the source vector elements to the target vector

```
tbl(dst, src, dst) = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
```

The same algorithm is used for NEON and SVE1, but with different instructions where appropriate. The following benchmarks are from panama-vector/vectorIntrinsics.
On Nvidia Grace machine with option `-XX:UseSVE=2`:

```
Benchmark               Unit    Before Score  Error      After Score  Error      Uplift
Byte128Vector.expand    ops/ms  1791.022366   5.619883   9633.388683  1.968788   5.37
Double128Vector.expand  ops/ms  4489.255846   0.48485    4488.772949  0.491596   0.99
Float128Vector.expand   ops/ms  8863.02424    6.888087   8908.352235  51.487453  1
Int128Vector.expand     ops/ms  8873.485683   3.275682   8879.635643  1.243863   1
Long128Vector.expand    ops/ms  4485.1149     4.458073   4489.365269  0.851093   1
Short128Vector.expand   ops/ms  792.068834    2.640398   5880.811288  6.40683    7.42
Byte64Vector.expand     ops/ms  854.455002    8.548982   5999.046295  37.209987  7.02
Double64Vector.expand   ops/ms  46.49763      0.104773   46.526043    0.102451   1
Float64Vector.expand    ops/ms  4510.596811   0.504477   4509.984244  1.519178   0.99
Int64Vector.expand      ops/ms  4508.778322   1.664461   4535.216611  26.742484  1
Long64Vector.expand     ops/ms  45.665462     0.705485   46.496232    0.075648   1.01
Short64Vector.expand    ops/ms  394.527324    1.284691   3860.199621  0.720015   9.78
```

On Nvidia Grace machine with option `-XX:UseSVE=1`:

```
Benchmark               Unit    Before Score  Error      After Score  Error     Uplift
Byte128Vector.expand    ops/ms  1767.314171   12.431526  9630.892248  1.478813  5.44
Double128Vector.expand  ops/ms  197.614381    0.945541   2416.075281  2.664325  12.22
Float128Vector.expand   ops/ms  390.878183    2.089234   3844.011978  3.792751  9.83
Int128Vector.expand     ops/ms  394.550044    2.025371   3843.280133  3.528017  9.74
Long128Vector.expand    ops/ms  198.366863    0.651726   2423.234639  4.911434  12.21
Short128Vector.expand   ops/ms  790.044704    3.339363   5885.595035  1.440598  7.44
Byte64Vector.expand     ops/ms  853.479119    7.158898   5942.750116  1.054905  6.96
Double64Vector.expand   ops/ms  46.550458     0.079191   46.423053    0.057554  0.99
Float64Vector.expand    ops/ms  197.977215    1.156535   2445.010767  1.992358  12.34
Int64Vector.expand      ops/ms  198.326857    1.02785    2444.211583  2.5432    12.32
Long64Vector.expand     ops/ms  46.526513     0.25779    45.984253    0.566691  0.98
Short64Vector.expand    ops/ms  398.649412    1.87764    3837.495773  3.528926  9.62
```

On Nvidia Grace machine with option `-XX:UseSVE=0`:

```
Benchmark               Unit    Before Score  Error     After Score  Error      Uplift
Byte128Vector.expand    ops/ms  1802.98702    6.906394  9427.491602  2.067934   5.22
Double128Vector.expand  ops/ms  198.498191    0.429071  1190.476326  0.247358   5.99
Float128Vector.expand   ops/ms  392.849005    2.034676  2373.195574  2.006566   6.04
Int128Vector.expand     ops/ms  395.69179     2.194773  2372.084745  2.058303   5.99
Long128Vector.expand    ops/ms  198.191673    1.476362  1189.712301  1.006821   6
Short128Vector.expand   ops/ms  795.785831    5.62611   4731.514053  2.365213   5.94
Byte64Vector.expand     ops/ms  843.549268    7.174254  5865.556155  37.639415  6.95
Double64Vector.expand   ops/ms  45.943599     0.484743  46.529755    0.111551   1.01
Float64Vector.expand    ops/ms  193.945993    0.943338  1463.836772  0.618393   7.54
Int64Vector.expand      ops/ms  194.168021    0.492286  1473.004575  8.802656   7.58
Long64Vector.expand     ops/ms  46.570488     0.076372  46.696353    0.078649   1
Short64Vector.expand    ops/ms  387.973334    2.367312  2920.428114  0.863635   7.52
```

Some JTReg test cases are added for the above changes. And the patch was tested on both aarch64 and x64, all of tier1 tier2 and tier3 tests passed.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/26740/files - new: https://git.openjdk.org/jdk/pull/26740/files/86d011ac..a1777974 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26740&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26740&range=00-01 Stats: 30300 lines in 941 files changed: 17180 ins; 9555 del; 3565 mod Patch: https://git.openjdk.org/jdk/pull/26740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26740/head:pull/26740 PR: https://git.openjdk.org/jdk/pull/26740 From duke at openjdk.org Thu Aug 21 07:00:35 2025 From: duke at openjdk.org (erifan) Date: Thu, 21 Aug 2025 07:00:35 GMT Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI expand operation In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 11:27:59 GMT, Andrew Haley wrote: > The algorithm description here is great. Please paste all of it from "Since there are" to "but with different instructions where appropriate." into this PR, before the vector expand implementation. @theRealAph I have basically copied the algorithm description to the vector expand implementation. Since there are already two examples in the implementation functions, I didn't copy this example over. Could you review it again? Thank you.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26740#issuecomment-3209252469 From mhaessig at openjdk.org Thu Aug 21 07:12:04 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 21 Aug 2025 07:12:04 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v4] In-Reply-To: References: <7SKfaULZBs_ccRipoMMWXKUAASHIhq9um43xaxToBKE=.83db680e-fc44-4be9-8f15-0030e764b4f8@github.com> Message-ID: On Thu, 7 Aug 2025 18:57:14 GMT, Dean Long wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> ASSERT > > Thinking about _timeout_armed a little more, the fact that the signal handler received TIMEOUT_SIGNAL should be enough. The value of _timeout_armed should be redundant, and your assert could be changed to: > > assert(false, "compile task timed out"); > > and _timeout_armed could be removed. It's just an inexact mirror of the timer state. Thank you for your reviews, @dean-long and @chhagedorn! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26023#issuecomment-3209292036 From mhaessig at openjdk.org Thu Aug 21 07:12:07 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 21 Aug 2025 07:12:07 GMT Subject: Integrated: 8308094: Add a compilation timeout flag to catch long running compilations In-Reply-To: References: Message-ID: On Fri, 27 Jun 2025 17:36:54 GMT, Manuel Hässig wrote: > This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. > > The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`.
This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. > Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. > > Testing: > - [x] Github Actions > - [x] tier1, tier2 on all platforms > - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug > - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) This pull request has now been integrated. Changeset: c74c60fb Author: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/c74c60fb8b8aa5c917fc4e1c157cc8083f5797a0 Stats: 282 lines in 8 files changed: 279 ins; 0 del; 3 mod 8308094: Add a compilation timeout flag to catch long running compilations Co-authored-by: Dean Long Reviewed-by: dlong, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/26023 From dskantz at openjdk.org Thu Aug 21 07:41:32 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Thu, 21 Aug 2025 07:41:32 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors [v4] In-Reply-To: References: Message-ID: > This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments().
> > In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there's 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary amount of back-to-back stores in the generated code depending on the number of stacked concatenations. > > The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2]. > > [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303 > > [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806 > > Testing: T1-4. > > Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 arguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow pass.
Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: compare order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26685/files - new: https://git.openjdk.org/jdk/pull/26685/files/0535d1f0..a3e4b06a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26685&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26685&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26685.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26685/head:pull/26685 PR: https://git.openjdk.org/jdk/pull/26685 From rcastanedalo at openjdk.org Thu Aug 21 07:47:01 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 21 Aug 2025 07:47:01 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors [v4] In-Reply-To: References: Message-ID: <0ScXc2IoXqF_gsmdI-fV9xULYzljho6fRrGQoJ8w7Xg=.c67372b9-ca12-4ee9-8caa-6a7543500c79@github.com> On Thu, 21 Aug 2025 07:41:32 GMT, Daniel Skantz wrote: >> This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments(). >> >> In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there's 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary amount of back-to-back stores in the generated code depending on the number of stacked concatenations. 
>> >> The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2]. >> >> [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303 >> >> [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806 >> >> Testing: T1-4. >> >> Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 arguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow pass. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > compare order Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26685#pullrequestreview-3139451097 From dskantz at openjdk.org Thu Aug 21 07:51:56 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Thu, 21 Aug 2025 07:51:56 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors [v4] In-Reply-To: References: Message-ID: On Mon, 11 Aug 2025 10:30:42 GMT, Galder Zamarreño wrote: >> Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: >> >> compare order > > test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsMany.java line 28: > >> 26: * @bug 8357105 >> 27: * @summary Test that repeated stacked string concatenations do not >> 28: * consume too many compilation resources. > > Is there a reasonable way to enhance the test to validate excessive resources?
I'm not sure if the following example would work, but I'm wondering if there is something that can be measured deterministically. E.g. before with the given test there would be ~N IR nodes produced but now it would be a max of ~M, assuming that M is deterministically smaller than N. What do you think, @galderz ? Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2290189725 From jbhateja at openjdk.org Thu Aug 21 07:54:56 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 21 Aug 2025 07:54:56 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: On Fri, 15 Aug 2025 11:54:59 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. >> >> This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). 
>> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 66: > 64: input[i] = (short) i; > 65: } > 66: } How about using Generators for initialization? test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 91: > 89: if (expected != output[i]) { > 90: throw new AssertionError("Result Mismatch!, input = " + input[i] + " constant = " + FP16_IN_RANGE + " actual = " + output[i] + " expected = " + expected); > 91: } Prefer using Verify.check* https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/verify/Verify.java test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 121: > 119: if (expected != output[i]) { > 120: throw new AssertionError("Result Mismatch!, input = " + input[i] + " constant = " + FP16_OUT_OF_RANGE + " actual = " + output[i] + " expected = " + expected); > 121: } As above, please use Verify.check* API. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2290195499 PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2290190431 PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2290192173 From mdoerr at openjdk.org Thu Aug 21 08:00:54 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 21 Aug 2025 08:00:54 GMT Subject: RFR: 8361536: [s390x] Saving return_pc at wrong offset [v2] In-Reply-To: <7GOHAZFlyiKCrfQPQOp4dugrGW7dEDer4gqR8EhBEwQ=.2962ae08-bab7-4f8d-81b3-b25f2ba668ca@github.com> References: <7GOHAZFlyiKCrfQPQOp4dugrGW7dEDer4gqR8EhBEwQ=.2962ae08-bab7-4f8d-81b3-b25f2ba668ca@github.com> Message-ID: On Thu, 21 Aug 2025 05:05:31 GMT, Amit Kumar wrote: >> Fixes the bug where return pc was stored at a wrong offset, which causes issue with java abi. >> >> Issue appeared in #26004, see the comment: https://github.com/openjdk/jdk/pull/26004#issuecomment-3017928879. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > re-adjust offset, 80 is free so we can start saving from there Marked as reviewed by mdoerr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26209#pullrequestreview-3139493727 From bkilambi at openjdk.org Thu Aug 21 08:14:57 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 21 Aug 2025 08:14:57 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 07:52:03 GMT, Jatin Bhateja wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments > > test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 66: > >> 64: input[i] = (short) i; >> 65: } >> 66: } > > How about using Generators for initialization? 
That's what I had in my first patch but changed it to a `for` loop after this comment - https://github.com/openjdk/jdk/pull/26589#discussion_r2260403263. I think it doesn't matter in this case. What's more important is the constant value being passed. Otherwise, the `TestFloat16VectorOperations.java` JTREG test does use generators and test this shape for all values of Float16. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2290263805 From duke at openjdk.org Thu Aug 21 08:16:30 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Thu, 21 Aug 2025 08:16:30 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v5] In-Reply-To: References: Message-ID: <9zpIi45t8VRvl4UQS5ht9xleBMDhqTE6dH6h0dlMfrw=.57c43541-47d5-4486-81a5-c4c59a94a51e@github.com> > - `opto/phasetype.hpp` defines `static const char* phase_names[]` > - `compiler/compilerEvent.cpp` defines `static GrowableArray* phase_names` > > This is not a problem when the two files are compiled as different translation units, but it causes a build failure if any of them is pulled in by a precompiled header: > > > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:59:36: error: redefinition of 'phase_names' with a different type: 'GrowableArray *' vs 'const char *[100]' > 59 | static GrowableArray* phase_names = nullptr; > | ^ > /jdk/src/hotspot/share/opto/phasetype.hpp:147:20: note: previous definition is here > 147 | static const char* phase_names[] = { > | ^ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:67:39: error: member reference base type 'const char *' is not a structure or union > 67 | const u4 nof_entries = phase_names->length(); > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:71:31: error: member reference base type 'const char *' is not a structure or union > 71 | writer.write(phase_names->at(i)); > | ~~~~~~~~~~~^ ~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:77:34: error: member reference base type 
'const char *' is not a structure or union > 77 | for (int i = 0; i < phase_names->length(); i++) { > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:78:35: error: member reference base type 'const char *' is not a structure or union > 78 | const char* name = phase_names->at(i); > | ~~~~~~~~~~~^ ~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:91:9: error: comparison of array 'phase_names' equal to a null pointer is always false [-Werror,-Wtautological-pointer-compare] > 91 | if (phase_names == nullptr) { > | ^~~~~~~~~~~ ~~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:92:19: error: array type 'const char *[100]' is not assignable > 92 | phase_names = new (mtInternal) GrowableArray(100, mtCompiler); > | ~~~~~~~~~~~ ^ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:103:24: error: member reference base type 'const char *' is not a structure or union > 103 | index = phase_names->length(); > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:104:16: error: member reference base type 'const char *' is not a structure or union > 104 | phase_names->append(use_strdup ? os::strdup(phase_name) : phase_name); > | ~~~~~~~~~~~^ ~~~~~~ > 9 errors generated. > > > Passes `tier1`. 
Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: make find_phase a member ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26851/files - new: https://git.openjdk.org/jdk/pull/26851/files/90e9c537..cc7da3ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26851&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26851&range=03-04 Stats: 21 lines in 2 files changed: 10 ins; 7 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26851/head:pull/26851 PR: https://git.openjdk.org/jdk/pull/26851 From duke at openjdk.org Thu Aug 21 08:16:30 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Thu, 21 Aug 2025 08:16:30 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v2] In-Reply-To: References: Message-ID: <8iMwPrN9_jR0gQZmsOe1H_N2GcDRwok0fs3NiS7g67I=.45497904-62ad-496f-8373-8eed1c0d41bc@github.com> On Wed, 20 Aug 2025 19:35:42 GMT, Kim Barrett wrote: >> Underscore: 90e9c537709ad4c384f7efd2ed18c63a4c21b51b > > Don't make it a friend, make it a static member function, and fix the one > caller (later in this file). (The definition of find_phase could be moved to > the new .cpp file.) I think the caller also has an ODR problem, with calls > from different including TUs getting a different file-scoped find_phase. > > It looks like there might be a lot of file-scoped static declarations from > header files in our code. I've made a note to look into this. 
Sure: cc7da3adeea447ee8c108f0179943de785a6e239 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26851#discussion_r2290268909 From bkilambi at openjdk.org Thu Aug 21 08:19:53 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 21 Aug 2025 08:19:53 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: <3m4I-aY-PTZsQa_SjoRayIbE2FC15xafQi3C8D9XqZs=.60c17714-5ec4-4a3e-96d6-687d81f3b275@github.com> Message-ID: On Thu, 21 Aug 2025 06:01:54 GMT, Emanuel Peter wrote: >> If it is just about checking backend mach nodes with an IR rule, then why not just add the IR rule to the existing test? > > After all it was with that test that we hit the assert, right? > > $ CONF=linux-aarch64-server-fastdebug make images test TEST=compiler/vectorization/TestFloat16VectorOperations.java > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:3756), pid=6237, tid=6259 > # guarantee(false) failed: invalid immediate > > > My opinion: > - If it is exactly the same test -> keep it in the existing one. > - If your tests have a different shape -> make it available to all other platforms, to check at least for correctness > If it is just about checking backend mach nodes with an IR rule, then why not just add the IR rule to the existing test? Well, because I had to do it twice, one for testing a constant value in range and another to test constant value out of range. I felt doing that and adding IR rules for aarch64 specific mach nodes in `compiler/vectorization/TestFloat16VectorOperations.java` might not be ideal. What do you think? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2290276718 From bkilambi at openjdk.org Thu Aug 21 08:26:57 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 21 Aug 2025 08:26:57 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: <3m4I-aY-PTZsQa_SjoRayIbE2FC15xafQi3C8D9XqZs=.60c17714-5ec4-4a3e-96d6-687d81f3b275@github.com> Message-ID: On Thu, 21 Aug 2025 08:16:46 GMT, Bhavana Kilambi wrote: >> After all it was with that test that we hit the assert, right? >> >> $ CONF=linux-aarch64-server-fastdebug make images test TEST=compiler/vectorization/TestFloat16VectorOperations.java >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:3756), pid=6237, tid=6259 >> # guarantee(false) failed: invalid immediate >> >> >> My opinion: >> - If it is exactly the same test -> keep it in the existing one. >> - If your tests have a different shape -> make it available to all other platforms, to check at least for correctness > >> If it is just about checking backend mach nodes with an IR rule, then why not just add the IR rule to the existing test? > > Well, because I had to do it twice, one for testing a constant value in range and another to test constant value out of range. I felt doing that and adding IR rules for aarch64 specific mach nodes in `compiler/vectorization/TestFloat16VectorOperations.java` might not be ideal. What do you think? `If it is exactly the same test -> keep it in the existing one.` It's exactly the same. 
I only separated the testcase as an aarch64 specific one as I wanted to have two testcases to test the correct generation of replicate nodes on SVE (one with a valid FP16 constant and another with an invalid one) and I felt having a separate test and not polluting the arch independent `TestFloat16VectorOperations.java` test would be better. Please let me know what you think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2290297641 From jbhateja at openjdk.org Thu Aug 21 08:57:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 21 Aug 2025 08:57:54 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: <0q89hNzCz5i1sosjpLNbxNZE5uyBYGmRZQ5c2c78bl0=.8895c287-d548-4d83-b414-5459bd4826f9@github.com> On Thu, 21 Aug 2025 08:11:59 GMT, Bhavana Kilambi wrote: >> test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 66: >> >>> 64: input[i] = (short) i; >>> 65: } >>> 66: } >> >> How about using Generators for initialization? > > That's what I had in my first patch but changed it to a `for` loop after this comment - https://github.com/openjdk/jdk/pull/26589#discussion_r2260403263. > I think it doesn't matter in this case. What's more important is the constant value being passed. Otherwise, the `TestFloat16VectorOperations.java` JTREG test does use generators and test this shape for all values of Float16. 
Ok, in general it's advisable to use Generators for any initialization. Another suggestion: you can also generate constants dynamically through @Stable arrays, here is an example:

import jdk.internal.vm.annotation.Stable;
import java.util.concurrent.ThreadLocalRandom;

public class random_constants {
    public static final int idx = ThreadLocalRandom.current().nextInt(1024);

    @Stable
    public static int[] arr;

    public static void init() {
        arr = new int[1024];
        for (int i = 0; i < 1024; i++) {
            arr[i] = ThreadLocalRandom.current().nextInt();
        }
    }

    public static int yeild_number() {
        return arr[idx] + 10;
    }

    public static void main(String[] args) {
        int res = 0;
        init();
        for (int i = 0; i < 100000; i++) {
            res += yeild_number();
        }
        System.out.println("[res] " + res);
    }
}

PROMPT>java --add-exports=java.base/jdk.internal.vm.annotation=ALL-UNNAMED -Xbatch -XX:-TieredCompilation -Xbootclasspath/a:. -XX:CompileCommand=PrintIdealPhase,random_constants::yeild_number,BEFORE_MATCHING -cp . random_constants
CompileCommand: PrintIdealPhase random_constants.yeild_number const char* PrintIdealPhase = 'BEFORE_MATCHING'
AFTER: BEFORE_MATCHING
 0  Root  === 0 32  [[ 0 1 3 31 ]] inner
 3  Start  === 3 0  [[ 3 5 6 7 8 9 ]]  #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address}
 5  Parm  === 3  [[ 32 ]] Control !jvms: random_constants::yeild_number @ bci:-1 (line 19)
 6  Parm  === 3  [[ 32 ]] I_O !jvms: random_constants::yeild_number @ bci:-1 (line 19)
 7  Parm  === 3  [[ 32 ]] Memory  Memory: @BotPTR *+bot, idx=Bot; !jvms: random_constants::yeild_number @ bci:-1 (line 19)
 8  Parm  === 3  [[ 32 ]] FramePtr !jvms: random_constants::yeild_number @ bci:-1 (line 19)
 9  Parm  === 3  [[ 32 ]] ReturnAdr !jvms: random_constants::yeild_number @ bci:-1 (line 19)
31  ConI  === 0  [[ 32 ]]  #int:-753356878
32  Return  === 5 6 7 8 9 returns 31  [[ 0 ]]
[res] -1961428160

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2290383925 From bkilambi at
openjdk.org (Bhavana Kilambi) Date: Thu, 21 Aug 2025 09:00:56 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: <0q89hNzCz5i1sosjpLNbxNZE5uyBYGmRZQ5c2c78bl0=.8895c287-d548-4d83-b414-5459bd4826f9@github.com> References: <0q89hNzCz5i1sosjpLNbxNZE5uyBYGmRZQ5c2c78bl0=.8895c287-d548-4d83-b414-5459bd4826f9@github.com> Message-ID: On Thu, 21 Aug 2025 08:55:08 GMT, Jatin Bhateja wrote: >> That's what I had in my first patch but changed it to a `for` loop after this comment - https://github.com/openjdk/jdk/pull/26589#discussion_r2260403263. >> I think it doesn't matter in this case. What's more important is the constant value being passed. Otherwise, the `TestFloat16VectorOperations.java` JTREG test does use generators and test this shape for all values of Float16. > > Ok, in general its advisable to use Generators for any initialization, another suggestion, you can also generate constant dynamically through @Stable arrays, here is an example > > > > import jdk.internal.vm.annotation.Stable; > import java.util.concurrent.ThreadLocalRandom; > > public class random_constants { > public static final int idx = ThreadLocalRandom.current().nextInt(1034); > > @Stable > public static int [] arr; > > public static void init() { > arr = new int[1024]; > for (int i = 0; i < 1024; i++) { > arr[i] = ThreadLocalRandom.current().nextInt(); > } > } > > public static int yeild_number() { > return arr[idx] + 10; > } > > public static void main(String [] args) { > int res = 0; > init(); > for (int i = 0; i < 100000; i++) { > res += yeild_number(); > } > System.out.println("[res] " + res); > } > } > > PROMPT>java --add-exports=java.base/jdk.internal.vm.annotation=ALL-UNNAMED -Xbatch -XX:-TieredCompilation -Xbootclasspath/a:. -XX:CompileCommand=PrintIdealPhase,random_constants::yeild_number,BEFORE_MATCHING -cp . 
random_constants > CompileCommand: PrintIdealPhase random_constants.yeild_number const char* PrintIdealPhase = 'BEFORE_MATCHING' > AFTER: BEFORE_MATCHING > 0 Root === 0 32 [[ 0 1 3 31 ]] inner > 3 Start === 3 0 [[ 3 5 6 7 8 9 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address} > 5 Parm === 3 [[ 32 ]] Control !jvms: random_constants::yeild_number @ bci:-1 (line 19) > 6 Parm === 3 [[ 32 ]] I_O !jvms: random_constants::yeild_number @ bci:-1 (line 19) > 7 Parm === 3 [[ 32 ]] Memory Memory: @BotPTR *+bot, idx=Bot; !jvms: random_constants::yeild_number @ bci:-1 (line 19) > 8 Parm === 3 [[ 32 ]] FramePtr !jvms: random_constants::yeild_number @ bci:-1 (line 19) > 9 Parm === 3 [[ 32 ]] ReturnAdr !jvms: random_constants::yeild_number @ bci:-1 (line 19) > 31 ConI === 0 [[ 32 ]] #int:-753356878 > 32 Return === 5 6 7 8 9 returns 31 [[ 0 ]] > [res] -1961428160 Thanks for sharing. This looks interesting. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2290393597 From asemenov at openjdk.org Thu Aug 21 09:01:12 2025 From: asemenov at openjdk.org (Artem Semenov) Date: Thu, 21 Aug 2025 09:01:12 GMT Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() [v2] In-Reply-To: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> Message-ID: > The defect has been detected and confirmed in the function ArchDesc::declareClasses() located in the file src/hotspot/share/adlc/output_h.cpp with static code analysis. This defect can potentially lead to a null pointer dereference. > > The pointer instr->_matrule is dereferenced in line 1952 without checking for nullptr, although earlier in line 1858 the same pointer is checked for nullptr, which indicates that it can be null. 
> > According to [this](https://github.com/openjdk/jdk/pull/26002#issuecomment-3023050372) comment, this PR contains fixes for similar cases in other places. Artem Semenov has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/c1/c1_LinearScan.cpp Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> - Update src/hotspot/share/adlc/output_h.cpp Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26798/files - new: https://git.openjdk.org/jdk/pull/26798/files/80777ced..dd21148b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26798&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26798&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26798.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26798/head:pull/26798 PR: https://git.openjdk.org/jdk/pull/26798 From asemenov at openjdk.org Thu Aug 21 09:11:53 2025 From: asemenov at openjdk.org (Artem Semenov) Date: Thu, 21 Aug 2025 09:11:53 GMT Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() [v2] In-Reply-To: References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> Message-ID: On Wed, 20 Aug 2025 12:20:51 GMT, David Holmes wrote: >> Artem Semenov has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/c1/c1_LinearScan.cpp >> >> Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> >> - Update src/hotspot/share/adlc/output_h.cpp >> >> Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > > src/hotspot/share/nmt/mallocSiteTable.cpp line 172: > >> 170: index < pos_idx && head != nullptr; >> 171: index++, head = 
((MallocSiteHashtableEntry*)head->next() == nullptr) ? head : >> 172: (MallocSiteHashtableEntry*)head->next()) {} > > This doesn't look right to me. We check `head != nullptr` in the loop condition so we cannot reach the assignment if it is null. A situation is possible where head becomes nullptr when head->next() returns nullptr on the last iteration. Then, after the loop finishes, assert(head != nullptr) will trigger (only in debug mode), and return head->data() will cause a program error ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26798#discussion_r2290418847 From adinn at openjdk.org Thu Aug 21 10:01:59 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 21 Aug 2025 10:01:59 GMT Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() [v2] In-Reply-To: References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> Message-ID: On Thu, 21 Aug 2025 09:08:58 GMT, Artem Semenov wrote: >> src/hotspot/share/nmt/mallocSiteTable.cpp line 172: >> >>> 170: index < pos_idx && head != nullptr; >>> 171: index++, head = ((MallocSiteHashtableEntry*)head->next() == nullptr) ? head : >>> 172: (MallocSiteHashtableEntry*)head->next()) {} >> >> This doesn't look right to me. We check `head != nullptr` in the loop condition so we cannot reach the assignment if it is null. > > A situation is possible where head becomes nullptr when head->next() returns nullptr on the last iteration. Then, after the loop finishes, assert(head != nullptr) will trigger (only in debug mode), and return head->data() will cause a program error Hmm, is it possible? Perhaps you could explain how pos_idx is being used in this loop to guard against that happening and why that does not make this safe? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26798#discussion_r2290543955 From adinn at openjdk.org Thu Aug 21 10:17:56 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 21 Aug 2025 10:17:56 GMT Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() [v2] In-Reply-To: References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> Message-ID: On Thu, 21 Aug 2025 09:01:12 GMT, Artem Semenov wrote: >> The defect has been detected and confirmed in the function ArchDesc::declareClasses() located in the file src/hotspot/share/adlc/output_h.cpp with static code analysis. This defect can potentially lead to a null pointer dereference. >> >> The pointer instr->_matrule is dereferenced in line 1952 without checking for nullptr, although earlier in line 1858 the same pointer is checked for nullptr, which indicates that it can be null. >> >> According to [this](https://github.com/openjdk/jdk/pull/26002#issuecomment-3023050372) comment, this PR contains fixes for similar cases in other places. > > Artem Semenov has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/c1/c1_LinearScan.cpp > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Update src/hotspot/share/adlc/output_h.cpp > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> n.b. Before accepting any of the changes in this PR I'd really like to know whether they have arisen from reports of an actual null pointer dereference or they are simply derived from some theoretical analysis. In the latter case then I think we would need a better explanation of why an error can happen than we have seen so far. Given that requirement I also think each of the changes should be submitted in its own PR with its own justification. 
We should not modify control flow logic on the nod. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26798#issuecomment-3209906082 From fgao at openjdk.org Thu Aug 21 10:32:57 2025 From: fgao at openjdk.org (Fei Gao) Date: Thu, 21 Aug 2025 10:32:57 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Mon, 13 Jan 2025 08:06:17 GMT, Emanuel Peter wrote: >> In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the >> `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has less than `8` iterations and then decide which way to go. >> >> Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is going to be further multiplied. After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector can operate `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if remaining trip is less than `8 * 4 = 32`. >> >> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And, both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop. 
>> >> The problem here is, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of the main loop fails, the execution will jump over both the main loop and the vectorized drain loop. For example, in the above case, if a loop still has `25` iterations after the pre-loop, we could run `3` rounds of the vectorized drain loop, but that is impossible. It would be better if the minimum trip guard test of the main loop does not jump over the vectorized drain loop. >> >> This patch is to improve it by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow. >> >> The whole process is done by the function `insert_post_loop()`. >> >> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to vectorized drain loop with mode `InsertVectorizedDrain`: >> >> 1. The fall-in control flow to the vectorized drain loop comes fr... > > Thanks for the updates. I gave it a quick scan and proposed some changes. I can look at it again once you respond to these :) > (we currently have lots of reviews, so I need to do a little round-robin here) Hi @eme64, I've addressed some corner case failures and refined parts of the code in the new commit. Would you like to review it? Thanks!
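As an aside, the trip-guard arithmetic described in the PR summary above can be modelled in plain Java. This is a hand-written sketch for illustration only (the class name, the `distribute` method, and the strides 32 and 8 are invented here to match the example in the description; this is not C2 code). It shows how, once the failing main-loop guard falls through into the drain-loop guard instead of skipping it, 25 remaining trips yield 0 main-loop rounds, 3 drain-loop rounds, and 1 scalar post-loop iteration:

```java
public class DrainLoopModel {
    // Model of how remaining trips are distributed once the main-loop
    // guard falls into the vectorized drain loop instead of jumping
    // straight to the scalar post loop. Strides are illustrative:
    // main = 8 lanes * 4x unroll = 32, drain = 8, post = 1.
    static int[] distribute(int trips, int mainStride, int drainStride) {
        int main = 0, drain = 0;
        if (trips >= mainStride) {            // minimum trip guard of the main loop
            main = trips / mainStride;
            trips -= main * mainStride;
        }
        if (trips >= drainStride) {           // minimum trip guard of the drain loop
            drain = trips / drainStride;
            trips -= drain * drainStride;
        }
        return new int[] { main, drain, trips }; // trips left for the scalar post loop
    }

    public static void main(String[] args) {
        // 25 remaining trips: the main-loop guard fails, but the drain
        // loop can still run 3 rounds, leaving 1 scalar iteration.
        int[] r = distribute(25, 32, 8);
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "0 3 1"
    }
}
```

With the old control flow, a failing main-loop guard would bypass the drain loop entirely, so all 25 iterations would run in the scalar post loop.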
------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3209972783 From asemenov at openjdk.org Thu Aug 21 10:34:03 2025 From: asemenov at openjdk.org (Artem Semenov) Date: Thu, 21 Aug 2025 10:34:03 GMT Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() In-Reply-To: References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> Message-ID: On Wed, 20 Aug 2025 12:29:34 GMT, David Holmes wrote: > I've added some additional mailing lists to ensure better coverage here. > > Also I think you need to update the JBS (and PR) title to reflect the broader scope of the changes. Please provide an example of an updated title? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26798#issuecomment-3209975632 From asemenov at openjdk.org Thu Aug 21 10:34:04 2025 From: asemenov at openjdk.org (Artem Semenov) Date: Thu, 21 Aug 2025 10:34:04 GMT Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() [v2] In-Reply-To: References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> Message-ID: On Wed, 20 Aug 2025 12:22:18 GMT, David Holmes wrote: >> Artem Semenov has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/c1/c1_LinearScan.cpp >> >> Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> >> - Update src/hotspot/share/adlc/output_h.cpp >> >> Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > > src/hotspot/share/opto/vectorIntrinsics.cpp line 1319: > >> 1317: log_if_needed(" ** not supported: arity=%d op=%s vlen=%d etype=%s atype=%s ismask=no", >> 1318: is_scatter, is_scatter ? 
"scatter" : "gather", >> 1319: num_elem, type2name(elem_bt), type2name(arr_type->elem()->array_element_basic_type())); > > There is a bug here but I'm not sure it is what you think it is. ```addr_type->isa_aryptr();``` might return nullptr, while in ```elem_consistent_with_arr(elem_bt, arr_type, false)```, arr_type is only checked with an assert. Moreover, the presence of a check in the original version indicates that arr_type can be null, and there is no protection against this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26798#discussion_r2290615638 From chagedorn at openjdk.org Thu Aug 21 10:36:51 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 21 Aug 2025 10:36:51 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV [v2] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 06:52:07 GMT, Dingli Zhang wrote: >> Hi, >> Can you help to review this patch? Thanks! >> >> We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. >> The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount. >> >> ### Test (fastdebug) >> - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Enable test without RVV and fix in BadIRAnnotationsAfterTestVM Thanks for the update and trying it out! That looks cleaner now. A small improvement suggestion but otherwise, it looks good to me. 
test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java line 1128:

> 1126: @FailCount(8)
> 1127: @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}, applyIf = {"MaxVectorSize", ">0"})
> 1128: @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}, applyIf = {"MaxVectorSize", ">0"}) // valid

Maybe we can add an additional comment:
Suggestion:

@IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}, applyIf = {"MaxVectorSize", "> 0"}) // valid, but only if MaxVectorSize > 0, otherwise, a violation is reported
@IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}, applyIf = {"MaxVectorSize", "> 0"}) // valid, but only if MaxVectorSize > 0, otherwise, a violation is reported

------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26855#pullrequestreview-3140060733 PR Review Comment: https://git.openjdk.org/jdk/pull/26855#discussion_r2290623169 From mdoerr at openjdk.org Thu Aug 21 10:47:59 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 21 Aug 2025 10:47:59 GMT Subject: RFR: 8358696: Assert with extreme values for -XX:BciProfileWidth [v11] In-Reply-To: References: <5TRVeAXUQi6quM-nDWEij_jk6M5K2Vk31RA-Yjd8F2M=.5b63da45-93c3-4251-9e2e-3c64b7953919@github.com> Message-ID: <0frgDbdeT2orwVXd_58_fZHVO7gW_x63WlMUbOhDtlQ=.89c6e5a2-bd63-4453-adc3-870856bea6c3@github.com> On Wed, 20 Aug 2025 23:39:50 GMT, Dean Long wrote: > If we wanted to decrease the code size, we could change the unrolled loop to a real loop. But I think first we should answer the question, why are we profiling "ret" instructions at all? As far as I can tell, the compilers are not using the profiling data for anything, so maybe we could just remove it. Interesting point. I wonder if it makes sense to collect as much profiling data in the interpreter as we currently do. C1 tier 3 can still collect it.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26139#issuecomment-3210013114 From dzhang at openjdk.org Thu Aug 21 10:59:19 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 21 Aug 2025 10:59:19 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV [v3] In-Reply-To: References: Message-ID: <1cSrrIo9K9Y9rJj9m8WpJKi3wUyNFJFxCmoyBms9Em0=.c484b2c5-7f07-4cbc-a45f-917491f15e53@github.com> > Hi, > Can you help to review this patch? Thanks! > > We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. > The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount. > > ### Test (fastdebug) > - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Add an additional comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26855/files - new: https://git.openjdk.org/jdk/pull/26855/files/d4468d61..0390a6ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26855&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26855&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26855.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26855/head:pull/26855 PR: https://git.openjdk.org/jdk/pull/26855 From dzhang at openjdk.org Thu Aug 21 11:04:56 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 21 Aug 2025 11:04:56 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV [v2] In-Reply-To: References: Message-ID: <2x7CaW6kaW22NXkg7_11AVAi4Zy9OGfZ-OY4zwN-aoU=.5e2ab0eb-ad6a-4fc9-bd03-0029fbd027b6@github.com> On Thu, 21 Aug 2025 10:33:08 GMT, Christian Hagedorn wrote: >> Dingli Zhang has updated the pull request 
incrementally with one additional commit since the last revision: >> >> Enable test without RVV and fix in BadIRAnnotationsAfterTestVM > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java line 1128: > >> 1126: @FailCount(8) >> 1127: @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}, applyIf = {"MaxVectorSize", ">0"}) >> 1128: @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}, applyIf = {"MaxVectorSize", ">0"}) // valid > > Maybe we can add an additional comment: > Suggestion: > > @IR(counts = {IRNode.LOAD_VECTOR_I, "> 0"}, applyIf = {"MaxVectorSize", "> 0"}) // valid, but only if MaxVectorSize > 0, otherwise, a violation is reported > @IR(counts = {IRNode.LOAD_VECTOR_I, IRNode.VECTOR_SIZE_MAX, "> 0"}, applyIf = {"MaxVectorSize", "> 0"}) // valid, but only if MaxVectorSize > 0, otherwise, a violation is reported Thanks! Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26855#discussion_r2290685442 From asemenov at openjdk.org Thu Aug 21 11:11:56 2025 From: asemenov at openjdk.org (Artem Semenov) Date: Thu, 21 Aug 2025 11:11:56 GMT Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() [v2] In-Reply-To: References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> Message-ID: On Thu, 21 Aug 2025 09:59:01 GMT, Andrew Dinn wrote: >> A situation is possible where head becomes nullptr when head->next() returns nullptr on the last iteration. Then, after the loop finishes, assert(head != nullptr) will trigger (only in debug mode), and return head->data() will cause a program error > > Hmm, is it possible? > > Perhaps you could explain how pos_idx is being used in this loop to guard against that happening and why that does not make this safe? ```head->next()``` returns a pointer to _next without any checks. 
In turn, the _next pointer is marked as volatile, which means it can be modified at any moment, for example, in another thread. From this, I conclude that a check in this location is desirable. Moreover, pos_idx is also not being checked. It is quite possible that ```head->next()``` could turn out to be nullptr. But I don't mind. If you are sure that there can't be a nullptr in this place, I will withdraw this patch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26798#discussion_r2290701959 From lucy at openjdk.org Thu Aug 21 11:43:54 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 21 Aug 2025 11:43:54 GMT Subject: RFR: 8361536: [s390x] Saving return_pc at wrong offset [v2] In-Reply-To: <7GOHAZFlyiKCrfQPQOp4dugrGW7dEDer4gqR8EhBEwQ=.2962ae08-bab7-4f8d-81b3-b25f2ba668ca@github.com> References: <7GOHAZFlyiKCrfQPQOp4dugrGW7dEDer4gqR8EhBEwQ=.2962ae08-bab7-4f8d-81b3-b25f2ba668ca@github.com> Message-ID: On Thu, 21 Aug 2025 05:05:31 GMT, Amit Kumar wrote: >> Fixes the bug where the return pc was stored at a wrong offset, which causes issues with the Java ABI. >> >> Issue appeared in #26004, see the comment: https://github.com/openjdk/jdk/pull/26004#issuecomment-3017928879. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > re-adjust offset, 80 is free so we can start saving from there Good to go. ------------- Marked as reviewed by lucy (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26209#pullrequestreview-3140258773 From kbarrett at openjdk.org Thu Aug 21 12:01:56 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 21 Aug 2025 12:01:56 GMT Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v5] In-Reply-To: <9zpIi45t8VRvl4UQS5ht9xleBMDhqTE6dH6h0dlMfrw=.57c43541-47d5-4486-81a5-c4c59a94a51e@github.com> References: <9zpIi45t8VRvl4UQS5ht9xleBMDhqTE6dH6h0dlMfrw=.57c43541-47d5-4486-81a5-c4c59a94a51e@github.com> Message-ID: <9XwThKPO177xs4PD5VxZgtiXTZWAu0nKgSK8lUAWuxk=.8c6d98f1-0418-45db-b1f9-aa8e01d1bb07@github.com> On Thu, 21 Aug 2025 08:16:30 GMT, Francesco Andreuzzi wrote: >> - `opto/phasetype.hpp` defines `static const char* phase_names[]` >> - `compiler/compilerEvent.cpp` defines `static GrowableArray* phase_names` >> >> This is not a problem when the two files are compiled as different translation units, but it causes a build failure if any of them is pulled in by a precompiled header: >> >> >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:59:36: error: redefinition of 'phase_names' with a different type: 'GrowableArray *' vs 'const char *[100]' >> 59 | static GrowableArray* phase_names = nullptr; >> | ^ >> /jdk/src/hotspot/share/opto/phasetype.hpp:147:20: note: previous definition is here >> 147 | static const char* phase_names[] = { >> | ^ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:67:39: error: member reference base type 'const char *' is not a structure or union >> 67 | const u4 nof_entries = phase_names->length(); >> | ~~~~~~~~~~~^ ~~~~~~ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:71:31: error: member reference base type 'const char *' is not a structure or union >> 71 | writer.write(phase_names->at(i)); >> | ~~~~~~~~~~~^ ~~ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:77:34: error: member reference base type 'const char *' is not a structure or union >> 77 | for (int i = 0; i < phase_names->length(); i++) { >> | ~~~~~~~~~~~^ ~~~~~~ >> 
/jdk/src/hotspot/share/compiler/compilerEvent.cpp:78:35: error: member reference base type 'const char *' is not a structure or union >> 78 | const char* name = phase_names->at(i); >> | ~~~~~~~~~~~^ ~~ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:91:9: error: comparison of array 'phase_names' equal to a null pointer is always false [-Werror,-Wtautological-pointer-compare] >> 91 | if (phase_names == nullptr) { >> | ^~~~~~~~~~~ ~~~~~~~ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:92:19: error: array type 'const char *[100]' is not assignable >> 92 | phase_names = new (mtInternal) GrowableArray(100, mtCompiler); >> | ~~~~~~~~~~~ ^ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:103:24: error: member reference base type 'const char *' is not a structure or union >> 103 | index = phase_names->length(); >> | ~~~~~~~~~~~^ ~~~~~~ >> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:104:16: error: member reference base type 'const char *' is not a structure or union >> 104 | phase_names->append(use_strdup ? os::strdup(phase_name) : phase_name... > > Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: > > make find_phase a member Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26851#pullrequestreview-3140334072 From epeter at openjdk.org Thu Aug 21 12:03:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Aug 2025 12:03:12 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v18] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <3CfAIHYWebBFc4MjQIsP91nJyTx2oQpHcAkecyz3it8=.2349fa47-7f6b-42cf-b834-db2df1e3bbb5@github.com> On Wed, 20 Aug 2025 12:31:11 GMT, Emanuel Peter wrote: >> TODO work that arose during review process / recent merges with master: >> >> - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peek performance is identical but compilation time a bit higher. Investigation ongoing. >> - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE. >> >> --------------- >> >> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. >> >> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. >> >> -------------------------- >> >> **Where to start reviewing** >> >> - `src/hotspot/share/opto/mempointer.hpp`: >> - Read the class comment for `MemPointerRawSummand`. >> - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. 
>> >> - `src/hotspot/share/opto/vectorization.cpp`: >> - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. >> >> - `src/hotspot/share/opto/vtransform.hpp`: >> - Understand the difference between weak and strong edges. >> >> If you need to see some examples, then look at the tests: >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in somecases if we used multiversioning. >> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the miro-benchmarks I show below. Simple array cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather compliex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). >> -------------------------- >> >> **Details** >> >> Most fundamentally: >> - I had to... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > disable flag if not possible I created a stand-alone test to be able to run `perf stat` without the overheads of JMH. The numbers look different, but the conclusion seems to be the same: we have differing `backend_bound` results: 30% vs 36%. And a drastic difference in `tma_retiring` as well. Both tests run quite long, about 30sec. And compilation is done after about 1sec, so we are really measuring the steady-state. 
// java -XX:CompileCommand=compileonly,Test::copy* -XX:CompileCommand=printcompilation,Test::copy* -Xbatch Test.java
public class Test {
    public static int size = 100_000;

    public static void main(String[] args) {
        byte[] a = new byte[size];
        for (int i = 0; i < 1000_000; i++) {
            copy_B(a, a, 0, 0, size); // always alias
        }
    }

    public static void copy_B(byte[] a, byte b[], int aOffset, int bOffset, int size) {
        for (int i = 0; i < size; i++) {
            b[i + bOffset] = a[i + aOffset];
        }
    }
}

Running it with `patch`, which eventually runs with multiversioning in the slow-loop:

[empeter at emanuel bin]$ perf stat ../../../linux-x64/jdk/bin/java -XX:CompileCommand=compileonly,Test::copy* -XX:CompileCommand=printcompilation,Test::copy* -Xbatch Test.java
CompileCommand: compileonly Test.copy* bool compileonly = true
CompileCommand: PrintCompilation Test.copy* bool PrintCompilation = true
2172   98 %   b  3   Test::copy_B @ 3 (29 bytes)
2172   99     b  3   Test::copy_B (29 bytes)
2173  100 %   b  4   Test::copy_B @ 3 (29 bytes)
2198  101     b  4   Test::copy_B (29 bytes)
2212  102     b  4   Test::copy_B (29 bytes)

Performance counter stats for '../../../linux-x64/jdk/bin/java -XX:CompileCommand=compileonly,Test::copy* -XX:CompileCommand=printcompilation,Test::copy* -Xbatch Test.java':

         35,151.89 msec task-clock:u       #  1.001 CPUs utilized
                 0      context-switches:u #  0.000 /sec
                 0      cpu-migrations:u   #  0.000 /sec
             8,692      page-faults:u      #  247.270 /sec
    86,730,942,915      cycles:u           #  2.467 GHz
   225,939,652,810      instructions:u     #  2.61 insn per cycle
     2,931,222,952      branches:u         #  83.387 M/sec
        55,264,982      branch-misses:u    #  1.89% of all branches
                        TopdownL1
                                           #  36.0 % tma_backend_bound
                                           #  14.2 % tma_bad_speculation
                                           #   3.5 % tma_frontend_bound
                                           #  46.3 % tma_retiring

      35.111092609 seconds time elapsed
      34.819260000 seconds user
       0.257300000 seconds sys

Running with `not_profitable`, which compiles only with a single scalar loop:

[empeter at emanuel bin]$ perf stat ../../../linux-x64/jdk/bin/java -XX:CompileCommand=compileonly,Test::copy* -XX:CompileCommand=printcompilation,Test::copy* -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:AutoVectorizationOverrideProfitability=0 Test.java
CompileCommand: compileonly Test.copy* bool compileonly = true
CompileCommand: PrintCompilation Test.copy* bool PrintCompilation = true
2196   98 %   b  3   Test::copy_B @ 3 (29 bytes)
2196   99     b  3   Test::copy_B (29 bytes)
2197  100 %   b  4   Test::copy_B @ 3 (29 bytes)
2210  101     b  4   Test::copy_B (29 bytes)

Performance counter stats for '../../../linux-x64/jdk/bin/java -XX:CompileCommand=compileonly,Test::copy* -XX:CompileCommand=printcompilation,Test::copy* -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:AutoVectorizationOverrideProfitability=0 Test.java':

         31,205.82 msec task-clock:u       #  1.001 CPUs utilized
                 0      context-switches:u #  0.000 /sec
                 0      cpu-migrations:u   #  0.000 /sec
             8,029      page-faults:u      #  257.292 /sec
    76,952,997,639      cycles:u           #  2.466 GHz
   228,849,251,864      instructions:u     #  2.97 insn per cycle
     2,894,918,583      branches:u         #  92.769 M/sec
        55,022,648      branch-misses:u    #  1.90% of all branches
                        TopdownL1
                                           #  30.6 % tma_backend_bound
                                           #  13.1 % tma_bad_speculation
                                           #   3.0 % tma_frontend_bound
                                           #  53.4 % tma_retiring

      31.161118421 seconds time elapsed
      30.853187000 seconds user
       0.303616000 seconds sys

I also ran an experiment where I artificially disabled vectorization in the fast-loop for multiversioning. Just in case that somehow had an influence on the slow-loop... but that does not change the 10% difference. Also changing `size=1000_000` and adjusting the repetitions to `100_000` does not change the outcome (maybe lowers the branch misprediction slightly).
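For a comparison point, a counterpart of the benchmark above in which the two arrays never alias at runtime (so a speculative aliasing check would always pass and a multiversioning fast loop would be taken) could look like the following. This is my own sketch, not part of the patch under review; the class name and the reduced iteration counts are invented only to keep it quick:

```java
// Hypothetical never-aliasing counterpart of Test.copy_B: two distinct
// arrays, so any runtime aliasing check always succeeds.
class TestNoAlias {
    public static int size = 10_000;

    public static void main(String[] args) {
        byte[] a = new byte[size];
        byte[] b = new byte[size];
        for (int i = 0; i < 1_000; i++) {
            copy_B(a, b, 0, 0, size); // never aliases: distinct arrays
        }
    }

    public static void copy_B(byte[] a, byte[] b, int aOffset, int bOffset, int size) {
        for (int i = 0; i < size; i++) {
            b[i + bOffset] = a[i + aOffset];
        }
    }
}
```

Comparing `perf stat` between this variant and the always-aliasing one should help isolate how much of the difference comes from taking the slow loop at all.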
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3210290629
PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3210294043

From adinn at openjdk.org Thu Aug 21 12:11:53 2025
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 21 Aug 2025 12:11:53 GMT
Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() [v2]
In-Reply-To: 
References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com>
Message-ID: 

On Thu, 21 Aug 2025 11:09:12 GMT, Artem Semenov wrote:

> Moreover, pos_idx is also not being checked

I don't know what you mean by this comment. `pos_idx` is being checked in the loop test before the call to `head->next()` in that same test.

The important question you need to address is why and what that check guarantees. I say you need to address it because you are the one claiming that there is a possible nullptr dereference here without any evidence that it has occurred in practice. If that is based on a correct analysis of the code then you need to explain how we can arrive at a situation where we hit a null pointer that takes into account the logic of the loop test. So far you have not done so.

n.b. I am not claiming there is no possibility of a nullptr dereference here (although I can form my own opinion). I'm asking you to tell me why I should take your claim that it is possible seriously. Your answers so far are not convincing me that you have understood how this code works.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26798#discussion_r2290852355

From epeter at openjdk.org Thu Aug 21 12:27:12 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 21 Aug 2025 12:27:12 GMT
Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v18]
In-Reply-To: 
References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com>
Message-ID: 

On Wed, 20 Aug 2025 12:31:11 GMT, Emanuel Peter wrote:

>> TODO work that arose during review process / recent merges with master:
>>
>> - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peak performance is identical but compilation time a bit higher. Investigation ongoing.
>> - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE.
>>
>> ---------------
>>
>> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs.
>>
>> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016:
>> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate.
>> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization.
>>
>> --------------------------
>>
>> **Where to start reviewing**
>>
>> - `src/hotspot/share/opto/mempointer.hpp`:
>>   - Read the class comment for `MemPointerRawSummand`.
>>   - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks.
>>
>> - `src/hotspot/share/opto/vectorization.cpp`:
>>   - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`.
It explains how the aliasing runtime check works.
>>
>> - `src/hotspot/share/opto/vtransform.hpp`:
>>   - Understand the difference between weak and strong edges.
>>
>> If you need to see some examples, then look at the tests:
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning.
>> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases; MemorySegment cases have some issues (see comments).
>> --------------------------
>>
>> **Details**
>>
>> Most fundamentally:
>> - I had to...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   disable flag if not possible

I'm going to run the benchmarks on our benchmarking servers now, just to see if this can be reproduced across platforms.

-------------
PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3210385597

From mhaessig at openjdk.org Thu Aug 21 12:45:19 2025
From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=)
Date: Thu, 21 Aug 2025 12:45:19 GMT
Subject: RFR: 8365909: [REDO] Add a compilation timeout flag to catch long running compilations
Message-ID: 

This PR adds a timeout for compilation tasks based on timer signals on Linux debug builds. This PR is a redo of #25872 with fixes for the failing test.
Testing:
- [ ] Github Actions
- [x] tier1,tier2 plus internal testing on all Oracle supported platforms
- [x] tier3,tier4 on linux-x64-debug
- [x] tier1,tier2,tier3,tier4 on linux-x64-debug with `-XX:CompileTaskTimeout=60000`

-------------
Commit messages:
- Fix test
- Fix timeout test
- Print with %zd
- Print timeout properly
- Use static buffer for method name
- missed a dash
- Address Christian's comments
- Fix format string
- Add test
- Remove superfluous NOT_PRODUCT
- ... and 11 more: https://git.openjdk.org/jdk/compare/5febc4e3...f86361c8

Changes: https://git.openjdk.org/jdk/pull/26882/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26882&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8365909
Stats: 281 lines in 8 files changed: 278 ins; 0 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/26882.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26882/head:pull/26882

PR: https://git.openjdk.org/jdk/pull/26882

From duke at openjdk.org Thu Aug 21 12:47:52 2025
From: duke at openjdk.org (duke)
Date: Thu, 21 Aug 2025 12:47:52 GMT
Subject: RFR: 8365829: Multiple definitions of static 'phase_names' [v5]
In-Reply-To: <9zpIi45t8VRvl4UQS5ht9xleBMDhqTE6dH6h0dlMfrw=.57c43541-47d5-4486-81a5-c4c59a94a51e@github.com>
References: <9zpIi45t8VRvl4UQS5ht9xleBMDhqTE6dH6h0dlMfrw=.57c43541-47d5-4486-81a5-c4c59a94a51e@github.com>
Message-ID: 

On Thu, 21 Aug 2025 08:16:30 GMT, Francesco Andreuzzi wrote:

>> - `opto/phasetype.hpp` defines `static const char* phase_names[]`
>> - `compiler/compilerEvent.cpp` defines `static GrowableArray* phase_names`
>>
>> This is not a problem when the two files are compiled as different translation units, but it causes a build failure if any of them is pulled in by a precompiled header:
>>
>>
>> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:59:36: error: redefinition of 'phase_names' with a different type: 'GrowableArray *' vs 'const char *[100]'
>> 59 | static GrowableArray* phase_names = nullptr;
>> | ^
>>
/jdk/src/hotspot/share/opto/phasetype.hpp:147:20: note: previous definition is here
>> 147 | static const char* phase_names[] = {
>> | ^
>> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:67:39: error: member reference base type 'const char *' is not a structure or union
>> 67 | const u4 nof_entries = phase_names->length();
>> | ~~~~~~~~~~~^ ~~~~~~
>> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:71:31: error: member reference base type 'const char *' is not a structure or union
>> 71 | writer.write(phase_names->at(i));
>> | ~~~~~~~~~~~^ ~~
>> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:77:34: error: member reference base type 'const char *' is not a structure or union
>> 77 | for (int i = 0; i < phase_names->length(); i++) {
>> | ~~~~~~~~~~~^ ~~~~~~
>> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:78:35: error: member reference base type 'const char *' is not a structure or union
>> 78 | const char* name = phase_names->at(i);
>> | ~~~~~~~~~~~^ ~~
>> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:91:9: error: comparison of array 'phase_names' equal to a null pointer is always false [-Werror,-Wtautological-pointer-compare]
>> 91 | if (phase_names == nullptr) {
>> | ^~~~~~~~~~~ ~~~~~~~
>> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:92:19: error: array type 'const char *[100]' is not assignable
>> 92 | phase_names = new (mtInternal) GrowableArray(100, mtCompiler);
>> | ~~~~~~~~~~~ ^
>> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:103:24: error: member reference base type 'const char *' is not a structure or union
>> 103 | index = phase_names->length();
>> | ~~~~~~~~~~~^ ~~~~~~
>> /jdk/src/hotspot/share/compiler/compilerEvent.cpp:104:16: error: member reference base type 'const char *' is not a structure or union
>> 104 | phase_names->append(use_strdup ? os::strdup(phase_name) : phase_name...
>
> Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision:
>
>   make find_phase a member

@fandreuz Your change (at version cc7da3adeea447ee8c108f0179943de785a6e239) is now ready to be sponsored by a Committer.

-------------
PR Comment: https://git.openjdk.org/jdk/pull/26851#issuecomment-3210453038

From asemenov at openjdk.org Thu Aug 21 13:21:02 2025
From: asemenov at openjdk.org (Artem Semenov)
Date: Thu, 21 Aug 2025 13:21:02 GMT
Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() [v2]
In-Reply-To: 
References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com>
Message-ID: 

On Thu, 21 Aug 2025 12:09:38 GMT, Andrew Dinn wrote:

>> ```head->next()``` returns a pointer to _next without any checks.
>>
>> In turn, the _next pointer is marked as volatile, which means it can be modified at any moment, for example, in another thread.
>>
>> From this, I conclude that a check in this location is desirable. Moreover, pos_idx is also not being checked. It is quite possible that ```head->next()``` could turn out to be nullptr.
>>
>> But I don't mind. If you are sure that there can't be a nullptr in this place, I will withdraw this patch.
>
>> Moreover, pos_idx is also not being checked
>
> I don't know what you mean by this comment. `pos_idx` is being checked in the loop test before the call to `head->next()` in that same test.
>
> The important question you need to address is why and what that check guarantees. I say you need to address it because you are the one claiming that there is a possible nullptr dereference here without any evidence that it has occurred in practice. If that is based on a correct analysis of the code then you need to explain how we can arrive at a situation where we hit a null pointer that takes into account the logic of the loop test. So far you have not done so.
>
> n.b.
I am not claiming there is no possibility of a nullptr dereference here (although I can form my own opinion). I'm asking you to tell me why I should take your claim that it is possible seriously. Your answers so far are not convincing me that you have understood how this code works.

pos_idx receives its value when calling a certain function pos_idx_from_marker(marker), and there is no check before the loop to ensure that it is within the bounds of the _table size.

I mentioned above that I am not insisting on this particular patch. This issue was detected by a static analyzer and confirmed by a specialist from another organization. After that, based on my limited knowledge, I considered it confirmed. If you have any refutation, please share your thoughts. In that case, I will revert this patch and mark the trigger as "NO FIX REQUIRED".

As far as I have checked, there are no checks anywhere in the calls to this function to compare the marker with the table or any other entities in any form.

I certainly do not claim to understand this code as well as you or any other member of the hotspot team.

-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26798#discussion_r2291041769

From eosterlund at openjdk.org Thu Aug 21 13:23:14 2025
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Thu, 21 Aug 2025 13:23:14 GMT
Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v42]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 12 Aug 2025 01:17:27 GMT, Chad Rakoczy wrote:

>> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186).
>>
>> When an nmethod is replaced, a deep copy is performed.
The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Lock nmethod::relocate behind experimental flag > - Use CompiledICLocker instead of CompiledIC_lock > - Fix spacing > - Update NMethod.java with immutable data changes > - Rename method to nm > - Add assert before freeing immutable data > - Reorder is_relocatable checks > - Require caller to hold locks > - Revert is_always_within_branch_range changes > - ... and 97 more: https://git.openjdk.org/jdk/compare/9593730a...24c35689 Okay. Thanks for cleaning up this PR. So if you think we should do this... what is the proposal here? I mean, I get that code relocation can help with iTLB pressure. But how do I use this? I only see the new relocation function accessible through some whitebox API? Maybe I'm missing something? What's the story here? If we check this in now, it seems to me that it won't really help anyone reduce iTLB pressure. There is just a bunch of code for reducing iTLB pressure which isn't used. Also, do you have any numbers showing if iTLB pressure improved? Or performance improved? Or in general that anything improved? I'm guessing so but I'd like to see some data. 
src/hotspot/share/code/nmethod.hpp line 172: > 170: friend class DeoptimizationScope; > 171: > 172: #define ImmutableDataReferencesCounterSize ((int)sizeof(int)) Seems like this is needed mostly because there is immutable data shared between the old and new location, warranting a reference counter to keep track of when neither one of them needs the data any longer. And the reference counter is embedded in one of the byte blob sections of the nmethod, where it needs extra alignment. Did I get that right? If so, here are some thoughts: 1. This seems like a memory optimization which is only useful when we bloat memory. We want to free up the old nmethod because it will likely become dead weight very soon. In fact, it might make sense to make it not entrant instead of not used to that end, so the GC understands this should be nuked. If we actually free up the old nmethod, there is not much of a sharing opportunity here. 2. Even if we want this micro optimization, is there any reason it wouldn't just be a normal field so we can get rid of this special handling in the byte blobs? src/hotspot/share/gc/z/zUnload.cpp line 103: > 101: > 102: virtual bool is_safe(nmethod* nm) { > 103: if (SafepointSynchronize::is_at_safepoint() || nm->is_unloading() || nm->is_not_installed()) { Why is this change needed? src/hotspot/share/prims/whitebox.cpp line 1659: > 1657: ResourceMark rm(THREAD); > 1658: CHECK_JNI_EXCEPTION(env); > 1659: nmethod* code = (nmethod*) addr; Hmm this might corrupt the code heap and cause crashes. The nmethod could have been freed and had something random else allocated across the same memory, and then casted nmethod even though it is some random instructions there now. Can't really do that. src/hotspot/share/runtime/globals.hpp line 1567: > 1565: range(0, 100) \ > 1566: \ > 1567: product(bool, NMethodRelocation, false, EXPERIMENTAL, \ Why is this only available behind an experimental JVM flag? 
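If I read the immutable-data sharing in this review correctly, the scheme amounts to a plain reference count on the shared blob, along the lines of the following Java model. All names here are invented for illustration; this is not the actual C++ in the patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Model of immutable data shared between an old nmethod and its relocated
// copy: whoever drops the last reference is responsible for freeing it.
final class SharedImmutableData {
    // The creator starts out holding one reference.
    private final AtomicInteger refCount = new AtomicInteger(1);

    // The relocated copy takes an additional reference.
    void retain() {
        refCount.incrementAndGet();
    }

    // Returns true when the caller dropped the last reference
    // and must free the underlying blob.
    boolean release() {
        return refCount.decrementAndGet() == 0;
    }
}
```

Whether such a counter lives in its own field or is embedded in a byte-blob section of the nmethod is exactly the design question raised in the review above.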
------------- PR Review: https://git.openjdk.org/jdk/pull/23573#pullrequestreview-3140379328 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2290859039 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2290975974 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2290901579 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2291028471 From mli at openjdk.org Thu Aug 21 13:23:31 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 21 Aug 2025 13:23:31 GMT Subject: RFR: 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way Message-ID: Hi, Can you help to review this patch? This is a follow-up of https://github.com/openjdk/jdk/pull/26838, fixes the vector version in a similar way. Thanks! ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/26883/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26883&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365772 Stats: 36 lines in 3 files changed: 19 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/26883.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26883/head:pull/26883 PR: https://git.openjdk.org/jdk/pull/26883 From chagedorn at openjdk.org Thu Aug 21 14:00:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 21 Aug 2025 14:00:56 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV [v3] In-Reply-To: <1cSrrIo9K9Y9rJj9m8WpJKi3wUyNFJFxCmoyBms9Em0=.c484b2c5-7f07-4cbc-a45f-917491f15e53@github.com> References: <1cSrrIo9K9Y9rJj9m8WpJKi3wUyNFJFxCmoyBms9Em0=.c484b2c5-7f07-4cbc-a45f-917491f15e53@github.com> Message-ID: On Thu, 21 Aug 2025 10:59:19 GMT, Dingli Zhang wrote: >> Hi, >> Can you help to review this patch? Thanks! >> >> We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. 
>> The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount.
>>
>> ### Test (fastdebug)
>> - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042
>
> Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add an additional comment

Looks good, thanks for the updates!

-------------
Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26855#pullrequestreview-3140823242

From adinn at openjdk.org Thu Aug 21 14:34:53 2025
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 21 Aug 2025 14:34:53 GMT
Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() [v2]
In-Reply-To: 
References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com>
Message-ID: 

On Thu, 21 Aug 2025 13:18:26 GMT, Artem Semenov wrote:

>>> Moreover, pos_idx is also not being checked
>>
>> I don't know what you mean by this comment. `pos_idx` is being checked in the loop test before the call to `head->next()` in that same test.
>>
>> The important question you need to address is why and what that check guarantees. I say you need to address it because you are the one claiming that there is a possible nullptr dereference here without any evidence that it has occurred in practice. If that is based on a correct analysis of the code then you need to explain how we can arrive at a situation where we hit a null pointer that takes into account the logic of the loop test. So far you have not done so.
>>
>> n.b. I am not claiming there is no possibility of a nullptr dereference here (although I can form my own opinion). I'm asking you to tell me why I should take your claim that it is possible seriously.
Your answers so far are not convincing me that you have understood how this code works.
>
> pos_idx receives its value when calling a certain function pos_idx_from_marker(marker), and there is no check before the loop to ensure that it is within the bounds of the _table size.
>
> I mentioned above that I am not insisting on this particular patch. This issue was detected by a static analyzer. After that, based on my limited knowledge, I considered it confirmed. If you have any refutation, please share your thoughts. In that case, I will revert this patch and mark the trigger as "NO FIX REQUIRED".
>
> As far as I have checked, there are no checks anywhere in the calls to this function to compare the marker with the table or any other entities in any form.
>
> I certainly do not claim to understand this code as well as you or any other member of the hotspot team.

Well, this leads right to the root of the problem I have with this report. As you say, pos_idx does indeed come out of a marker object. It took me about a minute to identify that this marker object is created in the function that sits right above the one your code assistant flagged as problematic -- even though I am not at all familiar with this code. It looks clear to me that, given the right call sequence for calls that create a marker and then consume it here, the check on pos_idx will ensure that we don't drop off the end of the list with a null pointer. So, it looks very much like this code has been designed so that the presence of a marker with a suitable pos_idx is intended to ensure this loop terminates before that happens. I am sure someone in this project knows whether that is the case but it is not you or your coding assistant.

I'm not suggesting that that calling sequence is actually right and that the check for pos_idx will definitely avoid dropping off the end. Indeed, I would welcome a bug that proved it to be wrong.
However, what is clear is that both you and your coding assistant have failed to appreciate how some relatively obvious parts of this design actually operate. That renders your (or your tool's) analysis a shallow and unhelpful distraction; using it as an excuse to raise a purported 'issue' in the absence of any evidence of an actual issue is very much a waste of time for this project's reviewers.

Your error is compounded by the fact that you (or more likely your coding assistant) are suggesting changes which, because they are not founded in a correct understanding of the design, could potentially lead to worse outcomes than the speculative nullptr dereference they are intended to remedy -- as I explained when discussing your change to the control flow logic in the ADLC code. So, not only is this report unhelpful, it is potentially harmful.

Ultimately the takeaway here is that the OpenJDK bug system is not here to report, review and add patches to remedy issues that you or your code assistant tool invents on the basis of misinformed assumptions. It is here to report, review and add patches to remedy issues that can be shown to actually affect the correct operation of the JVM and JDK, either by a reproducible test or by well-reasoned argument. So, please do not continue to spam the project with bug reports like this simply because a potentially bogus patch will improve your experience with what is clearly a decidedly fallible tool.
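To make the shape of that argument concrete: as I understand the discussion, the loop walks a list under an index bound taken from a marker, roughly like the following Java model. All names are invented; the real code is the adlc C++ and may differ in detail:

```java
// Model of a marker-bounded list walk: the index bound derived from the
// marker is tested together with a null check before the next pointer is
// followed, so a well-formed marker (whose index does not exceed the list
// length) keeps the walk from ever dereferencing null.
final class Node {
    final String value;
    final Node next;
    Node(String value, Node next) { this.value = value; this.next = next; }
}

final class Walker {
    static Node advanceTo(Node head, int posIdx) {
        // The loop test guards head before it is advanced, mirroring how the
        // pos_idx check in the real loop test bounds the traversal.
        for (int i = 0; i < posIdx && head != null; i++) {
            head = head.next;
        }
        return head;
    }
}
```

The open question in the thread is precisely whether every marker that reaches the loop was produced against the same list, i.e. whether the bound is always well-formed; the model only shows why a well-formed bound makes the walk safe.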
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26798#discussion_r2291265263 From epeter at openjdk.org Thu Aug 21 14:38:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Aug 2025 14:38:54 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV [v3] In-Reply-To: <1cSrrIo9K9Y9rJj9m8WpJKi3wUyNFJFxCmoyBms9Em0=.c484b2c5-7f07-4cbc-a45f-917491f15e53@github.com> References: <1cSrrIo9K9Y9rJj9m8WpJKi3wUyNFJFxCmoyBms9Em0=.c484b2c5-7f07-4cbc-a45f-917491f15e53@github.com> Message-ID: On Thu, 21 Aug 2025 10:59:19 GMT, Dingli Zhang wrote: >> Hi, >> Can you help to review this patch? Thanks! >> >> We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. >> The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount. >> >> ### Test (fastdebug) >> - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Add an additional comment Looks good now, thanks @chhagedorn for the idea! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26855#pullrequestreview-3140987142 From rehn at openjdk.org Thu Aug 21 14:59:15 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 21 Aug 2025 14:59:15 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v42] In-Reply-To: References: Message-ID: On Tue, 12 Aug 2025 01:17:27 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). 
>>
>> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation.
>>
>> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality.
>>
>> Additional Testing:
>> - [x] Linux x64 fastdebug tier 1/2/3/4
>> - [x] Linux aarch64 fastdebug tier 1/2/3/4
>
> Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits:
>
> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final
> - Lock nmethod::relocate behind experimental flag
> - Use CompiledICLocker instead of CompiledIC_lock
> - Fix spacing
> - Update NMethod.java with immutable data changes
> - Rename method to nm
> - Add assert before freeing immutable data
> - Reorder is_relocatable checks
> - Require caller to hold locks
> - Revert is_always_within_branch_range changes
> - ... and 97 more: https://git.openjdk.org/jdk/compare/9593730a...24c35689

Hey! @fisk

> Also, do you have any numbers showing if iTLB pressure improved? Or performance improved? Or in general that anything improved? I'm guessing so but I'd like to see some data.

The issue is that some of the major arm manufacturers seem to have missed appendix C in the Intel opt manual - "OPTIMIZATION WITH LARGE CODE PAGES". E.g. running renaissance dotty on a G3 I saw 37% front-end stalls (G2 28%; they made significant improvements to the backend on G3, presumably not the front-end, hence more stalling).

By using fewer iTLB entries we can significantly increase IPC on these CPUs.

Simple testing with an earlier version of this got ~10% reduction in front-end stalls (take that number with a grain of salt).

Now whether this is the correct approach or not is still unclear to me.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3210955729

From iveresov at openjdk.org Thu Aug 21 15:01:54 2025
From: iveresov at openjdk.org (Igor Veresov)
Date: Thu, 21 Aug 2025 15:01:54 GMT
Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v2]
In-Reply-To: 
References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <871DbNrXSrdwXxEynOq_fZjvQXMP30abzW17OxgIX4E=.30cb9ca0-914a-472e-aa38-34cb6c034e0e@github.com>
Message-ID: <8m6eeYwRwgfqafcvuhnXo19A-HaYMBM3eS4l7cVgu6w=.00285c38-24d1-4d07-9bcf-2024cb342b74@github.com>

On Thu, 21 Aug 2025 03:15:18 GMT, David Holmes wrote:

>> There is an `Atomic::sub()` in `dec_init_deps_left()`.
>
> Okay - not obvious we actually require acquire semantics when reading a simple count, but I'm not sure what the count may imply. But please consider renaming the method.

Let me think some more about this one. Maybe we don't need it indeed.

-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2291340597

From kvn at openjdk.org Thu Aug 21 15:17:54 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 21 Aug 2025 15:17:54 GMT
Subject: RFR: 8365891: failed: Completed task should not be in the queue
In-Reply-To: 
References: 
Message-ID: <-gyyEPuObLEEDjI3AG0g1GlvmDaPgKc-kMjW8a1B9P8=.802770f5-b6ff-47a4-8466-73f022caa12a@github.com>

On Thu, 21 Aug 2025 01:48:55 GMT, Vladimir Kozlov wrote:

> Added missing `task->set_next(nullptr)` for "stale" tasks.
>
> Testing: tier1-3,xcomp,stress

Thank you, Dean.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26872#issuecomment-3211058473 From jkarthikeyan at openjdk.org Thu Aug 21 15:21:48 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 21 Aug 2025 15:21:48 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: Message-ID: > Hi all, > This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Update comment for constraint casts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26827/files - new: https://git.openjdk.org/jdk/pull/26827/files/d5c4dda2..d6c81a9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=00-01 Stats: 7 lines in 1 file changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26827.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26827/head:pull/26827 PR: https://git.openjdk.org/jdk/pull/26827 From jkarthikeyan at openjdk.org Thu Aug 21 15:21:49 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 21 Aug 2025 15:21:49 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 06:19:03 GMT, Emanuel Peter wrote: >> I see. Ok. Can you add a comment to the code for that? >> Because imagine we come along later and actually implement a backend vectorized version of CastII (no-op?). Maybe because we implement if-conversion. Then it would be nice to know if this was just a "to be on the safe side" check, or if it would run into issues when removed. > > Because the current comment says "should not truncate". 
That sounds stronger than "to be on the safe side". I think this is fair; I've pushed a commit that changes the comment wording. Let me know what you think! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2291394740 From eosterlund at openjdk.org Thu Aug 21 15:58:07 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 21 Aug 2025 15:58:07 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v42] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 14:56:30 GMT, Robbin Ehn wrote: > By using fewer itbl entries we can significantly increase IPC on these CPUs. > > Simple testing with some earlier version of this got ~10% reduction in frontend stalls (take that number with a grain of salt). > > Now, whether this is the correct approach or not is still unclear to me. Okay that sounds quite promising. So what is the driver for this relocation in the JVM, which makes sure hot nmethods get moved together? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3211194188 From mhaessig at openjdk.org Thu Aug 21 16:33:13 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 21 Aug 2025 16:33:13 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v5] In-Reply-To: References: Message-ID: > This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of all combinations of flags. To illustrate its use, I also converted one test to use the new cross product functionality.
> > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus some internal testing on Oracle supported platforms Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: - Fix test - Better counting in tests - post processing of flags and documentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26762/files - new: https://git.openjdk.org/jdk/pull/26762/files/f59e9d9d..273e5f64 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=03-04 Stats: 65 lines in 2 files changed: 41 ins; 8 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/26762.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26762/head:pull/26762 PR: https://git.openjdk.org/jdk/pull/26762 From mhaessig at openjdk.org Thu Aug 21 16:33:14 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 21 Aug 2025 16:33:14 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v4] In-Reply-To: References: <7QzCmv17rsIwVX0a4C_wTq4jhx6cob4juy454yuOof0=.fa045ee0-eaa4-43e1-853b-93880a0d44b3@github.com> Message-ID: On Thu, 21 Aug 2025 06:15:46 GMT, Emanuel Peter wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Make the test work > > Changes requested by epeter (Reviewer). @eme64, I completely revamped the counting of failures to use regex matching, and made both multiple flags and no flags in one string work.
> > Testing: tier1-3,xcomp,stress Testing is green ------------- PR Comment: https://git.openjdk.org/jdk/pull/26872#issuecomment-3211333715 From kvn at openjdk.org Thu Aug 21 16:40:04 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 21 Aug 2025 16:40:04 GMT Subject: Integrated: 8365891: failed: Completed task should not be in the queue In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 01:48:55 GMT, Vladimir Kozlov wrote: > Added missing `task->set_next(nullptr)` for "stale" tasks. > > Testing: tier1-3,xcomp,stress This pull request has now been integrated. Changeset: d7572468 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/d75724682390efa7cb63ae973fd9c504f7f64852 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8365891: failed: Completed task should not be in the queue Reviewed-by: dlong ------------- PR: https://git.openjdk.org/jdk/pull/26872 From kvn at openjdk.org Thu Aug 21 17:02:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 21 Aug 2025 17:02:55 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v4] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: <4P2y8gjeUOn0hXpJ3cJumGLqrLLx9c0FsAAmIZZDTBA=.23c103b1-c9dd-45a0-9289-99f4910c5bcf@github.com> On Thu, 21 Aug 2025 03:00:11 GMT, Igor Veresov wrote: >> This change fixes multiple issues with training data verification. While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown), it does cause problems in the Leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call), and fixes the problems that prevent it from working reliably.
> > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > More cleanup src/hotspot/share/compiler/compilationPolicy.cpp line 143: > 141: void CompilationPolicy::wait_replay_training_at_init(JavaThread* THREAD) { > 142: MonitorLocker locker(THREAD, TrainingReplayQueue_lock); > 143: while (!_training_replay_queue.is_empty_unlocked() || _training_replay_queue.is_processing_unlocked()) { Is this queue used in all phases? src/hotspot/share/oops/trainingData.cpp line 86: > 84: > 85: void TrainingData::verify() { > 86: if (TrainingData::have_data() && !TrainingData::assembling_data()) { Why is the assembly phase excluded? src/hotspot/share/oops/trainingData.cpp line 105: > 103: }); > 104: } > 105: if (TrainingData::need_data()) { I assume this is the "training" run. Right? src/hotspot/share/oops/trainingData.cpp line 113: > 111: } else if (td->is_MethodTrainingData()) { > 112: MethodTrainingData* mtd = td->as_MethodTrainingData(); > 113: mtd->verify(false); Why is it `false` here? Please add a comment. src/hotspot/share/runtime/java.cpp line 522: > 520: if (AOTVerifyTrainingData) { > 521: EXCEPTION_MARK; > 522: CompilationPolicy::wait_replay_training_at_init(THREAD); It is called on VM exit, but the name is `_at_init`. Maybe drop that from the name.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2291649772 PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2291646234 PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2291647689 PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2291642776 PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2291620321 From kvn at openjdk.org Thu Aug 21 17:13:06 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 21 Aug 2025 17:13:06 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v42] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 15:54:48 GMT, Erik Österlund wrote: > So what is the driver for this relocation in the JVM, which makes sure hot nmethods get moved together? @fisk, next RFE [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205) will "drive" nmethod relocation based on their hotness. AFAIK it is similar to what you implemented back in the Leyden repo to create a list of hot nmethods to cache. We can add a sampling thread which uses the thread-local handshake framework. An example of such a thread is the Sweeper: https://github.com/openjdk/jdk17u/blob/master/src/hotspot/share/runtime/sweeper.hpp, which was used to detect active nmethods.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3211445467 From iveresov at openjdk.org Thu Aug 21 17:17:52 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 21 Aug 2025 17:17:52 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v4] In-Reply-To: <4P2y8gjeUOn0hXpJ3cJumGLqrLLx9c0FsAAmIZZDTBA=.23c103b1-c9dd-45a0-9289-99f4910c5bcf@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <4P2y8gjeUOn0hXpJ3cJumGLqrLLx9c0FsAAmIZZDTBA=.23c103b1-c9dd-45a0-9289-99f4910c5bcf@github.com> Message-ID: <_zLG3LFeo4k6m13wHgVdQH60i6NUftGGqRdPLgZTWAo=.6d12f710-1119-4cab-a3e8-13108ef2e0eb@github.com> On Thu, 21 Aug 2025 16:56:27 GMT, Vladimir Kozlov wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> More cleanup > > src/hotspot/share/oops/trainingData.cpp line 113: > >> 111: } else if (td->is_MethodTrainingData()) { >> 112: MethodTrainingData* mtd = td->as_MethodTrainingData(); >> 113: mtd->verify(false); > > Why is it `false` here? Please add a comment. I will add a comment. But this is the training run, so we don't have the dep counters set up yet.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2291688602 From iveresov at openjdk.org Thu Aug 21 17:23:52 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 21 Aug 2025 17:23:52 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v4] In-Reply-To: <4P2y8gjeUOn0hXpJ3cJumGLqrLLx9c0FsAAmIZZDTBA=.23c103b1-c9dd-45a0-9289-99f4910c5bcf@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <4P2y8gjeUOn0hXpJ3cJumGLqrLLx9c0FsAAmIZZDTBA=.23c103b1-c9dd-45a0-9289-99f4910c5bcf@github.com> Message-ID: On Thu, 21 Aug 2025 16:58:08 GMT, Vladimir Kozlov wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> More cleanup > > src/hotspot/share/oops/trainingData.cpp line 86: > >> 84: >> 85: void TrainingData::verify() { >> 86: if (TrainingData::have_data() && !TrainingData::assembling_data()) { > > Why is the assembly phase excluded? We don't hook up the dep tracking machinery for some of the classes just yet. So the dep counter verification can fail. > src/hotspot/share/oops/trainingData.cpp line 105: > >> 103: }); >> 104: } >> 105: if (TrainingData::need_data()) { > > I assume this is the "training" run. Right?
Yes ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2291704095 PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2291704534 From iveresov at openjdk.org Thu Aug 21 17:28:55 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 21 Aug 2025 17:28:55 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v4] In-Reply-To: <4P2y8gjeUOn0hXpJ3cJumGLqrLLx9c0FsAAmIZZDTBA=.23c103b1-c9dd-45a0-9289-99f4910c5bcf@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <4P2y8gjeUOn0hXpJ3cJumGLqrLLx9c0FsAAmIZZDTBA=.23c103b1-c9dd-45a0-9289-99f4910c5bcf@github.com> Message-ID: <_5Fp5DsLJuOTEBharjk4bHiYXYNLw6waTRLVKcfZHKY=.43c6dd0d-7f58-4b46-bc78-342ad7996219@github.com> On Thu, 21 Aug 2025 16:59:54 GMT, Vladimir Kozlov wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> More cleanup > > src/hotspot/share/compiler/compilationPolicy.cpp line 143: > >> 141: void CompilationPolicy::wait_replay_training_at_init(JavaThread* THREAD) { >> 142: MonitorLocker locker(THREAD, TrainingReplayQueue_lock); >> 143: while (!_training_replay_queue.is_empty_unlocked() || _training_replay_queue.is_processing_unlocked()) { > > Is this queue used in all phases? For now just in production and assembly. During training the queue is there but is empty. But with iterative training it's going to be present during training. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2291714279 From iveresov at openjdk.org Thu Aug 21 17:32:11 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 21 Aug 2025 17:32:11 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v5] In-Reply-To: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: <9WmlnzWGeCLGA4pUqmBw9jLDNXsOhA9JVfN7axi2ifM=.33a55a65-ab03-4f8d-8bdd-3037223fa007@github.com> > This change fixes multiple issues with training data verification. While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown), it does cause problems in the Leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call), and fixes the problems that prevent it from working reliably.
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: More cleanup and renames ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26866/files - new: https://git.openjdk.org/jdk/pull/26866/files/289fb74c..edc8591d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26866&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26866&range=03-04 Stats: 30 lines in 4 files changed: 2 ins; 2 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/26866.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26866/head:pull/26866 PR: https://git.openjdk.org/jdk/pull/26866 From iveresov at openjdk.org Thu Aug 21 17:47:52 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 21 Aug 2025 17:47:52 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v4] In-Reply-To: <4P2y8gjeUOn0hXpJ3cJumGLqrLLx9c0FsAAmIZZDTBA=.23c103b1-c9dd-45a0-9289-99f4910c5bcf@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <4P2y8gjeUOn0hXpJ3cJumGLqrLLx9c0FsAAmIZZDTBA=.23c103b1-c9dd-45a0-9289-99f4910c5bcf@github.com> Message-ID: On Thu, 21 Aug 2025 16:47:02 GMT, Vladimir Kozlov wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> More cleanup > > src/hotspot/share/runtime/java.cpp line 522: > >> 520: if (AOTVerifyTrainingData) { >> 521: EXCEPTION_MARK; >> 522: CompilationPolicy::wait_replay_training_at_init(THREAD); > > It is called on VM exit but name is `_at_init`. May be drop that from name. That's because `at_init` comes from `class initialization` events servicing. Those are enqueued after the class initialization is done. So, yes, it's at the shutdown, but the processing of class initializations is still happening. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2291750973 From eosterlund at openjdk.org Thu Aug 21 18:09:09 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 21 Aug 2025 18:09:09 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v42] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 17:10:01 GMT, Vladimir Kozlov wrote: > @fisk, next RFE [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205) will "drive" nmethod relocation based on their hotness. It is similar AFAIK to what you implemented back in Leyden repo to create list of hot nmethods to cache. > > > > ``` > > We can a sampling thread which uses the thread-local handshake framework. An example of such a thread is the Sweeper: https://github.com/openjdk/jdk17u/blob/master/src/hotspot/share/runtime/sweeper.hpp which was used to detect active nmethods. > > ``` Ah okay so the driver is split out. Sounds good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3211595394 From kvn at openjdk.org Thu Aug 21 18:32:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 21 Aug 2025 18:32:51 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v4] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <4P2y8gjeUOn0hXpJ3cJumGLqrLLx9c0FsAAmIZZDTBA=.23c103b1-c9dd-45a0-9289-99f4910c5bcf@github.com> Message-ID: <122qZDRJNr0wRNgp-o6FJlmScBbsD0XWNfAw3XLaGjA=.a7b94197-9b8a-4bbc-984e-f26d1ad2e974@github.com> On Thu, 21 Aug 2025 17:45:02 GMT, Igor Veresov wrote: >> src/hotspot/share/runtime/java.cpp line 522: >> >>> 520: if (AOTVerifyTrainingData) { >>> 521: EXCEPTION_MARK; >>> 522: CompilationPolicy::wait_replay_training_at_init(THREAD); >> >> It is called on VM exit but name is `_at_init`. May be drop that from name. > > That's because `at_init` comes from `class initialization` events servicing. 
Those are enqueued after the class initialization is done. So, yes, it's at the shutdown, but the processing of class initializations is still happening. This is low-level knowledge nobody except you knows (now I know). For other people who look at this, it is confusing. `_at_init` does not help one understand what the code does. I think `wait_replay_training_to_finish()` may be better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2291838685 From kvn at openjdk.org Thu Aug 21 18:38:05 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 21 Aug 2025 18:38:05 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v18] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Wed, 20 Aug 2025 12:31:11 GMT, Emanuel Peter wrote: >> TODO work that arose during review process / recent merges with master: >> >> - Vladimir asked for a benchmark where the predicate is disabled and only multiversioning is used. Show that peak performance is identical but compilation time is a bit higher. Investigation ongoing. >> - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE. >> >> --------------- >> >> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. >> >> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization.
>> >> -------------------------- >> >> **Where to start reviewing** >> >> - `src/hotspot/share/opto/mempointer.hpp`: >> - Read the class comment for `MemPointerRawSummand`. >> - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. >> >> - `src/hotspot/share/opto/vectorization.cpp`: >> - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. >> >> - `src/hotspot/share/opto/vtransform.hpp`: >> - Understand the difference between weak and strong edges. >> >> If you need to see some examples, then look at the tests: >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases whether we used multiversioning. >> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases; MemorySegment cases have some issues (see comments). >> -------------------------- >> >> **Details** >> >> Most fundamentally: >> - I had to... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > disable flag if not possible It would be nice to have a code profiling tool which could show which part of the code is hot for these two cases, instead of guessing based on whole-system behavior.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3211684157 From vlivanov at openjdk.org Thu Aug 21 18:42:55 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 21 Aug 2025 18:42:55 GMT Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee == m) failed: repeated inline attempt with different callee In-Reply-To: References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> Message-ID: On Thu, 21 Aug 2025 00:22:53 GMT, Dean Long wrote: >> # Issue >> The CTW test `applications/ctw/modules/java_xml.java` crashes when trying to repeat late inlining of a virtual method (after IGVN passes through the method's call node again). The failure originates [here](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callGenerator.cpp#L473) because `_callee != m`. Apparently when running IGVN a second time after a first late inline failure and [setting the callee in the call generator](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callnode.cpp#L1240) we notice that the previous callee is not the same as the current one. >> In this specific instance it seems that the issue happens when CTW is compiling Apache Xalan. >> >> # Cause >> The root of the issue has to do with repeated late inlining, class hierarchy analysis and dynamic class loading. >> >> For this particular issue the two differing methods are `org.apache.xalan.xsltc.compiler.LocationPathPattern::translate` first and `org.apache.xalan.xsltc.compiler.AncestorPattern::translate` the second time. `LocationPathPattern` is an abstract class but has a concrete `translate` method. `AncestorPattern` is a concrete class that extends another abstract class `RelativePathPattern` that extends `LocationPathPattern`. `AncestorPattern` overrides the translate method. 
>> What seems to be happening is the following: we compile a virtual call `RelativePathPattern::translate`, and at compile time only the abstract classes `RelativePathPattern` <: `LocationPathPattern` are loaded. CHA then finds out that the call must always call `LocationPathPattern::translate` because the method is not overridden anywhere else. However, there is still no non-abstract class in the entire class hierarchy, i.e. as soon as `AncestorPattern` is loaded, this class is then the only non-abstract class in the class hierarchy and therefore the receiver type must be `AncestorPattern`. >> More generally, when late inlining is repeated and classes are loaded dynamically, it is possible that the resolved method between one late inlining attempt and the next is not the same. >> # Fix >> >> This looks like a very rare edge case. If CHA is affected by class loading, the original recorded dependency becomes invalid. So, we change the assert to **check for invalid dependencies if the current callee and the previous one don't match**. >> >> # Testing >> >> This issue is very, very intermittent and d...
And turn repeated CHA requests (Compile::optimize_inlining) into verification logic.") ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26441#discussion_r2291857763 From duke at openjdk.org Thu Aug 21 20:52:13 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 21 Aug 2025 20:52:13 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v42] In-Reply-To: References: Message-ID: <3aT1VVRygeMzaWwdCDyaknlAWuIxddJgJ-yJbJnWfgY=.c5880021-ecae-48dd-8edb-d6021099d53f@github.com> On Thu, 21 Aug 2025 12:12:00 GMT, Erik ?sterlund wrote: > In fact, it might make sense to make it not entrant instead of not used to that end, so the GC understands this should be nuked. Could you explain what you mean by this? Not used is not entrant. > If we actually free up the old nmethod, there is not much of a sharing opportunity here. The goal is to eliminate an unneeded copy. What is being done is less "sharing" and more of a hand off of ownership from the old to the new as the GC will clean up the old eventually. When that happens the old should know not to free the immutable data memory. > Even if we want this micro optimization, is there any reason it wouldn't just be a normal field so we can get rid of this special handling in the byte blobs? That is the motivation for [JDK-8358213](https://bugs.openjdk.org/browse/JDK-8358213). I suppose we could have a field in `nmethod` to know if it should free its immutable data instead of storing a reference counter in the immutable data itself. I don't have a strong argument for either approach. > src/hotspot/share/runtime/globals.hpp line 1567: > >> 1565: range(0, 100) \ >> 1566: \ >> 1567: product(bool, NMethodRelocation, false, EXPERIMENTAL, \ > > Why is this only available behind an experimental JVM flag? 
This was requested by @vnkozlov so others know `nmethod::relocate` is still experimental. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2292103173 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2292106125 From dlong at openjdk.org Thu Aug 21 20:58:51 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 21 Aug 2025 20:58:51 GMT Subject: RFR: 8365909: [REDO] Add a compilation timeout flag to catch long running compilations In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 11:56:17 GMT, Manuel Hässig wrote: > This PR adds a timeout for compilation tasks based on timer signals on Linux debug builds. > > This PR is a redo of #25872 with fixes for the failing test. > > Testing: > - [ ] Github Actions > - [x] tier1,tier2 plus internal testing on all Oracle supported platforms > - [x] tier3,tier4 on linux-x64-debug > - [x] tier1,tier2,tier3,tier4 on linux-x64-debug with `-XX:CompileTaskTimeout=60000` Please explain the test fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26882#issuecomment-3212047079 From duke at openjdk.org Thu Aug 21 21:09:12 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 21 Aug 2025 21:09:12 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v42] In-Reply-To: References: Message-ID: <_AfGSkUrSqQ0-XyMhloQxY6OcsluebHf8VVx2cJ7t6M=.df547d95-5a8f-4f4b-975d-515fa9a9f615@github.com> On Thu, 21 Aug 2025 12:54:56 GMT, Erik Österlund wrote: >> Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 107 commits: >> >> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final >> - Lock nmethod::relocate behind experimental flag >> - Use CompiledICLocker instead of CompiledIC_lock >> - Fix spacing >> - Update NMethod.java with immutable data changes >> - Rename method to nm >> - Add assert before freeing immutable data >> - Reorder is_relocatable checks >> - Require caller to hold locks >> - Revert is_always_within_branch_range changes >> - ... and 97 more: https://git.openjdk.org/jdk/compare/9593730a...24c35689 > > src/hotspot/share/gc/z/zUnload.cpp line 103: > >> 101: >> 102: virtual bool is_safe(nmethod* nm) { >> 103: if (SafepointSynchronize::is_at_safepoint() || nm->is_unloading() || nm->is_not_installed()) { > > Why is this change needed? We clear inline caches on the new nmethod (`nmethod::clear_inline_caches()`). `CompiledICProtectionBehaviour::is_safe` is used as a check to verify that the caches are safe to clear. If the nmethod is not installed nothing should be using the caches so it should be safe to clear ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2292134348 From vlivanov at openjdk.org Fri Aug 22 00:54:36 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 22 Aug 2025 00:54:36 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v4] In-Reply-To: References: Message-ID: <30EeF0wNWwSnIoU2nAm7-_1X_nqI1qxDgpSetu221n0=.b6de9244-7f69-444d-879a-180ab0edfdbf@github.com> > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. 
> > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." 
> > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into 8290892.rf - renaming - review feedback - Merge branch 'master' into 8290892.rf - 8290892: C2: Intrinsify Reference.reachabilityFence ------------- Changes: https://git.openjdk.org/jdk/pull/25315/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=03 Stats: 1161 lines in 35 files changed: 1107 ins; 19 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From fyang at openjdk.org Fri Aug 22 01:19:59 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 22 Aug 2025 01:19:59 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV [v3] In-Reply-To: <1cSrrIo9K9Y9rJj9m8WpJKi3wUyNFJFxCmoyBms9Em0=.c484b2c5-7f07-4cbc-a45f-917491f15e53@github.com> References: <1cSrrIo9K9Y9rJj9m8WpJKi3wUyNFJFxCmoyBms9Em0=.c484b2c5-7f07-4cbc-a45f-917491f15e53@github.com> Message-ID: <6fuEx_CyHTCW5um-sR3QmqKftxRwjHHjwbAxK-yg-Hw=.69e80e1b-2703-4a44-b8cd-bde28bb0158e@github.com> On Thu, 21 Aug 2025 10:59:19 GMT, Dingli Zhang wrote: >> Hi, >> Can you help to review this patch? Thanks! >> >> We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. >> The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount. 
>> >> ### Test (fastdebug) >> - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Add an additional comment LGTM. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26855#pullrequestreview-3142716345 From dzhang at openjdk.org Fri Aug 22 01:20:00 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Fri, 22 Aug 2025 01:20:00 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV [v3] In-Reply-To: <1cSrrIo9K9Y9rJj9m8WpJKi3wUyNFJFxCmoyBms9Em0=.c484b2c5-7f07-4cbc-a45f-917491f15e53@github.com> References: <1cSrrIo9K9Y9rJj9m8WpJKi3wUyNFJFxCmoyBms9Em0=.c484b2c5-7f07-4cbc-a45f-917491f15e53@github.com> Message-ID: On Thu, 21 Aug 2025 10:59:19 GMT, Dingli Zhang wrote: >> Hi, >> Can you help to review this patch? Thanks! >> >> We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. >> The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount. >> >> ### Test (fastdebug) >> - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Add an additional comment Thanks all for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26855#issuecomment-3212639174 From duke at openjdk.org Fri Aug 22 01:20:00 2025 From: duke at openjdk.org (duke) Date: Fri, 22 Aug 2025 01:20:00 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV [v3] In-Reply-To: <1cSrrIo9K9Y9rJj9m8WpJKi3wUyNFJFxCmoyBms9Em0=.c484b2c5-7f07-4cbc-a45f-917491f15e53@github.com> References: <1cSrrIo9K9Y9rJj9m8WpJKi3wUyNFJFxCmoyBms9Em0=.c484b2c5-7f07-4cbc-a45f-917491f15e53@github.com> Message-ID: On Thu, 21 Aug 2025 10:59:19 GMT, Dingli Zhang wrote: >> Hi, >> Can you help to review this patch? Thanks! >> >> We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. >> The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount. >> >> ### Test (fastdebug) >> - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Add an additional comment @DingliZhang Your change (at version 0390a6ef4392f82bf8c9dd9a35b0ca1df9f264c9) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26855#issuecomment-3212643668 From fjiang at openjdk.org Fri Aug 22 01:45:59 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 22 Aug 2025 01:45:59 GMT Subject: RFR: 8365844: RISC-V: TestBadFormat.java fails when running without RVV [v3] In-Reply-To: <1cSrrIo9K9Y9rJj9m8WpJKi3wUyNFJFxCmoyBms9Em0=.c484b2c5-7f07-4cbc-a45f-917491f15e53@github.com> References: <1cSrrIo9K9Y9rJj9m8WpJKi3wUyNFJFxCmoyBms9Em0=.c484b2c5-7f07-4cbc-a45f-917491f15e53@github.com> Message-ID: On Thu, 21 Aug 2025 10:59:19 GMT, Dingli Zhang wrote: >> Hi, >> Can you help to review this patch? Thanks! 
>> >> We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. >> The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount. >> >> ### Test (fastdebug) >> - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Add an additional comment Marked as reviewed by fjiang (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26855#pullrequestreview-3142819636 From dzhang at openjdk.org Fri Aug 22 01:46:00 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Fri, 22 Aug 2025 01:46:00 GMT Subject: Integrated: 8365844: RISC-V: TestBadFormat.java fails when running without RVV In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 07:56:19 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > We noticed that testlibrary_tests/ir_framework/tests/TestBadFormat.java fails when running tier4 tests on p550. > The reason for the error is that the Vector test related to badVectorNodeSize requires RVV on riscv, otherwise the expected passing case will fail and cannot match FailCount. > > ### Test (fastdebug) > - [x] Run testlibrary_tests/ir_framework/tests/TestBadFormat.java on k1/k230/sg2042 This pull request has now been integrated. 
Changeset: 584137cf Author: Dingli Zhang Committer: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/584137cf968bdfd4fdb88b5bb210bbbfa5f2d537 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8365844: RISC-V: TestBadFormat.java fails when running without RVV Reviewed-by: fjiang, chagedorn, epeter, fyang ------------- PR: https://git.openjdk.org/jdk/pull/26855 From iveresov at openjdk.org Fri Aug 22 03:21:36 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 22 Aug 2025 03:21:36 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v6] In-Reply-To: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: > This change fixes multiple issues with training data verification. While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown), it does cause problems in the leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call), and fixes the problems that prevent it from working reliably. 
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: More renames ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26866/files - new: https://git.openjdk.org/jdk/pull/26866/files/edc8591d..f7d6a4e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26866&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26866&range=04-05 Stats: 15 lines in 5 files changed: 2 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/26866.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26866/head:pull/26866 PR: https://git.openjdk.org/jdk/pull/26866 From iveresov at openjdk.org Fri Aug 22 03:21:36 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 22 Aug 2025 03:21:36 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v2] In-Reply-To: <8m6eeYwRwgfqafcvuhnXo19A-HaYMBM3eS4l7cVgu6w=.00285c38-24d1-4d07-9bcf-2024cb342b74@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <871DbNrXSrdwXxEynOq_fZjvQXMP30abzW17OxgIX4E=.30cb9ca0-914a-472e-aa38-34cb6c034e0e@github.com> <8m6eeYwRwgfqafcvuhnXo19A-HaYMBM3eS4l7cVgu6w=.00285c38-24d1-4d07-9bcf-2024cb342b74@github.com> Message-ID: On Thu, 21 Aug 2025 14:58:56 GMT, Igor Veresov wrote: >> Okay - not obvious we actually require acquire semantics when reading a simple count, but I'm not sure what the count may imply. But please consider renaming the method. > > Let me think some more about this one. May we don't need it indeed. Yeah, it's needed. I did the renaming you requested. 
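For readers following the acquire-semantics exchange above: the discussion concerns HotSpot's C++ `Atomic` accessors, but the acquire/release pairing itself is easy to illustrate in Java with the standard `VarHandle` API. The class and field names below are made up for illustration and have nothing to do with the code under review:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class AcquireReleaseCounter {
    private int count;
    private static final VarHandle COUNT;

    static {
        try {
            COUNT = MethodHandles.lookup()
                    .findVarHandle(AcquireReleaseCounter.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Writer: everything stored before this release write...
    void publish(int newCount) {
        COUNT.setRelease(this, newCount);
    }

    // ...is guaranteed visible to a reader that observes the new
    // value via this acquire read.
    int readAcquire() {
        return (int) COUNT.getAcquire(this);
    }
}
```

The point of the pairing is ordering rather than the count value itself: a plain load of the count could legally be reordered with later reads of data the writer published before the release store.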
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2292597504 From iveresov at openjdk.org Fri Aug 22 03:21:37 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 22 Aug 2025 03:21:37 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v4] In-Reply-To: <122qZDRJNr0wRNgp-o6FJlmScBbsD0XWNfAw3XLaGjA=.a7b94197-9b8a-4bbc-984e-f26d1ad2e974@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <4P2y8gjeUOn0hXpJ3cJumGLqrLLx9c0FsAAmIZZDTBA=.23c103b1-c9dd-45a0-9289-99f4910c5bcf@github.com> <122qZDRJNr0wRNgp-o6FJlmScBbsD0XWNfAw3XLaGjA=.a7b94197-9b8a-4bbc-984e-f26d1ad2e974@github.com> Message-ID: <1nfqdRTlejsQoXqtGRmaLj7I1wGZ4Saoh-loyUW2WTo=.1f7a13ea-5c6d-4cf3-815c-8700651b22be@github.com> On Thu, 21 Aug 2025 18:30:10 GMT, Vladimir Kozlov wrote: >> That's because `at_init` comes from `class initialization` events servicing. Those are enqueued after the class initialization is done. So, yes, it's at the shutdown, but the processing of class initializations is still happening. > > This is low level knowledge nobody except you know (now I know). For other people who looks on this, it is confusing. `_at_init` gives nothing to understand what code does. I think `wait_replay_training_to_finish()` may be better. 
ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2292597060 From amitkumar at openjdk.org Fri Aug 22 03:45:57 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 22 Aug 2025 03:45:57 GMT Subject: RFR: 8361536: [s390x] Saving return_pc at wrong offset [v2] In-Reply-To: <7GOHAZFlyiKCrfQPQOp4dugrGW7dEDer4gqR8EhBEwQ=.2962ae08-bab7-4f8d-81b3-b25f2ba668ca@github.com> References: <7GOHAZFlyiKCrfQPQOp4dugrGW7dEDer4gqR8EhBEwQ=.2962ae08-bab7-4f8d-81b3-b25f2ba668ca@github.com> Message-ID: On Thu, 21 Aug 2025 05:05:31 GMT, Amit Kumar wrote: >> Fixes the bug where return pc was stored at a wrong offset, which causes issue with java abi. >> >> Issue appeared in #26004, see the comment: https://github.com/openjdk/jdk/pull/26004#issuecomment-3017928879. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > re-adjust offset, 80 is free so we can start saving from there GHA failures are infra issue. Thanks for approval and reviews Lutz, Martin. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26209#issuecomment-3212928199 From amitkumar at openjdk.org Fri Aug 22 03:45:58 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 22 Aug 2025 03:45:58 GMT Subject: Integrated: 8361536: [s390x] Saving return_pc at wrong offset In-Reply-To: References: Message-ID: <45yC6fQAzEqoUJs266Eb5xFs_-Uvgw25Hmxu-JfoJsY=.f81c9a91-29bc-4db0-af9f-93dfc7978e4e@github.com> On Wed, 9 Jul 2025 05:24:38 GMT, Amit Kumar wrote: > Fixes the bug where return pc was stored at a wrong offset, which causes issue with java abi. > > Issue appeared in #26004, see the comment: https://github.com/openjdk/jdk/pull/26004#issuecomment-3017928879. This pull request has now been integrated. 
Changeset: 558d0639 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/558d06399c7a13b247ee3d0f36f4fe6118004c55 Stats: 20 lines in 1 file changed: 2 ins; 0 del; 18 mod 8361536: [s390x] Saving return_pc at wrong offset Reviewed-by: lucy, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/26209 From epeter at openjdk.org Fri Aug 22 06:22:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Aug 2025 06:22:57 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 15:21:48 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Update comment for constraint casts @jaskarth Thanks for the fix, it looks good to me now :) I'm just running some internal testing now, please ping me after the weekend :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26827#issuecomment-3213181813 From qamai at openjdk.org Fri Aug 22 06:33:51 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 22 Aug 2025 06:33:51 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 15:18:15 GMT, Jasmine Karthikeyan wrote: >> Because the current comment says "should not truncate". That sounds more strong than "to be on the safe side". > > I think this is fair, I've pushed a commit that changes the comment wording. Let me know what you think! We have `CastVV` which should be the packed version of `CastII`. The thing that is I think the most difficult is to properly wire it. 
The simplest situation, where all elements in the pack have the same control input, can be easily handled, though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2292827860 From epeter at openjdk.org Fri Aug 22 06:57:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Aug 2025 06:57:58 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v5] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 16:33:13 GMT, Manuel Hässig wrote: >> This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of the combination of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality. >> >> Testing: >> - [x] Github Actions >> - [x] tier1,tier2 plus some internal testing on Oracle supported platforms > Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: > - Fix test > - Better counting in tests > - post processing of flags and documentation Looks much better already! I now took a closer look at the implementation :) test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 363: > 361: final public TestFramework addCrossProductScenarios(Set... flagSets) { > 362: TestFormat.checkAndReport(flagSets != null && Arrays.stream(flagSets).noneMatch(Objects::isNull), > 363: "Flags must not be null"); Suggestion: TestFormat.checkAndReport(flagSets != null && Arrays.stream(flagSets).noneMatch(Objects::isNull), "Flags must not be null"); Optional: indentation of args test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 363: > 361: final public TestFramework addCrossProductScenarios(Set... flagSets) { > 362: TestFormat.checkAndReport(flagSets != null && Arrays.stream(flagSets).noneMatch(Objects::isNull), > 363: "Flags must not be null"); What about an empty `flagSets`? Is it allowed? Do we have a test for it? 
test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 367: > 365: if (this.scenarioIndices != null && !this.scenarioIndices.isEmpty()) { > 366: initIdx = this.scenarioIndices.stream().max(Comparator.comparingInt(Integer::intValue)).get() + 1; > 367: } Nit: you are writing code here that allows previous scenarios, but there is no example / test for that below ;) test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 374: > 372: idx.getAndIncrement(), > 373: flags.stream() // Process flags > 374: .filter(s -> !s.isEmpty()) // Remove empty flags Why do you need that? Ah, these are empty strings, right? Suggestion: .filter(s -> !s.isEmpty()) // Remove empty string flags test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 375: > 373: flags.stream() // Process flags > 374: .filter(s -> !s.isEmpty()) // Remove empty flags > 375: .map(s -> Set.of(s.split("[ ]"))) // Split muliple flags in the same string into separate strings What happens if I enter `"flag_one flag_two"` with two spaces in the middle? Do I then get an empty string in the middle again? If so: move the empty string filter down. test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 387: > 385: if (idx == sets.length) { > 386: Set empty = Set.of(); > 387: return Set.of(empty).stream(); Suggestion: return Stream.of(Set.of()); Would this work? Or at least this: Suggestion: Set empty = Set.of(); return Stream.of(empty); test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 394: > 392: Set newSet = new HashSet<>(set); > 393: newSet.add(setElement); > 394: return newSet; Not super performant, as it creates a new HashSet at every turn... 
but oh well we are not making this public anyway ;) test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 96: > 94: TestFramework t5 = new TestFramework(); > 95: t5.addCrossProductScenarios(Set.of("", "-XX:TLABRefillWasteFraction=51", "-XX:TLABRefillWasteFraction=53"), > 96: Set.of("-XX:+UseNewCode", "-XX:+UseNewCode2")); Now looking at the implementation of `addCrossProductScenarios`: what does it do when it is called without arguments/empty args array? Can you also add a test for that? ------------- PR Review: https://git.openjdk.org/jdk/pull/26762#pullrequestreview-3143243970 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2292817705 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2292827988 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2292869974 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2292855516 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2292857113 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2292836362 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2292852276 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2292825338 From epeter at openjdk.org Fri Aug 22 07:01:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Aug 2025 07:01:05 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v18] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Thu, 21 Aug 2025 18:35:40 GMT, Vladimir Kozlov wrote: > It would be nice to have code profiling tool which could show which part in code for these two cases is hot. Instead of guessing based on whole system behaviors. 
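To make the cross-product recursion discussed in these review comments easier to follow, here is a standalone sketch of the underlying idea. The class and method names are hypothetical; this is not the PR's actual implementation:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class CrossProductSketch {

    // Builds every combination that picks exactly one flag from each set,
    // recursing over the list of sets.
    static Set<Set<String>> crossProduct(List<Set<String>> sets, int idx) {
        if (idx == sets.size()) {
            Set<String> empty = Set.of();
            return Set.of(empty); // one empty combination to extend
        }
        return crossProduct(sets, idx + 1).stream()
                .flatMap(partial -> sets.get(idx).stream().map(flag -> {
                    Set<String> combined = new HashSet<>(partial);
                    combined.add(flag);
                    return combined;
                }))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<Set<String>> result = crossProduct(
                List.of(Set.of("-XX:+UseNewCode", "-XX:-UseNewCode"),
                        Set.of("-XX:TLABRefillWasteFraction=51",
                               "-XX:TLABRefillWasteFraction=53")),
                0);
        if (result.size() != 4) { // 2 x 2 flag choices
            throw new AssertionError("expected 4 combinations, got " + result.size());
        }
        System.out.println(result.size() + " flag combinations");
    }
}
```

Each resulting flag set would then presumably seed one scenario, which is why an all-combinations helper like `addCrossProductScenarios` saves a lot of boilerplate over listing scenarios by hand.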
That is what I already did, but it did not help much - the hot code looks basically identical :/ https://github.com/openjdk/jdk/pull/24278#issuecomment-3201092650 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3213267059 From epeter at openjdk.org Fri Aug 22 07:05:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Aug 2025 07:05:05 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v18] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <3V0iAoPoTB5TbsCO0QxOrPwEIbyYa9qYQZs9nCTmNj4=.fe50204a-a5be-4375-86bd-60078336b36a@github.com> On Wed, 20 Aug 2025 12:31:11 GMT, Emanuel Peter wrote: >> TODO work that arose during review process / recent merges with master: >> >> - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peek performance is identical but compilation time a bit higher. Investigation ongoing. >> - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE. >> >> --------------- >> >> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. >> >> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. >> >> -------------------------- >> >> **Where to start reviewing** >> >> - `src/hotspot/share/opto/mempointer.hpp`: >> - Read the class comment for `MemPointerRawSummand`. 
>> - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. >> >> - `src/hotspot/share/opto/vectorization.cpp`: >> - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. >> >> - `src/hotspot/share/opto/vtransform.hpp`: >> - Understand the difference between weak and strong edges. >> >> If you need to see some examples, then look at the tests: >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in somecases if we used multiversioning. >> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the miro-benchmarks I show below. Simple array cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather compliex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). >> -------------------------- >> >> **Details** >> >> Most fundamentally: >> - I had to... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > disable flag if not possible Here the logs: [empeter at emanuel jdk-fork6]$ perf stat ./build/linux-x64/jdk/bin/java -Djava.library.path=/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/native -jar /home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/benchmarks.jar "VectorAliasing.VectorAliasingSuperWordPretendNotProfitable.bench_copy_array_B_differentIndex_alias" -prof perfasm WARNING: A terminally deprecated method in sun.misc.Unsafe has been called WARNING: sun.misc.Unsafe::objectFieldOffset has been called by org.openjdk.jmh.util.Utils (file:/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/benchmarks.jar) WARNING: Please consider reporting this to the maintainers of class org.openjdk.jmh.util.Utils WARNING: sun.misc.Unsafe::objectFieldOffset will be removed in a future release # JMH version: 1.37 # VM version: JDK 26-internal, Java HotSpot(TM) 64-Bit Server VM, 26-internal-2025-08-19-0806546.empeter... 
# VM invoker: /home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/jdk/bin/java # VM options: -XX:+UseSuperWord -XX:+UnlockDiagnosticVMOptions -XX:AutoVectorizationOverrideProfitability=0 # Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable) # Warmup: 1 iterations, 1 s each # Measurement: 10 iterations, 1 s each # Timeout: 10 min per iteration # Threads: 1 thread, will synchronize iterations # Benchmark mode: Average time, time/op # Benchmark: org.openjdk.bench.vm.compiler.VectorAliasing.VectorAliasingSuperWordPretendNotProfitable.bench_copy_array_B_differentIndex_alias # Parameters: (SIZE = 10000, seed = 0) # Run progress: 0.00% complete, ETA 00:00:11 # Fork: 1 of 1 # Preparing profilers: LinuxPerfAsmProfiler # Profilers consume stdout and stderr from target VM, use -v EXTRA to copy to console # Warmup Iteration 1: 3101.582 ns/op Iteration 1: 2876.229 ns/op Iteration 2: 2858.107 ns/op Iteration 3: 2837.087 ns/op Iteration 4: 2860.013 ns/op Iteration 5: 2851.886 ns/op Iteration 6: 2872.007 ns/op Iteration 7: 2863.599 ns/op Iteration 8: 2842.069 ns/op Iteration 9: 2841.341 ns/op Iteration 10: 2844.861 ns/op # Processing profiler results: LinuxPerfAsmProfiler Result "org.openjdk.bench.vm.compiler.VectorAliasing.VectorAliasingSuperWordPretendNotProfitable.bench_copy_array_B_differentIndex_alias": 2854.720 ?(99.9%) 20.377 ns/op [Average] (min, avg, max) = (2837.087, 2854.720, 2876.229), stdev = 13.478 CI (99.9%): [2834.343, 2875.097] (assumes normal distribution) Secondary result "org.openjdk.bench.vm.compiler.VectorAliasing.VectorAliasingSuperWordPretendNotProfitable.bench_copy_array_B_differentIndex_alias:asm": PrintAssembly processed: 326211 total address lines. Perf output processed (skipped 10.085 seconds): Column 1: cycles (10297 events) Hottest code regions (>10.00% "cycles" events): Event counts are percents of total event count. 
....[Hottest Region 1].............................................................................. c2, level 4, org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 3, compile id 1261 0.03% 0x00007f28e8bef76b: movslq %r11d,%r11 0x00007f28e8bef76e: cmp %r11,%r10 0x00007f28e8bef771: jae 0x00007f28e8befb18 0.02% 0x00007f28e8bef777: movsbl 0x10(%rdx,%rbp,1),%r11d 0x00007f28e8bef77d: mov %r11b,0x10(%rcx,%rax,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 22 (line 92) 0x00007f28e8bef782: add $0xffffffffffffffc1,%rbx 0x00007f28e8bef786: mov $0xffffffff80000000,%r10 0x00007f28e8bef78d: cmp $0xffffffff80000000,%rbx 0x00007f28e8bef794: cmovl %r10,%rbx 0.01% 0x00007f28e8bef798: mov %ebx,%r13d 0.01% 0x00007f28e8bef79b: mov $0x1,%esi 0.01% 0x00007f28e8bef7a0: cmp $0x1,%r13d 0x00007f28e8bef7a4: jle 0x00007f28e8befae6 ;*goto {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 26 (line 91) ? 0x00007f28e8bef7aa: jmpq 0x00007f28e8befab8 ? 0x00007f28e8bef7af: nop 0.07% ?? 0x00007f28e8bef7b0: vmovd %xmm0,%r8d ;*aload_2 {reexecute=0 rethrow=0 return_oop=0} ?? ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 10 (line 92) ?? 0x00007f28e8bef7b5: vmovd %r8d,%xmm0 0.05% ?? 0x00007f28e8bef7ba: add %esi,%r8d ?? 0x00007f28e8bef7bd: vmovd %xmm2,%r10d 0.95% ?? 0x00007f28e8bef7c2: add %esi,%r10d ?? 0x00007f28e8bef7c5: movslq %r8d,%r11 1.17% ?? 0x00007f28e8bef7c8: movslq %r10d,%r8 0.01% ?? 0x00007f28e8bef7cb: movslq %esi,%r10 0.27% ?? 0x00007f28e8bef7ce: lea (%rax,%r10,1),%r9 ?? 0x00007f28e8bef7d2: lea (%r10,%rbp,1),%rbx 0.32% ?? 0x00007f28e8bef7d6: movsbl 0x10(%rdx,%rbx,1),%r10d 0.92% ?? 0x00007f28e8bef7dc: mov %r10b,0x10(%rcx,%r9,1) 1.59% ?? 0x00007f28e8bef7e1: movsbl 0x11(%rdx,%rbx,1),%r10d 0.17% ?? 0x00007f28e8bef7e7: mov %r10b,0x11(%rcx,%r9,1) 1.20% ?? 0x00007f28e8bef7ec: movsbl 0x12(%rdx,%rbx,1),%r10d 0.20% ?? 
0x00007f28e8bef7f2: mov %r10b,0x12(%rcx,%r9,1) 0.83% ?? 0x00007f28e8bef7f7: movsbl 0x13(%rdx,%r11,1),%r10d 0.28% ?? 0x00007f28e8bef7fd: mov %r10b,0x13(%rcx,%r8,1) 0.44% ?? 0x00007f28e8bef802: movsbl 0x14(%rdx,%r11,1),%r10d 0.23% ?? 0x00007f28e8bef808: mov %r10b,0x14(%rcx,%r8,1) ; {other} 1.08% ?? 0x00007f28e8bef80d: movsbl 0x15(%rdx,%r11,1),%r10d 0.43% ?? 0x00007f28e8bef813: mov %r10b,0x15(%rcx,%r8,1) 1.12% ?? 0x00007f28e8bef818: movsbl 0x16(%rdx,%r11,1),%r10d 0.51% ?? 0x00007f28e8bef81e: mov %r10b,0x16(%rcx,%r8,1) 1.45% ?? 0x00007f28e8bef823: movsbl 0x17(%rdx,%r11,1),%r10d 0.30% ?? 0x00007f28e8bef829: mov %r10b,0x17(%rcx,%r8,1) 1.05% ?? 0x00007f28e8bef82e: movsbl 0x18(%rdx,%r11,1),%r10d 0.19% ?? 0x00007f28e8bef834: mov %r10b,0x18(%rcx,%r8,1) 1.08% ?? 0x00007f28e8bef839: movsbl 0x19(%rdx,%r11,1),%r10d 0.21% ?? 0x00007f28e8bef83f: mov %r10b,0x19(%rcx,%r8,1) 0.95% ?? 0x00007f28e8bef844: movsbl 0x1a(%rdx,%r11,1),%r10d 0.16% ?? 0x00007f28e8bef84a: mov %r10b,0x1a(%rcx,%r8,1) 0.94% ?? 0x00007f28e8bef84f: movsbl 0x1b(%rdx,%r11,1),%r10d 0.47% ?? 0x00007f28e8bef855: mov %r10b,0x1b(%rcx,%r8,1) 0.96% ?? 0x00007f28e8bef85a: movsbl 0x1c(%rdx,%r11,1),%r10d 0.55% ?? 0x00007f28e8bef860: mov %r10b,0x1c(%rcx,%r8,1) 1.34% ?? 0x00007f28e8bef865: movsbl 0x1d(%rdx,%r11,1),%r10d 0.33% ?? 0x00007f28e8bef86b: mov %r10b,0x1d(%rcx,%r8,1) 1.15% ?? 0x00007f28e8bef870: movsbl 0x1e(%rdx,%r11,1),%r10d 0.25% ?? 0x00007f28e8bef876: mov %r10b,0x1e(%rcx,%r8,1) 1.11% ?? 0x00007f28e8bef87b: movsbl 0x1f(%rdx,%r11,1),%r10d 0.45% ?? 0x00007f28e8bef881: mov %r10b,0x1f(%rcx,%r8,1) 1.15% ?? 0x00007f28e8bef886: movsbl 0x20(%rdx,%r11,1),%r10d 0.19% ?? 0x00007f28e8bef88c: mov %r10b,0x20(%rcx,%r8,1) 1.10% ?? 0x00007f28e8bef891: movsbl 0x21(%rdx,%r11,1),%r10d 0.15% ?? 0x00007f28e8bef897: mov %r10b,0x21(%rcx,%r8,1) 0.97% ?? 0x00007f28e8bef89c: movsbl 0x22(%rdx,%r11,1),%r10d 0.58% ?? 0x00007f28e8bef8a2: mov %r10b,0x22(%rcx,%r8,1) 1.19% ?? 0x00007f28e8bef8a7: movsbl 0x23(%rdx,%r11,1),%r10d 0.38% ?? 
0x00007f28e8bef8ad: mov %r10b,0x23(%rcx,%r8,1) 1.12% ?? 0x00007f28e8bef8b2: movsbl 0x24(%rdx,%r11,1),%r10d 0.22% ?? 0x00007f28e8bef8b8: mov %r10b,0x24(%rcx,%r8,1) 1.20% ?? 0x00007f28e8bef8bd: movsbl 0x25(%rdx,%r11,1),%r10d 0.20% ?? 0x00007f28e8bef8c3: mov %r10b,0x25(%rcx,%r8,1) 1.15% ?? 0x00007f28e8bef8c8: movsbl 0x26(%rdx,%r11,1),%r10d 0.10% ?? 0x00007f28e8bef8ce: mov %r10b,0x26(%rcx,%r8,1) 1.02% ?? 0x00007f28e8bef8d3: movsbl 0x27(%rdx,%r11,1),%r10d 0.14% ?? 0x00007f28e8bef8d9: mov %r10b,0x27(%rcx,%r8,1) 1.10% ?? 0x00007f28e8bef8de: movsbl 0x28(%rdx,%r11,1),%r10d 0.38% ?? 0x00007f28e8bef8e4: mov %r10b,0x28(%rcx,%r8,1) 1.02% ?? 0x00007f28e8bef8e9: movsbl 0x29(%rdx,%r11,1),%r10d 0.33% ?? 0x00007f28e8bef8ef: mov %r10b,0x29(%rcx,%r8,1) 1.03% ?? 0x00007f28e8bef8f4: movsbl 0x2a(%rdx,%r11,1),%r10d 0.33% ?? 0x00007f28e8bef8fa: mov %r10b,0x2a(%rcx,%r8,1) 1.08% ?? 0x00007f28e8bef8ff: movsbl 0x2b(%rdx,%r11,1),%r10d 0.32% ?? 0x00007f28e8bef905: mov %r10b,0x2b(%rcx,%r8,1) ; {other} 1.05% ?? 0x00007f28e8bef90a: movsbl 0x2c(%rdx,%r11,1),%r10d 0.27% ?? 0x00007f28e8bef910: mov %r10b,0x2c(%rcx,%r8,1) 1.18% ?? 0x00007f28e8bef915: movsbl 0x2d(%rdx,%r11,1),%r10d 0.24% ?? 0x00007f28e8bef91b: mov %r10b,0x2d(%rcx,%r8,1) 0.98% ?? 0x00007f28e8bef920: movsbl 0x2e(%rdx,%r11,1),%r10d 0.35% ?? 0x00007f28e8bef926: mov %r10b,0x2e(%rcx,%r8,1) 1.16% ?? 0x00007f28e8bef92b: movsbl 0x2f(%rdx,%r11,1),%r10d 0.38% ?? 0x00007f28e8bef931: mov %r10b,0x2f(%rcx,%r8,1) 1.14% ?? 0x00007f28e8bef936: movsbl 0x30(%rdx,%r11,1),%r10d 0.35% ?? 0x00007f28e8bef93c: mov %r10b,0x30(%rcx,%r8,1) 1.16% ?? 0x00007f28e8bef941: movsbl 0x31(%rdx,%r11,1),%r10d 0.32% ?? 0x00007f28e8bef947: mov %r10b,0x31(%rcx,%r8,1) 1.19% ?? 0x00007f28e8bef94c: movsbl 0x32(%rdx,%r11,1),%r10d 0.28% ?? 0x00007f28e8bef952: mov %r10b,0x32(%rcx,%r8,1) 0.98% ?? 0x00007f28e8bef957: movsbl 0x33(%rdx,%r11,1),%r10d 0.37% ?? 0x00007f28e8bef95d: mov %r10b,0x33(%rcx,%r8,1) 1.10% ?? 0x00007f28e8bef962: movsbl 0x34(%rdx,%r11,1),%r10d 0.30% ?? 
0x00007f28e8bef968: mov %r10b,0x34(%rcx,%r8,1) 1.36% ?? 0x00007f28e8bef96d: movsbl 0x35(%rdx,%r11,1),%r10d 0.29% ?? 0x00007f28e8bef973: mov %r10b,0x35(%rcx,%r8,1) 1.11% ?? 0x00007f28e8bef978: movsbl 0x36(%rdx,%r11,1),%r10d 0.38% ?? 0x00007f28e8bef97e: mov %r10b,0x36(%rcx,%r8,1) 1.01% ?? 0x00007f28e8bef983: movsbl 0x37(%rdx,%r11,1),%r10d 0.45% ?? 0x00007f28e8bef989: mov %r10b,0x37(%rcx,%r8,1) 1.14% ?? 0x00007f28e8bef98e: movsbl 0x38(%rdx,%r11,1),%r10d 0.29% ?? 0x00007f28e8bef994: mov %r10b,0x38(%rcx,%r8,1) 1.20% ?? 0x00007f28e8bef999: movsbl 0x39(%rdx,%r11,1),%r10d 0.30% ?? 0x00007f28e8bef99f: mov %r10b,0x39(%rcx,%r8,1) 1.07% ?? 0x00007f28e8bef9a4: movsbl 0x3a(%rdx,%r11,1),%r10d 0.38% ?? 0x00007f28e8bef9aa: mov %r10b,0x3a(%rcx,%r8,1) 1.13% ?? 0x00007f28e8bef9af: movsbl 0x3b(%rdx,%r11,1),%r10d 0.26% ?? 0x00007f28e8bef9b5: mov %r10b,0x3b(%rcx,%r8,1) 1.01% ?? 0x00007f28e8bef9ba: movsbl 0x3c(%rdx,%r11,1),%r10d 0.34% ?? 0x00007f28e8bef9c0: mov %r10b,0x3c(%rcx,%r8,1) 1.42% ?? 0x00007f28e8bef9c5: movsbl 0x3d(%rdx,%r11,1),%r10d 0.35% ?? 0x00007f28e8bef9cb: mov %r10b,0x3d(%rcx,%r8,1) 1.09% ?? 0x00007f28e8bef9d0: movsbl 0x3e(%rdx,%r11,1),%r10d 0.26% ?? 0x00007f28e8bef9d6: mov %r10b,0x3e(%rcx,%r8,1) 1.25% ?? 0x00007f28e8bef9db: movsbl 0x3f(%rdx,%r11,1),%r10d 0.32% ?? 0x00007f28e8bef9e1: mov %r10b,0x3f(%rcx,%r8,1) 1.03% ?? 0x00007f28e8bef9e6: movsbl 0x40(%rdx,%r11,1),%r10d 0.35% ?? 0x00007f28e8bef9ec: mov %r10b,0x40(%rcx,%r8,1) 1.18% ?? 0x00007f28e8bef9f1: movsbl 0x41(%rdx,%rbx,1),%r10d 0.29% ?? 0x00007f28e8bef9f7: mov %r10b,0x41(%rcx,%r9,1) 1.18% ?? 0x00007f28e8bef9fc: movsbl 0x42(%rdx,%r11,1),%r10d 0.39% ?? 0x00007f28e8befa02: mov %r10b,0x42(%rcx,%r8,1) 1.15% ?? 0x00007f28e8befa07: movsbl 0x43(%rdx,%r11,1),%r10d ; {other} 0.26% ?? 0x00007f28e8befa0d: mov %r10b,0x43(%rcx,%r8,1) 1.09% ?? 0x00007f28e8befa12: movsbl 0x44(%rdx,%rbx,1),%r10d 0.32% ?? 0x00007f28e8befa18: mov %r10b,0x44(%rcx,%r9,1) 1.02% ?? 0x00007f28e8befa1d: movsbl 0x45(%rdx,%r11,1),%r10d 0.32% ?? 
0x00007f28e8befa23: mov %r10b,0x45(%rcx,%r8,1) 1.15% ?? 0x00007f28e8befa28: movsbl 0x46(%rdx,%r11,1),%r10d 0.37% ?? 0x00007f28e8befa2e: mov %r10b,0x46(%rcx,%r8,1) 1.20% ?? 0x00007f28e8befa33: movsbl 0x47(%rdx,%rbx,1),%r10d 0.30% ?? 0x00007f28e8befa39: mov %r10b,0x47(%rcx,%r8,1) 1.01% ?? 0x00007f28e8befa3e: movsbl 0x48(%rdx,%r11,1),%r10d 0.35% ?? 0x00007f28e8befa44: mov %r10b,0x48(%rcx,%r9,1) 1.30% ?? 0x00007f28e8befa49: movsbl 0x49(%rdx,%r11,1),%r10d 0.44% ?? 0x00007f28e8befa4f: mov %r10b,0x49(%rcx,%r8,1) 1.18% ?? 0x00007f28e8befa54: movsbl 0x4a(%rdx,%r11,1),%r10d 0.31% ?? 0x00007f28e8befa5a: mov %r10b,0x4a(%rcx,%r8,1) 1.26% ?? 0x00007f28e8befa5f: movsbl 0x4b(%rdx,%r11,1),%r10d 0.28% ?? 0x00007f28e8befa65: mov %r10b,0x4b(%rcx,%r8,1) 1.01% ?? 0x00007f28e8befa6a: movsbl 0x4c(%rdx,%r11,1),%r10d 0.64% ?? 0x00007f28e8befa70: mov %r10b,0x4c(%rcx,%r8,1) 1.31% ?? 0x00007f28e8befa75: movsbl 0x4d(%rdx,%r11,1),%r10d 0.46% ?? 0x00007f28e8befa7b: mov %r10b,0x4d(%rcx,%r8,1) 1.22% ?? 0x00007f28e8befa80: movsbl 0x4e(%rdx,%r11,1),%r10d 0.44% ?? 0x00007f28e8befa86: mov %r10b,0x4e(%rcx,%r8,1) 1.22% ?? 0x00007f28e8befa8b: movsbl 0x4f(%rdx,%r11,1),%r10d 0.25% ?? 0x00007f28e8befa91: mov %r10b,0x4f(%rcx,%r8,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0} ?? ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 22 (line 92) 1.09% ?? 0x00007f28e8befa96: add $0x40,%esi ;*iinc {reexecute=0 rethrow=0 return_oop=0} ?? ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 23 (line 91) ?? 0x00007f28e8befa99: cmp %r14d,%esi ?? 0x00007f28e8befa9c: jl 0x00007f28e8bef7b0 ;*goto {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 26 (line 91) ? 0x00007f28e8befaa2: mov 0x30(%r15),%r10 ; ImmutableOopMap {rcx=Oop rdx=Oop } ? ;*goto {reexecute=1 rethrow=0 return_oop=0} ? ; - (reexecute) org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 26 (line 91) 0.12% ? 0x00007f28e8befaa6: test %eax,(%r10) ;*goto {reexecute=0 rethrow=0 return_oop=0} ? 
; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 26 (line 91) ? ; {poll} 0.04% ? 0x00007f28e8befaa9: cmp %r13d,%esi ? 0x00007f28e8befaac: jge 0x00007f28e8befae6 ? 0x00007f28e8befaae: vmovd %xmm0,%r8d ? 0x00007f28e8befab3: vmovd %xmm2,%r9d ? 0x00007f28e8befab8: mov %r13d,%r14d 0x00007f28e8befabb: sub %esi,%r14d 0x00007f28e8befabe: xor %r10d,%r10d 0x00007f28e8befac1: cmp %esi,%r13d 0x00007f28e8befac4: cmovl %r10d,%r14d .................................................................................................... 95.97% ....[Hottest Regions]............................................................................... 95.97% c2, level 4 org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 3, compile id 1261 0.65% c2, level 4 org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 3, compile id 1261 0.27% libjvm.so ElfSymbolTable::lookup(unsigned char*, int*, int*, int*, ElfFuncDescTable*) 0.14% libjvm.so resolve_inlining_predicate(CompileCommandEnum, methodHandle const&) 0.12% libc.so.6 __futex_abstimed_wait_common 0.12% libc.so.6 clone3 0.11% c2, level 4 org.openjdk.bench.vm.compiler.jmh_generated.VectorAliasing_VectorAliasingSuperWordPretendNotProfitable_bench_copy_array_B_differentIndex_alias_jmhTest::bench_copy_array_B_differentIndex_alias_avgt_jmhStub, version 5, compile id 1281 0.10% kernel [unknown] 0.08% libc.so.6 __GI___lll_lock_wait 0.07% c2, level 4 org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 3, compile id 1261 0.07% libjvm.so CompilerOracle::should_not_inline(methodHandle const&) 0.07% libjvm.so RelocIterator::initialize(nmethod*, unsigned char*, unsigned char*) 0.07% libjvm.so xmlStream::write_text(char const*, unsigned long) [clone .part.0] 0.06% libjvm.so CompilerOracle::should_exclude(methodHandle const&) 0.06% libjvm.so CompilerOracle::tag_blackhole_if_possible(methodHandle const&) 0.06% libjvm.so defaultStream::write(char const*, unsigned long) 0.06% libc.so.6 _IO_fwrite 0.05% c2, level 4 
org.openjdk.bench.vm.compiler.jmh_generated.VectorAliasing_VectorAliasingSuperWordPretendNotProfitable_bench_copy_array_B_differentIndex_alias_jmhTest::bench_copy_array_B_differentIndex_alias_avgt_jmhStub, version 5, compile id 1281 0.05% libjvm.so MethodMatcher::matches(methodHandle const&) const 0.05% libjvm.so os::pd_write(int, void const*, unsigned long) 1.80% <...other 139 warm regions...> .................................................................................................... 99.99% ....[Hottest Methods (after inlining)].............................................................. 96.70% c2, level 4 org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 3, compile id 1261 0.27% libjvm.so ElfSymbolTable::lookup(unsigned char*, int*, int*, int*, ElfFuncDescTable*) 0.17% libjvm.so resolve_inlining_predicate(CompileCommandEnum, methodHandle const&) 0.16% c2, level 4 org.openjdk.bench.vm.compiler.jmh_generated.VectorAliasing_VectorAliasingSuperWordPretendNotProfitable_bench_copy_array_B_differentIndex_alias_jmhTest::bench_copy_array_B_differentIndex_alias_avgt_jmhStub, version 5, compile id 1281 0.15% libjvm.so defaultStream::write(char const*, unsigned long) 0.13% 0.13% hsdis-amd64.so print_insn 0.12% libc.so.6 clone3 0.12% libc.so.6 __futex_abstimed_wait_common 0.10% kernel [unknown] 0.09% libjvm.so xmlStream::write_text(char const*, unsigned long) [clone .part.0] 0.09% libc.so.6 _IO_fwrite 0.08% libc.so.6 __GI___lll_lock_wait 0.07% libjvm.so CompilerOracle::should_not_inline(methodHandle const&) 0.07% libjvm.so RelocIterator::initialize(nmethod*, unsigned char*, unsigned char*) 0.06% libc.so.6 __vfprintf_internal 0.06% libjvm.so CompilerOracle::should_exclude(methodHandle const&) 0.06% libjvm.so CompilerOracle::tag_blackhole_if_possible(methodHandle const&) 0.05% libc.so.6 __GI___pthread_disable_asynccancel 0.05% libjvm.so os::pd_write(int, void const*, unsigned long) 1.31% <...other 94 warm methods...> 
.................................................................................................... 99.99% ....[Distribution by Source]........................................................................ 96.86% c2, level 4 1.72% libjvm.so 0.88% libc.so.6 0.18% hsdis-amd64.so 0.13% 0.10% kernel 0.10% interpreter 0.01% perf-1337464.map 0.01% ld-linux-x86-64.so.2 .................................................................................................... 99.99% # Run complete. Total time: 00:00:21 REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial experiments, perform baseline and negative tests that provide experimental control, make sure the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts. Do not assume the numbers tell you what you want them to tell. NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise extra caution when trusting the results, look into the generated code to check the benchmark still works, and factor in a small probability of new VM bugs. Additionally, while comparisons between different JVMs are already problematic, the performance difference caused by different Blackhole modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons. Benchmark (SIZE) (seed) Mode Cnt Score Error Units VectorAliasing.VectorAliasingSuperWordPretendNotProfitable.bench_copy_array_B_differentIndex_alias 10000 0 avgt 10 2854.720 ? 
20.377 ns/op VectorAliasing.VectorAliasingSuperWordPretendNotProfitable.bench_copy_array_B_differentIndex_alias:asm 10000 0 avgt NaN --- Performance counter stats for './build/linux-x64/jdk/bin/java -Djava.library.path=/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/native -jar /home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/benchmarks.jar VectorAliasing.VectorAliasingSuperWordPretendNotProfitable.bench_copy_array_B_differentIndex_alias -prof perfasm': 38,626.40 msec task-clock:u # 1.671 CPUs utilized 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 65,014 page-faults:u # 1.683 K/sec 61,866,869,316 cycles:u # 1.602 GHz 139,866,728,493 instructions:u # 2.26 insn per cycle 12,248,937,904 branches:u # 317.113 M/sec 261,300,604 branch-misses:u # 2.13% of all branches TopdownL1 # 18.9 % tma_backend_bound # 33.9 % tma_bad_speculation # 12.4 % tma_frontend_bound # 34.9 % tma_retiring 23.119850997 seconds time elapsed 25.491034000 seconds user 13.058817000 seconds sys VS [empeter at emanuel jdk-fork6]$ perf stat ./build/linux-x64/jdk/bin/java -Djava.library.path=/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/native -jar /home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/benchmarks.jar "VectorAliasing.VectorAliasingSuperWord.bench_copy_array_B_differentIndex_alias" -prof perfasm WARNING: A terminally deprecated method in sun.misc.Unsafe has been called WARNING: sun.misc.Unsafe::objectFieldOffset has been called by org.openjdk.jmh.util.Utils (file:/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/benchmarks.jar) WARNING: Please consider reporting this to the maintainers of class org.openjdk.jmh.util.Utils WARNING: sun.misc.Unsafe::objectFieldOffset will be removed in a future release # JMH version: 1.37 # VM version: JDK 26-internal, Java HotSpot(TM) 64-Bit Server VM, 26-internal-2025-08-19-0806546.empeter... 
# VM invoker: /home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/jdk/bin/java # VM options: -XX:+UseSuperWord # Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable) # Warmup: 1 iterations, 1 s each # Measurement: 10 iterations, 1 s each # Timeout: 10 min per iteration # Threads: 1 thread, will synchronize iterations # Benchmark mode: Average time, time/op # Benchmark: org.openjdk.bench.vm.compiler.VectorAliasing.VectorAliasingSuperWord.bench_copy_array_B_differentIndex_alias # Parameters: (SIZE = 10000, seed = 0) # Run progress: 0.00% complete, ETA 00:00:11 # Fork: 1 of 1 # Preparing profilers: LinuxPerfAsmProfiler # Profilers consume stdout and stderr from target VM, use -v EXTRA to copy to console # Warmup Iteration 1: 3546.830 ns/op Iteration 1: 3178.654 ns/op Iteration 2: 3191.249 ns/op Iteration 3: 3184.110 ns/op Iteration 4: 3199.210 ns/op Iteration 5: 3188.098 ns/op Iteration 6: 3190.187 ns/op Iteration 7: 3177.316 ns/op Iteration 8: 3166.970 ns/op Iteration 9: 3175.117 ns/op Iteration 10: 3165.729 ns/op # Processing profiler results: LinuxPerfAsmProfiler Result "org.openjdk.bench.vm.compiler.VectorAliasing.VectorAliasingSuperWord.bench_copy_array_B_differentIndex_alias": 3181.664 ?(99.9%) 16.411 ns/op [Average] (min, avg, max) = (3165.729, 3181.664, 3199.210), stdev = 10.855 CI (99.9%): [3165.253, 3198.075] (assumes normal distribution) Secondary result "org.openjdk.bench.vm.compiler.VectorAliasing.VectorAliasingSuperWord.bench_copy_array_B_differentIndex_alias:asm": PrintAssembly processed: 327081 total address lines. Perf output processed (skipped 10.149 seconds): Column 1: cycles (10319 events) Hottest code regions (>10.00% "cycles" events): Event counts are percents of total event count. ....[Hottest Region 1].............................................................................. 
c2, level 4, org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 5, compile id 1267 0x00007fc4e0bef97f: movsbl 0x10(%rdx,%r13,1),%r11d 0x00007fc4e0bef985: mov %r11b,0x10(%rcx,%rbp,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 22 (line 92) 0x00007fc4e0bef98a: mov $0x1,%r10d 0x00007fc4e0bef990: mov 0x8(%rsp),%r8d 0x00007fc4e0bef995: cmp $0x1,%r8d ? 0x00007fc4e0bef999: jle 0x00007fc4e0befccd ? 0x00007fc4e0bef99f: mov $0xfa00,%esi ?? 0x00007fc4e0bef9a4: jmp 0x00007fc4e0bef9a9 ?? ? 0x00007fc4e0bef9a6: mov %r14d,%r8d 0.02% ?? ? 0x00007fc4e0bef9a9: mov %r8d,%r11d ? ? 0x00007fc4e0bef9ac: sub %r10d,%r11d ? ? 0x00007fc4e0bef9af: xor %r9d,%r9d ? ? 0x00007fc4e0bef9b2: cmp %r10d,%r8d ? ? 0x00007fc4e0bef9b5: cmovl %r9d,%r11d ? ? 0x00007fc4e0bef9b9: cmp $0xfa00,%r11d ? ? 0x00007fc4e0bef9c0: cmova %esi,%r11d 0.01% ? ? 0x00007fc4e0bef9c4: add %r10d,%r11d ? ? 0x00007fc4e0bef9c7: mov %r8d,%r14d ? ? 0x00007fc4e0bef9ca: nopw 0x0(%rax,%rax,1) ;*aload_2 {reexecute=0 rethrow=0 return_oop=0} ? ? ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 10 (line 92) 0.22% ? ?? 0x00007fc4e0bef9d0: vmovd %xmm0,%ebx ? ?? 0x00007fc4e0bef9d4: add %r10d,%ebx 0.12% ? ?? 0x00007fc4e0bef9d7: mov 0x4(%rsp),%r9d 0.47% ? ?? 0x00007fc4e0bef9dc: add %r10d,%r9d 1.60% ? ?? 0x00007fc4e0bef9df: movslq %ebx,%r8 0.02% ? ?? 0x00007fc4e0bef9e2: movslq %r9d,%rbx 0.27% ? ?? 0x00007fc4e0bef9e5: movslq %r10d,%r9 ? ?? 0x00007fc4e0bef9e8: lea (%r9,%rbp,1),%rdi 0.25% ? ?? 0x00007fc4e0bef9ec: lea (%r9,%r13,1),%rax 0.01% ? ?? 0x00007fc4e0bef9f0: movsbl 0x10(%rdx,%rax,1),%r9d 0.41% ? ?? 0x00007fc4e0bef9f6: mov %r9b,0x10(%rcx,%rdi,1) 0.16% ? ?? 0x00007fc4e0bef9fb: movsbl 0x11(%rdx,%rax,1),%r9d 1.86% ? ?? 0x00007fc4e0befa01: mov %r9b,0x11(%rcx,%rdi,1) 0.78% ? ?? 0x00007fc4e0befa06: movsbl 0x12(%rdx,%rax,1),%r9d 0.36% ? ?? 0x00007fc4e0befa0c: mov %r9b,0x12(%rcx,%rdi,1) 0.83% ? ?? 0x00007fc4e0befa11: movsbl 0x13(%rdx,%r8,1),%r9d 1.06% ? ?? 
0x00007fc4e0befa17: mov %r9b,0x13(%rcx,%rbx,1) 4.04% ? ?? 0x00007fc4e0befa1c: movsbl 0x14(%rdx,%r8,1),%r9d 4.36% ? ?? 0x00007fc4e0befa22: mov %r9b,0x14(%rcx,%rbx,1) 3.32% ? ?? 0x00007fc4e0befa27: movsbl 0x15(%rdx,%r8,1),%r9d 1.06% ? ?? 0x00007fc4e0befa2d: mov %r9b,0x15(%rcx,%rbx,1) 1.21% ? ?? 0x00007fc4e0befa32: movsbl 0x16(%rdx,%r8,1),%r9d 0.64% ? ?? 0x00007fc4e0befa38: mov %r9b,0x16(%rcx,%rbx,1) 1.11% ? ?? 0x00007fc4e0befa3d: movsbl 0x17(%rdx,%r8,1),%r9d 0.07% ? ?? 0x00007fc4e0befa43: mov %r9b,0x17(%rcx,%rbx,1) 0.70% ? ?? 0x00007fc4e0befa48: movsbl 0x18(%rdx,%r8,1),%r9d 1.01% ? ?? 0x00007fc4e0befa4e: mov %r9b,0x18(%rcx,%rbx,1) 1.61% ? ?? 0x00007fc4e0befa53: movsbl 0x19(%rdx,%r8,1),%r9d 0.58% ? ?? 0x00007fc4e0befa59: mov %r9b,0x19(%rcx,%rbx,1) 1.18% ? ?? 0x00007fc4e0befa5e: movsbl 0x1a(%rdx,%r8,1),%r9d 0.34% ? ?? 0x00007fc4e0befa64: mov %r9b,0x1a(%rcx,%rbx,1) 1.08% ? ?? 0x00007fc4e0befa69: movsbl 0x1b(%rdx,%r8,1),%r9d ; {other} 0.05% ? ?? 0x00007fc4e0befa6f: mov %r9b,0x1b(%rcx,%rbx,1) 0.79% ? ?? 0x00007fc4e0befa74: movsbl 0x1c(%rdx,%r8,1),%r9d 0.70% ? ?? 0x00007fc4e0befa7a: mov %r9b,0x1c(%rcx,%rbx,1) 1.49% ? ?? 0x00007fc4e0befa7f: movsbl 0x1d(%rdx,%r8,1),%r9d 0.33% ? ?? 0x00007fc4e0befa85: mov %r9b,0x1d(%rcx,%rbx,1) 0.80% ? ?? 0x00007fc4e0befa8a: movsbl 0x1e(%rdx,%r8,1),%r9d 0.49% ? ?? 0x00007fc4e0befa90: mov %r9b,0x1e(%rcx,%rbx,1) 1.03% ? ?? 0x00007fc4e0befa95: movsbl 0x1f(%rdx,%r8,1),%r9d 0.05% ? ?? 0x00007fc4e0befa9b: mov %r9b,0x1f(%rcx,%rbx,1) 0.98% ? ?? 0x00007fc4e0befaa0: movsbl 0x20(%rdx,%r8,1),%r9d 0.33% ? ?? 0x00007fc4e0befaa6: mov %r9b,0x20(%rcx,%rbx,1) 1.31% ? ?? 0x00007fc4e0befaab: movsbl 0x21(%rdx,%r8,1),%r9d 0.02% ? ?? 0x00007fc4e0befab1: mov %r9b,0x21(%rcx,%rbx,1) 0.84% ? ?? 0x00007fc4e0befab6: movsbl 0x22(%rdx,%r8,1),%r9d 0.05% ? ?? 0x00007fc4e0befabc: mov %r9b,0x22(%rcx,%rbx,1) 1.01% ? ?? 0x00007fc4e0befac1: movsbl 0x23(%rdx,%r8,1),%r9d ? ?? 0x00007fc4e0befac7: mov %r9b,0x23(%rcx,%rbx,1) 0.76% ? ?? 
0x00007fc4e0befacc: movsbl 0x24(%rdx,%r8,1),%r9d 0.08% ? ?? 0x00007fc4e0befad2: mov %r9b,0x24(%rcx,%rbx,1) 1.17% ? ?? 0x00007fc4e0befad7: movsbl 0x25(%rdx,%r8,1),%r9d 0.01% ? ?? 0x00007fc4e0befadd: mov %r9b,0x25(%rcx,%rbx,1) 0.88% ? ?? 0x00007fc4e0befae2: movsbl 0x26(%rdx,%r8,1),%r9d 0.07% ? ?? 0x00007fc4e0befae8: mov %r9b,0x26(%rcx,%rbx,1) 1.23% ? ?? 0x00007fc4e0befaed: movsbl 0x27(%rdx,%r8,1),%r9d 0.02% ? ?? 0x00007fc4e0befaf3: mov %r9b,0x27(%rcx,%rbx,1) 0.73% ? ?? 0x00007fc4e0befaf8: movsbl 0x28(%rdx,%r8,1),%r9d 0.16% ? ?? 0x00007fc4e0befafe: mov %r9b,0x28(%rcx,%rbx,1) 1.27% ? ?? 0x00007fc4e0befb03: movsbl 0x29(%rdx,%r8,1),%r9d 0.01% ? ?? 0x00007fc4e0befb09: mov %r9b,0x29(%rcx,%rbx,1) 0.75% ? ?? 0x00007fc4e0befb0e: movsbl 0x2a(%rdx,%r8,1),%r9d 0.13% ? ?? 0x00007fc4e0befb14: mov %r9b,0x2a(%rcx,%rbx,1) 1.27% ? ?? 0x00007fc4e0befb19: movsbl 0x2b(%rdx,%r8,1),%r9d 0.01% ? ?? 0x00007fc4e0befb1f: mov %r9b,0x2b(%rcx,%rbx,1) 0.63% ? ?? 0x00007fc4e0befb24: movsbl 0x2c(%rdx,%r8,1),%r9d 0.17% ? ?? 0x00007fc4e0befb2a: mov %r9b,0x2c(%rcx,%rbx,1) 1.26% ? ?? 0x00007fc4e0befb2f: movsbl 0x2d(%rdx,%r8,1),%r9d 0.04% ? ?? 0x00007fc4e0befb35: mov %r9b,0x2d(%rcx,%rbx,1) 0.76% ? ?? 0x00007fc4e0befb3a: movsbl 0x2e(%rdx,%r8,1),%r9d 0.23% ? ?? 0x00007fc4e0befb40: mov %r9b,0x2e(%rcx,%rbx,1) 1.49% ? ?? 0x00007fc4e0befb45: movsbl 0x2f(%rdx,%r8,1),%r9d 0.14% ? ?? 0x00007fc4e0befb4b: mov %r9b,0x2f(%rcx,%rbx,1) 0.79% ? ?? 0x00007fc4e0befb50: movsbl 0x30(%rdx,%r8,1),%r9d 0.33% ? ?? 0x00007fc4e0befb56: mov %r9b,0x30(%rcx,%rbx,1) 1.44% ? ?? 0x00007fc4e0befb5b: movsbl 0x31(%rdx,%r8,1),%r9d 0.20% ? ?? 0x00007fc4e0befb61: mov %r9b,0x31(%rcx,%rbx,1) 0.78% ? ?? 0x00007fc4e0befb66: movsbl 0x32(%rdx,%r8,1),%r9d 0.46% ? ?? 0x00007fc4e0befb6c: mov %r9b,0x32(%rcx,%rbx,1) ; {other} 1.46% ? ?? 0x00007fc4e0befb71: movsbl 0x33(%rdx,%r8,1),%r9d ? ?? 0x00007fc4e0befb77: mov %r9b,0x33(%rcx,%rbx,1) 0.66% ? ?? 0x00007fc4e0befb7c: movsbl 0x34(%rdx,%r8,1),%r9d 0.07% ? ?? 
0x00007fc4e0befb82: mov %r9b,0x34(%rcx,%rbx,1) 1.55% ? ?? 0x00007fc4e0befb87: movsbl 0x35(%rdx,%r8,1),%r9d 0.01% ? ?? 0x00007fc4e0befb8d: mov %r9b,0x35(%rcx,%rbx,1) 0.78% ? ?? 0x00007fc4e0befb92: movsbl 0x36(%rdx,%r8,1),%r9d 0.19% ? ?? 0x00007fc4e0befb98: mov %r9b,0x36(%rcx,%rbx,1) 1.47% ? ?? 0x00007fc4e0befb9d: movsbl 0x37(%rdx,%r8,1),%r9d 0.01% ? ?? 0x00007fc4e0befba3: mov %r9b,0x37(%rcx,%rbx,1) 0.74% ? ?? 0x00007fc4e0befba8: movsbl 0x38(%rdx,%r8,1),%r9d 0.15% ? ?? 0x00007fc4e0befbae: mov %r9b,0x38(%rcx,%rbx,1) 1.24% ? ?? 0x00007fc4e0befbb3: movsbl 0x39(%rdx,%r8,1),%r9d 0.01% ? ?? 0x00007fc4e0befbb9: mov %r9b,0x39(%rcx,%rbx,1) 0.68% ? ?? 0x00007fc4e0befbbe: movsbl 0x3a(%rdx,%r8,1),%r9d 0.25% ? ?? 0x00007fc4e0befbc4: mov %r9b,0x3a(%rcx,%rbx,1) 1.59% ? ?? 0x00007fc4e0befbc9: movsbl 0x3b(%rdx,%r8,1),%r9d ? ?? 0x00007fc4e0befbcf: mov %r9b,0x3b(%rcx,%rbx,1) 0.57% ? ?? 0x00007fc4e0befbd4: movsbl 0x3c(%rdx,%r8,1),%r9d 0.12% ? ?? 0x00007fc4e0befbda: mov %r9b,0x3c(%rcx,%rbx,1) 1.55% ? ?? 0x00007fc4e0befbdf: movsbl 0x3d(%rdx,%r8,1),%r9d 0.01% ? ?? 0x00007fc4e0befbe5: mov %r9b,0x3d(%rcx,%rbx,1) 0.57% ? ?? 0x00007fc4e0befbea: movsbl 0x3e(%rdx,%r8,1),%r9d 0.12% ? ?? 0x00007fc4e0befbf0: mov %r9b,0x3e(%rcx,%rbx,1) 1.48% ? ?? 0x00007fc4e0befbf5: movsbl 0x3f(%rdx,%r8,1),%r9d ? ?? 0x00007fc4e0befbfb: mov %r9b,0x3f(%rcx,%rbx,1) 0.41% ? ?? 0x00007fc4e0befc00: movsbl 0x40(%rdx,%r8,1),%r9d 0.14% ? ?? 0x00007fc4e0befc06: mov %r9b,0x40(%rcx,%rdi,1) 1.77% ? ?? 0x00007fc4e0befc0b: movsbl 0x41(%rdx,%r8,1),%r9d 0.01% ? ?? 0x00007fc4e0befc11: mov %r9b,0x41(%rcx,%rbx,1) 0.30% ? ?? 0x00007fc4e0befc16: movsbl 0x42(%rdx,%r8,1),%r9d 0.33% ? ?? 0x00007fc4e0befc1c: mov %r9b,0x42(%rcx,%rbx,1) 2.00% ? ?? 0x00007fc4e0befc21: movsbl 0x43(%rdx,%r8,1),%r9d 0.01% ? ?? 0x00007fc4e0befc27: mov %r9b,0x43(%rcx,%rbx,1) 0.34% ? ?? 0x00007fc4e0befc2c: movsbl 0x44(%rdx,%r8,1),%r9d 0.87% ? ?? 0x00007fc4e0befc32: mov %r9b,0x44(%rcx,%rbx,1) 2.25% ? ?? 0x00007fc4e0befc37: movsbl 0x45(%rdx,%r8,1),%r9d 0.01% ? ?? 
0x00007fc4e0befc3d: mov %r9b,0x45(%rcx,%rbx,1) 0.19% ? ?? 0x00007fc4e0befc42: movsbl 0x46(%rdx,%r8,1),%r9d 0.65% ? ?? 0x00007fc4e0befc48: mov %r9b,0x46(%rcx,%rbx,1) 1.96% ? ?? 0x00007fc4e0befc4d: movsbl 0x47(%rdx,%r8,1),%r9d 0.02% ? ?? 0x00007fc4e0befc53: mov %r9b,0x47(%rcx,%rbx,1) 0.16% ? ?? 0x00007fc4e0befc58: movsbl 0x48(%rdx,%r8,1),%r9d 0.58% ? ?? 0x00007fc4e0befc5e: mov %r9b,0x48(%rcx,%rbx,1) 1.78% ? ?? 0x00007fc4e0befc63: movsbl 0x49(%rdx,%r8,1),%r9d 0.06% ? ?? 0x00007fc4e0befc69: mov %r9b,0x49(%rcx,%rbx,1) ; {other} 0.18% ? ?? 0x00007fc4e0befc6e: movsbl 0x4a(%rdx,%r8,1),%r9d 0.49% ? ?? 0x00007fc4e0befc74: mov %r9b,0x4a(%rcx,%rbx,1) 1.76% ? ?? 0x00007fc4e0befc79: movsbl 0x4b(%rdx,%r8,1),%r9d 0.01% ? ?? 0x00007fc4e0befc7f: mov %r9b,0x4b(%rcx,%rbx,1) 0.15% ? ?? 0x00007fc4e0befc84: movsbl 0x4c(%rdx,%r8,1),%r9d 0.49% ? ?? 0x00007fc4e0befc8a: mov %r9b,0x4c(%rcx,%rbx,1) 1.92% ? ?? 0x00007fc4e0befc8f: movsbl 0x4d(%rdx,%r8,1),%r9d 0.21% ? ?? 0x00007fc4e0befc95: mov %r9b,0x4d(%rcx,%rbx,1) 0.22% ? ?? 0x00007fc4e0befc9a: movsbl 0x4e(%rdx,%r8,1),%r9d 0.50% ? ?? 0x00007fc4e0befca0: mov %r9b,0x4e(%rcx,%rbx,1) 1.94% ? ?? 0x00007fc4e0befca5: movsbl 0x4f(%rdx,%r8,1),%r9d 0.12% ? ?? 0x00007fc4e0befcab: mov %r9b,0x4f(%rcx,%rbx,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0} ? ?? ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 22 (line 92) 0.18% ? ?? 0x00007fc4e0befcb0: add $0x40,%r10d ;*iinc {reexecute=0 rethrow=0 return_oop=0} ? ?? ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 23 (line 91) ? ?? 0x00007fc4e0befcb4: cmp %r11d,%r10d ? ?? 0x00007fc4e0befcb7: jl 0x00007fc4e0bef9d0 ;*goto {reexecute=0 rethrow=0 return_oop=0} ? ? ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 26 (line 91) 0.01% ? ? 0x00007fc4e0befcbd: mov 0x30(%r15),%r11 ; ImmutableOopMap {rcx=Oop rdx=Oop } ? ? ;*goto {reexecute=1 rethrow=0 return_oop=0} ? ? ; - (reexecute) org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 26 (line 91) 0.11% ? ? 
0x00007fc4e0befcc1: test %eax,(%r11) ;*goto {reexecute=0 rethrow=0 return_oop=0} ? ? ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 26 (line 91) ? ? ; {poll} 0.02% ? ? 0x00007fc4e0befcc4: cmp %r14d,%r10d ? ? 0x00007fc4e0befcc7: jl 0x00007fc4e0bef9a6 0.01% ? 0x00007fc4e0befccd: cmp (%rsp),%r10d 0x00007fc4e0befcd1: jge 0x00007fc4e0bef967 0.02% 0x00007fc4e0befcd7: nop ;*aload_2 {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 10 (line 92) 0.06% ? 0x00007fc4e0befcd8: movslq %r10d,%r11 0.01% ? 0x00007fc4e0befcdb: lea (%r11,%rbp,1),%r8 0.18% ? 0x00007fc4e0befcdf: lea (%r11,%r13,1),%r9 0.01% ? 0x00007fc4e0befce3: movsbl 0x10(%rdx,%r9,1),%r9d 0.04% ? 0x00007fc4e0befce9: mov %r9b,0x10(%rcx,%r8,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 22 (line 92) 0.21% ? 0x00007fc4e0befcee: inc %r10d ;*iinc {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 23 (line 91) ? 0x00007fc4e0befcf1: cmp (%rsp),%r10d ? 0x00007fc4e0befcf5: jl 0x00007fc4e0befcd8 0x00007fc4e0befcf7: jmpq 0x00007fc4e0bef967 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 7 (line 91) 0x00007fc4e0befcfc: mov $0xffffff6e,%esi 0x00007fc4e0befd01: mov %rdx,%rbp 0x00007fc4e0befd04: mov %rcx,(%rsp) 0x00007fc4e0befd08: mov %r8d,0x8(%rsp) .................................................................................................... 96.25% ....[Hottest Regions]............................................................................... 
96.25% c2, level 4 org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 5, compile id 1267 0.29% libjvm.so ElfSymbolTable::lookup(unsigned char*, int*, int*, int*, ElfFuncDescTable*) 0.16% libjvm.so resolve_inlining_predicate(CompileCommandEnum, methodHandle const&) 0.14% libjvm.so CompilerOracle::tag_blackhole_if_possible(methodHandle const&) 0.12% c2, level 4 org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 5, compile id 1267 0.12% libc.so.6 __futex_abstimed_wait_common 0.11% libjvm.so resolve_inlining_predicate(CompileCommandEnum, methodHandle const&) 0.11% libjvm.so MethodMatcher::matches(methodHandle const&) const 0.09% libjvm.so CompilerOracle::should_not_inline(methodHandle const&) 0.09% libjvm.so fileStream::write(char const*, unsigned long) 0.08% libjvm.so defaultStream::write(char const*, unsigned long) 0.08% libc.so.6 _IO_fwrite 0.08% libc.so.6 clone3 0.07% kernel [unknown] 0.07% c2, level 4 org.openjdk.bench.vm.compiler.jmh_generated.VectorAliasing_VectorAliasingSuperWord_bench_copy_array_B_differentIndex_alias_jmhTest::bench_copy_array_B_differentIndex_alias_avgt_jmhStub, version 5, compile id 1288 0.07% libjvm.so CompilerOracle::should_exclude(methodHandle const&) 0.07% libjvm.so RelocIterator::initialize(nmethod*, unsigned char*, unsigned char*) 0.07% libc.so.6 __GI___pthread_disable_asynccancel 0.06% libjvm.so os::write(int, void const*, unsigned long) 0.05% libjvm.so xmlStream::write_text(char const*, unsigned long) [clone .part.0] 1.85% <...other 145 warm regions...> .................................................................................................... 99.99% ....[Hottest Methods (after inlining)].............................................................. 
96.41% c2, level 4 org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 5, compile id 1267 0.29% libjvm.so ElfSymbolTable::lookup(unsigned char*, int*, int*, int*, ElfFuncDescTable*) 0.27% libjvm.so resolve_inlining_predicate(CompileCommandEnum, methodHandle const&) 0.15% hsdis-amd64.so print_insn 0.14% libjvm.so CompilerOracle::tag_blackhole_if_possible(methodHandle const&) 0.13% 0.13% libc.so.6 _IO_fwrite 0.12% libc.so.6 __futex_abstimed_wait_common 0.12% libjvm.so fileStream::write(char const*, unsigned long) 0.11% libjvm.so MethodMatcher::matches(methodHandle const&) const 0.11% libjvm.so defaultStream::write(char const*, unsigned long) 0.09% libjvm.so CompilerOracle::should_not_inline(methodHandle const&) 0.09% c2, level 4 org.openjdk.bench.vm.compiler.jmh_generated.VectorAliasing_VectorAliasingSuperWord_bench_copy_array_B_differentIndex_alias_jmhTest::bench_copy_array_B_differentIndex_alias_avgt_jmhStub, version 5, compile id 1288 0.09% libjvm.so xmlStream::write_text(char const*, unsigned long) [clone .part.0] 0.08% libc.so.6 clone3 0.07% libc.so.6 __GI___pthread_disable_asynccancel 0.07% libjvm.so RelocIterator::initialize(nmethod*, unsigned char*, unsigned char*) 0.07% libjvm.so CompilerOracle::should_exclude(methodHandle const&) 0.07% kernel [unknown] 0.07% interpreter method entry point (kind = zerolocals) 1.36% <...other 99 warm methods...> .................................................................................................... 99.99% ....[Distribution by Source]........................................................................ 96.54% c2, level 4 2.24% libjvm.so 0.69% libc.so.6 0.19% hsdis-amd64.so 0.13% 0.13% interpreter 0.07% kernel 0.01% ld-linux-x86-64.so.2 .................................................................................................... 99.99% # Run complete. Total time: 00:00:21 REMEMBER: The numbers below are just data. 
To gain reusable insights, you need to follow up on why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial experiments, perform baseline and negative tests that provide experimental control, make sure the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts. Do not assume the numbers tell you what you want them to tell. NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise extra caution when trusting the results, look into the generated code to check the benchmark still works, and factor in a small probability of new VM bugs. Additionally, while comparisons between different JVMs are already problematic, the performance difference caused by different Blackhole modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons. Benchmark (SIZE) (seed) Mode Cnt Score Error Units VectorAliasing.VectorAliasingSuperWord.bench_copy_array_B_differentIndex_alias 10000 0 avgt 10 3181.664 ? 
16.411 ns/op
VectorAliasing.VectorAliasingSuperWord.bench_copy_array_B_differentIndex_alias:asm 10000 0 avgt NaN

---

Performance counter stats for './build/linux-x64/jdk/bin/java -Djava.library.path=/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/native -jar /home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/benchmarks.jar VectorAliasing.VectorAliasingSuperWord.bench_copy_array_B_differentIndex_alias -prof perfasm':

          38,374.64 msec task-clock:u              #    1.688 CPUs utilized
                   0      context-switches:u       #    0.000 /sec
                   0      cpu-migrations:u         #    0.000 /sec
              63,886      page-faults:u            #    1.665 K/sec
      61,212,825,552      cycles:u                 #    1.595 GHz
     130,428,038,623      instructions:u           #    2.13  insn per cycle
      12,158,904,836      branches:u               #  316.847 M/sec
         259,878,216      branch-misses:u          #    2.14% of all branches
             TopdownL1                             #     21.9 %  tma_backend_bound
                                                   #     32.5 %  tma_bad_speculation
                                                   #     12.8 %  tma_frontend_bound
                                                   #     32.8 %  tma_retiring

        22.730051773 seconds time elapsed
        25.260343000 seconds user
        13.046024000 seconds sys

------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3213283462

From epeter at openjdk.org Fri Aug 22 07:41:05 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 22 Aug 2025 07:41:05 GMT
Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v18]
In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com>
Message-ID: 

On Wed, 20 Aug 2025 12:31:11 GMT, Emanuel Peter wrote:

>> TODO work that arose during review process / recent merges with master:
>>
>> - Vladimir asked for a benchmark where the predicate is disabled, only multiversioning. Show that peak performance is identical but compilation time a bit higher. Investigation ongoing.
>> - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE.
>>
>> ---------------
>>
>> This is a big patch, but about 3.5k lines are tests.
>> And a large part of the VM changes is comments / proofs.
>>
>> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016:
>> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate.
>> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization, and a `slow_loop` if the check fails, with no vectorization.
>>
>> --------------------------
>>
>> **Where to start reviewing**
>>
>> - `src/hotspot/share/opto/mempointer.hpp`:
>>   - Read the class comment for `MemPointerRawSummand`.
>>   - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks.
>>
>> - `src/hotspot/share/opto/vectorization.cpp`:
>>   - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works.
>>
>> - `src/hotspot/share/opto/vtransform.hpp`:
>>   - Understand the difference between weak and strong edges.
>>
>> If you need to see some examples, then look at the tests:
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and, in some cases, whether we used multiversioning.
>> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit more advanced, but similar cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but currently mostly only for array cases; the MemorySegment cases have some issues (see comments).
>> --------------------------
>>
>> **Details**
>>
>> Most fundamentally:
>> - I had to...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   disable flag if not possible

Here are the comparisons on different platforms.

my avx512 laptop: [image]
linux-x64: [image]
macosx-x64: [image]
linux-aarch64: [image]
macosx-aarch64: [image]

What is strange is that the aliasing cases on `patch` and `no_predicate` can, but do not have to, show regressions. For example, compare macosx-x64 (regression with long only) and macosx-aarch64 (regression with byte and int only). But there are some kinds of regressions across all platforms. **Still**: the regression is in the 10-30% range for the edge case of aliasing. All other cases (no aliasing) have massive speedups. So overall this is still a massive win. And yet: I would like to at least understand what the issue is here. I have no explanation at all right now.

What I have tried so far:
- Looked at the assembly. It looks extremely similar; at least the main loop looks basically identical. I checked with `perfasm` attached to the JMH benchmark, see results [here](https://github.com/openjdk/jdk/pull/24278#issuecomment-3201092650) and [here](https://github.com/openjdk/jdk/pull/24278#issuecomment-3213283462).
- Artificially avoided vectorization of the fast-loop, just to check if there may be an issue with the `vzeroupper` / AVX->SSE transition. No effect.
- Played with assembly-level loop alignment (address of instructions, OptoLoopAlignment). No effect.
- Might it be the runtime check and related branch misprediction? But I can increase the iterations in the main loop, and it has no effect on the performance difference. We only evaluate the runtime check once per loop, so its cost should fade away as the loop size increases. But it does not fade away.
- Run `perf stat`: it tells me that I have some issue with `backend_bound` and `bad_speculation`, see [here](https://github.com/openjdk/jdk/pull/24278#issuecomment-3201092650). But I cannot really find out more details on my machine. I'm also not sure if the reporting is correct here. - It is also not noise in the benchmark: all other results are quite sharp, and behave as expected. To summarize what I'm comparing here: - `not_profitable` (like before this PR): does not vectorize. All we get is a scalar loop for all cases. - `patch` and `no_predicate`: for aliasing cases, we eventually compile with multiversioning. Here, we get a fast-path (vectorized loop) and a slow-path (scalar loop). A runtime check determines which branch we take. With the aliasing case, we always take the slow-path. We would expect that performance to be identical to `not_profitable`, but we see that is not always the case. @vnkozlov Any other ideas what I could look into here? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3213393035 From epeter at openjdk.org Fri Aug 22 08:03:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Aug 2025 08:03:51 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 06:31:08 GMT, Quan Anh Mai wrote: >> I think this is fair, I've pushed a commit that changes the comment wording. Let me know what you think! > > We have `CastVV` which should be the packed version of `CastII`. The thing that I think is the most difficult is to properly wire it. The simplest situation, where all elements in the pack have the same control input, can be easily handled, though. Right. I don't think there is currently any case where `CastII` would actually be useful to pack in superword. They usually originate from some control-flow.
For example, a comparison on a variable, `if (x < 5)`, and then we can `CastII` the variable `x` on the branches. But that only becomes relevant with if-conversion. Or do you see a case that could happen without if-conversion where `CastII` needs to be handled? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2293020838 From jsjolen at openjdk.org Fri Aug 22 08:56:55 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 22 Aug 2025 08:56:55 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers [v2] In-Reply-To: References: <1ZGeH-R9goJByTfkQSiSKp1nD9oxNqOkeG50T5rnJuI=.4cb38ce6-eac2-42fc-ad4d-771758bd4d84@github.com> Message-ID: On Wed, 20 Aug 2025 20:04:13 GMT, Dean Long wrote: >> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: >> >> - Good catch by Vladimir >> - Vladimir's comments > > src/hotspot/share/code/relocInfo.hpp line 606: > >> 604: RelocIterator(CodeSection* cb, address begin = nullptr, address limit = nullptr); >> 605: RelocIterator(CodeBlob* cb); >> 606: RelocIterator(relocInfo& ri); > > How about making this new ctor private? We could do that, as both `Relocation` (the user of the ctor) and `RelocIterator` are friends.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26569#discussion_r2293130843 From epeter at openjdk.org Fri Aug 22 08:58:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Aug 2025 08:58:05 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v18] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <9nh_9J4rqLsBPw96_d3c3ahTlnynxWlLuq62U_DHOIU=.84295976-20b5-42ea-a5b1-d49602ca276a@github.com> On Thu, 21 Aug 2025 18:35:40 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> disable flag if not possible > > It would be nice to have a code profiling tool which could show which part of the code is hot in these two cases, instead of guessing based on whole-system behavior. @vnkozlov I'm now playing with replacing the fast-path with a `HaltNode` - with that, a lot of lines of assembly disappear (100-200). And I'm now seeing the performance difference go away, at least for the byte case (strangely not in the int case). Maybe it is code locality? Maybe the `perf stat` `tma_frontend_bound` results were misleading? But I'm not sure about locality either. With a sufficiently large loop iteration count, the slow-loop body should eventually be cached fully. So the performance difference should fade away with larger loops. But that does not seem to be the case.
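For context, the aliasing runtime check under discussion is conceptually just a range-overlap test on the two access regions. A minimal Java sketch of that condition (illustrative only; the names are hypothetical and this is not the emitted IR):

```java
public class AliasCheck {
    // Two half-open byte ranges [lo1, hi1) and [lo2, hi2) overlap iff
    // each one starts before the other ends. The vectorizer emits one
    // such check per loop, so its cost should amortize with trip count.
    static boolean overlaps(long lo1, long hi1, long lo2, long hi2) {
        return lo1 < hi2 && lo2 < hi1;
    }

    public static void main(String[] args) {
        System.out.println(overlaps(0, 100, 50, 150));  // overlapping ranges
        System.out.println(overlaps(0, 100, 100, 200)); // adjacent, no overlap
    }
}
```

Since this is a single compare-and-branch per loop entry, any steady-state cost difference has to come from elsewhere, which is what makes the measurements above surprising.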
Here is the `HaltNode` [patch](https://github.com/user-attachments/files/21934393/patch.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3213613126 From jsjolen at openjdk.org Fri Aug 22 09:10:16 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 22 Aug 2025 09:10:16 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers [v3] In-Reply-To: References: Message-ID: > Hi, > > This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. > > Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions. > > Testing: GHA Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Make constructor private ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26569/files - new: https://git.openjdk.org/jdk/pull/26569/files/e71b4924..f2a4c916 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26569&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26569&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26569.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26569/head:pull/26569 PR: https://git.openjdk.org/jdk/pull/26569 From mhaessig at openjdk.org Fri Aug 22 09:15:58 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 22 Aug 2025 09:15:58 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v5] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 06:31:11 GMT, Emanuel Peter wrote: >> Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: >> >>
- Fix test >> - Better counting in tests >> - post processing of flags and documentation > > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 363: > >> 361: final public TestFramework addCrossProductScenarios(Set<String>... flagSets) { >> 362: TestFormat.checkAndReport(flagSets != null && Arrays.stream(flagSets).noneMatch(Objects::isNull), >> 363: "Flags must not be null"); > > What about an empty `flagSets`? Is it allowed? Do we have a test for it? It is allowed, nothing happens. However, I still added a short circuit and a test. > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 367: > >> 365: if (this.scenarioIndices != null && !this.scenarioIndices.isEmpty()) { >> 366: initIdx = this.scenarioIndices.stream().max(Comparator.comparingInt(Integer::intValue)).get() + 1; >> 367: } > > Nit: you are writing code here that allows previous scenarios, but there is no example / test for that below ;) Added a test. > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 375: > >> 373: flags.stream() // Process flags >> 374: .filter(s -> !s.isEmpty()) // Remove empty flags >> 375: .map(s -> Set.of(s.split("[ ]"))) // Split multiple flags in the same string into separate strings > > What happens if I enter `"flag_one flag_two"` with two spaces in the middle? Do I then get an empty string in the middle again? If so: move the empty string filter down. I moved it down for the sake of defensiveness. > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 394: > >> 392: Set<String> newSet = new HashSet<>(set); >> 393: newSet.add(setElement); >> 394: return newSet; > > Not super performant, as it creates a new HashSet at every turn... but oh well we are not making this public anyway ;) In general, creating new collections in streams is a necessary evil. But I rewrote the cross product as a reduction with an ArrayList, which should be a bit more performant.
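For illustration, such a cross product computed as a reduction over an `ArrayList` accumulator could look like this sketch (hypothetical names; this is not the actual `TestFramework` implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CrossProduct {
    // Fold each flag set into the accumulated list of scenarios,
    // pairing every partial scenario with every flag of the next set.
    static List<List<String>> cross(List<Set<String>> flagSets) {
        List<List<String>> acc = new ArrayList<>();
        acc.add(new ArrayList<>()); // start with one empty scenario
        for (Set<String> set : flagSets) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> partial : acc) {
                for (String flag : set) {
                    List<String> extended = new ArrayList<>(partial);
                    extended.add(flag);
                    next.add(extended);
                }
            }
            acc = next;
        }
        return acc;
    }

    public static void main(String[] args) {
        List<List<String>> result = cross(List.of(
                Set.of("-XX:+UseNewCode", "-XX:-UseNewCode"),
                Set.of("-XX:TLABRefillWasteFraction=51", "-XX:TLABRefillWasteFraction=53")));
        System.out.println(result.size()); // 2 x 2 flag combinations
    }
}
```

Each pass over the accumulator allocates one fresh list per extended scenario, which is linear in the output size rather than rebuilding hash sets at every step.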
> test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 96: > >> 94: TestFramework t5 = new TestFramework(); >> 95: t5.addCrossProductScenarios(Set.of("", "-XX:TLABRefillWasteFraction=51", "-XX:TLABRefillWasteFraction=53"), >> 96: Set.of("-XX:+UseNewCode", "-XX:+UseNewCode2")); > > Now looking at the implementation of `addCrossProductScenarios`: what does it do when it is called without arguments/empty args array? Can you also add a test for that? Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2293170188 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2293168880 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2293168286 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2293172614 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2293173254 From jsjolen at openjdk.org Fri Aug 22 09:32:54 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 22 Aug 2025 09:32:54 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers [v3] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 09:10:16 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. >> >> Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions. >> >> Testing: GHA > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Make constructor private The test failure cannot be replicated; I will wait until Monday before integrating.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26569#issuecomment-3213718246 From mhaessig at openjdk.org Fri Aug 22 09:36:45 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 22 Aug 2025 09:36:45 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v6] In-Reply-To: References: Message-ID: > This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of all combinations of flags. To illustrate its use, I also converted one test to use the new cross product functionality. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus some internal testing on Oracle supported platforms Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Improvements prompted by Emanuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26762/files - new: https://git.openjdk.org/jdk/pull/26762/files/273e5f64..7bab7759 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=04-05 Stats: 85 lines in 2 files changed: 62 ins; 15 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/26762.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26762/head:pull/26762 PR: https://git.openjdk.org/jdk/pull/26762 From mhaessig at openjdk.org Fri Aug 22 09:36:47 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 22 Aug 2025 09:36:47 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v5] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 06:55:38 GMT, Emanuel Peter wrote: >> Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: >> >> - Fix test >> - Better counting in tests >> - post processing of flags and documentation > > Looks much better already!
I now took a closer look at the implementation :) Thank you for your detailed review, @eme64. I addressed all of your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26762#issuecomment-3213728001 From mhaessig at openjdk.org Fri Aug 22 09:49:51 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 22 Aug 2025 09:49:51 GMT Subject: RFR: 8365909: [REDO] Add a compilation timeout flag to catch long running compilations In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 11:56:17 GMT, Manuel Hässig wrote: > This PR adds a timeout for compilation tasks based on timer signals on Linux debug builds. > > This PR is a redo of #25872 with fixes for the failing test. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus internal testing on all Oracle supported platforms > - [x] tier3,tier4 on linux-x64-debug > - [x] tier1,tier2,tier3,tier4 on linux-x64-debug with `-XX:CompileTaskTimeout=60000` I originally ran `java -version` and checked that the string "java version" appeared in the output. Then, Christian suggested that I really should use `java --version`, which I promptly started using. However, `java --version`'s output is `java ` and does not contain "java version", making the test fail. This, combined with my forgetting to run testing before integration, led to the backout. Now, the last run of `java --version` that does not time out only checks that the exit code is 0, because I noticed that `java --version` might also output something other than "java" and checking for successful exit is sufficient to show that the test did not assert.
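A sketch of that exit-code-only check (illustrative; the actual jtreg test uses its own process utilities, not `ProcessBuilder` directly):

```java
import java.io.IOException;

public class VersionExitCheck {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Only assert a zero exit code from `java --version`; do not
        // match any particular output string, since it varies by vendor.
        Process p = new ProcessBuilder("java", "--version")
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        System.out.println(p.waitFor() == 0 ? "OK" : "FAIL");
    }
}
```

Discarding the child's output keeps the check independent of whatever version banner the launcher prints.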
------------- PR Comment: https://git.openjdk.org/jdk/pull/26882#issuecomment-3213768387 From galder at openjdk.org Fri Aug 22 11:40:10 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 22 Aug 2025 11:40:10 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v4] In-Reply-To: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: > I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. > > Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows: > > > Benchmark (seed) (size) Mode Cnt Base Patch Units Diff > VectorBitConversion.doubleToLongBits 0 2048 thrpt 8 1168.782 1157.717 ops/ms -1% > VectorBitConversion.doubleToRawLongBits 0 2048 thrpt 8 3999.387 7353.936 ops/ms +83% > VectorBitConversion.floatToIntBits 0 2048 thrpt 8 1200.338 1188.206 ops/ms -1% > VectorBitConversion.floatToRawIntBits 0 2048 thrpt 8 4058.248 14792.474 ops/ms +264% > VectorBitConversion.intBitsToFloat 0 2048 thrpt 8 3050.313 14984.246 ops/ms +391% > VectorBitConversion.longBitsToDouble 0 2048 thrpt 8 3022.691 7379.360 ops/ms +144% > > > The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control. > > I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions. 
Galder Zamarreño has updated the pull request incrementally with three additional commits since the last revision: - Add more IR node positive assertions - Fix source of data for benchmarks - Refactor benchmarks to TypeVectorOperations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26457/files - new: https://git.openjdk.org/jdk/pull/26457/files/147633f9..01fd5ba0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26457&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26457&range=02-03 Stats: 203 lines in 3 files changed: 51 ins; 148 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26457.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26457/head:pull/26457 PR: https://git.openjdk.org/jdk/pull/26457 From galder at openjdk.org Fri Aug 22 11:40:10 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 22 Aug 2025 11:40:10 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3] In-Reply-To: <59dW-P8qExfEfXqud1rOPax4qGcubqi9RQxM4tJLQoQ=.dd1a3fb3-8ded-4e2d-bc25-49456e7ab46f@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> <59dW-P8qExfEfXqud1rOPax4qGcubqi9RQxM4tJLQoQ=.dd1a3fb3-8ded-4e2d-bc25-49456e7ab46f@github.com> Message-ID: On Wed, 20 Aug 2025 06:52:47 GMT, Emanuel Peter wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Check at the very least that auto vectorization is supported > > Had a quick look again and found a few more suggestions in the tests/benchmarks. > But I think the VM changes are solid :) @eme64 I've refactored the benchmarks to `TypeVectorOperations`.
These are the before/after throughput numbers on darwin/aarch64: Benchmark (COUNT) (seed) Mode Cnt Base Patch Units Diff TypeVectorOperations.TypeVectorOperationsSuperWord.convertD2LBits 512 0 thrpt 8 4993.941 5127.876 ops/ms +3% TypeVectorOperations.TypeVectorOperationsSuperWord.convertD2LBits 2048 0 thrpt 8 1169.952 1179.016 ops/ms +1% TypeVectorOperations.TypeVectorOperationsSuperWord.convertD2LBitsRaw 512 0 thrpt 8 15394.034 27658.958 ops/ms +80% TypeVectorOperations.TypeVectorOperationsSuperWord.convertD2LBitsRaw 2048 0 thrpt 8 4007.795 7347.348 ops/ms +83% TypeVectorOperations.TypeVectorOperationsSuperWord.convertF2IBits 512 0 thrpt 8 5140.632 5214.131 ops/ms +1% TypeVectorOperations.TypeVectorOperationsSuperWord.convertF2IBits 2048 0 thrpt 8 1187.033 1130.995 ops/ms -5% TypeVectorOperations.TypeVectorOperationsSuperWord.convertF2IBitsRaw 512 0 thrpt 8 15874.272 54196.086 ops/ms +241% TypeVectorOperations.TypeVectorOperationsSuperWord.convertF2IBitsRaw 2048 0 thrpt 8 4020.074 15020.595 ops/ms +274% TypeVectorOperations.TypeVectorOperationsSuperWord.convertIBits2F 512 0 thrpt 8 12008.101 53389.533 ops/ms +345% TypeVectorOperations.TypeVectorOperationsSuperWord.convertIBits2F 2048 0 thrpt 8 3010.701 15001.785 ops/ms +398% TypeVectorOperations.TypeVectorOperationsSuperWord.convertLBits2D 512 0 thrpt 8 11947.581 28216.125 ops/ms +136% TypeVectorOperations.TypeVectorOperationsSuperWord.convertLBits2D 2048 0 thrpt 8 2992.392 7354.876 ops/ms +146% I've added more positive IR node checks to `TestCompatibleUseDefTypeSize`.
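The control flow that blocks vectorization of `floatToIntBits` comes from its NaN canonicalization. A small sketch of the difference (the NaN test below is an illustrative reimplementation, not the JDK intrinsic):

```java
public class NaNBits {
    // Float.floatToIntBits collapses every NaN to the canonical bit
    // pattern 0x7fc00000, which needs a branch; floatToRawIntBits
    // returns the bits unchanged and is branch-free, so it vectorizes.
    static int toBits(float f) {
        int raw = Float.floatToRawIntBits(f);
        boolean isNaN = (raw & 0x7f800000) == 0x7f800000
                     && (raw & 0x007fffff) != 0;
        return isNaN ? 0x7fc00000 : raw;
    }

    public static void main(String[] args) {
        float nan = Float.intBitsToFloat(0x7fc00001); // NaN with a payload
        System.out.println(toBits(nan) == Float.floatToIntBits(nan));
        System.out.println(toBits(1.5f) == Float.floatToIntBits(1.5f));
    }
}
```

The same reasoning applies to `doubleToLongBits` vs `doubleToRawLongBits`, which matches the benchmark rows above: only the raw variants show vector speedups.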
------------- PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3214062842 From dlong at openjdk.org Fri Aug 22 12:01:59 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Aug 2025 12:01:59 GMT Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() [v2] In-Reply-To: References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> Message-ID: On Thu, 21 Aug 2025 09:01:12 GMT, Artem Semenov wrote: >> The defect has been detected and confirmed in the function ArchDesc::declareClasses() located in the file src/hotspot/share/adlc/output_h.cpp with static code analysis. This defect can potentially lead to a null pointer dereference. >> >> The pointer instr->_matrule is dereferenced in line 1952 without checking for nullptr, although earlier in line 1858 the same pointer is checked for nullptr, which indicates that it can be null. >> >> According to [this](https://github.com/openjdk/jdk/pull/26002#issuecomment-3023050372) comment, this PR contains fixes for similar cases in other places. > > Artem Semenov has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/c1/c1_LinearScan.cpp > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Update src/hotspot/share/adlc/output_h.cpp > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> Most of these mitigations to guard against a possible null pointer dereference are inside `if` expressions, which means if there was a null pointer, then we will now end up in the `else` clause, changing the behavior of the code to something that was perhaps unintended, and we still don't know what caused the null pointer. So this is just silently masking potential problems, and in my experience is usually not the correct fix. 
Most of the time the correct fix is to tell the static analyzer that it is a false positive and move on. Sometimes it is appropriate to add an assert or guarantee, and yes sometimes it is appropriate to do something different if there is a null, for example if it is a result of an allocation that can fail. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26798#issuecomment-3212300703 From asemenov at openjdk.org Fri Aug 22 12:02:00 2025 From: asemenov at openjdk.org (Artem Semenov) Date: Fri, 22 Aug 2025 12:02:00 GMT Subject: RFR: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() [v2] In-Reply-To: References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> Message-ID: On Thu, 21 Aug 2025 14:32:43 GMT, Andrew Dinn wrote: > Well, this leads right to the root of the problem I have with this report. As you say, pos_idx does indeed come out of a marker object. It took me about a minute to identify that this marker object is created in the function that sits right above the one your code assistant flagged as problematic -- even though I am not at all familiar with this code. It looks clear to me that, given the right call sequence for calls that create a marker and then consume it here, the check on pos_idx will ensure that we don't drop off the end of the list with a null pointer. So, it looks very much like this code has been designed so that the presence of a marker with a suitable pos_idx is intended to ensure this loop terminates before that happens. I am sure someone in this project knows whether that is the case but it is not you or your coding assistant. > > I'm not suggesting that that calling sequence is actually right and that the check for pos_idx will definitely avoid dropping off the end. Indeed, I would welcome a bug report that proved it to be wrong.
However, what is clear is that both you and your coding assistant have failed to appreciate how some relatively obvious parts of this design actually operate. That renders your (or your tool's) analysis a shallow and unhelpful distraction; using it as an excuse to raise a purported 'issue' in the absence of any evidence of an actual issue is very much a waste of time for this project's reviewers. > > Your error is compounded by the fact that you (or more likely your coding assistant) are suggesting changes which, because they are not founded in a correct understanding of the design, could potentially lead to worse outcomes than the speculative nullptr dereference they are intended to remedy -- as I explained when discussing your change to the control flow logic in the ADLC code. So, not only is this report unhelpful it is potentially harmful. > > Ultimately the takeaway here is that the OpenJDK bug system is not here to report, review and add patches to remedy issues that you or your code assistant tool invents on the basis of misinformed assumptions. It is here to report, review and add patches to remedy issues that can be shown to actually affect the correct operation of the JVM and JDK, either by a reproducible test or by well-reasoned argument. So, please do not continue to spam the project with bug reports like this simply because a potentially bogus patch will improve your experience with what is clearly a decidedly fallible tool. I'm sorry to have taken up your time.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26798#discussion_r2293532235 From asemenov at openjdk.org Fri Aug 22 12:02:01 2025 From: asemenov at openjdk.org (Artem Semenov) Date: Fri, 22 Aug 2025 12:02:01 GMT Subject: Withdrawn: 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() In-Reply-To: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> References: <3lBcWmU_crhlwmnXaBl3ljOS87FTJ4VDZUC_kwlFC0A=.45fbea2f-4b39-4e15-a4a3-31b74c483748@github.com> Message-ID: On Fri, 15 Aug 2025 11:58:48 GMT, Artem Semenov wrote: > The defect has been detected and confirmed in the function ArchDesc::declareClasses() located in the file src/hotspot/share/adlc/output_h.cpp with static code analysis. This defect can potentially lead to a null pointer dereference. > > The pointer instr->_matrule is dereferenced in line 1952 without checking for nullptr, although earlier in line 1858 the same pointer is checked for nullptr, which indicates that it can be null. > > According to [this](https://github.com/openjdk/jdk/pull/26002#issuecomment-3023050372) comment, this PR contains fixes for similar cases in other places. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/26798 From mli at openjdk.org Fri Aug 22 12:35:12 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 22 Aug 2025 12:35:12 GMT Subject: RFR: 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > > This is a follow-up of https://github.com/openjdk/jdk/pull/26838, fixes the vector version in a similar way. > > Thanks! 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: comments & readability ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26883/files - new: https://git.openjdk.org/jdk/pull/26883/files/446a403b..fa107180 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26883&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26883&range=00-01 Stats: 30 lines in 1 file changed: 24 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/26883.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26883/head:pull/26883 PR: https://git.openjdk.org/jdk/pull/26883 From epeter at openjdk.org Fri Aug 22 12:37:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Aug 2025 12:37:25 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v19] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > TODO work that arose during review process / recent merges with master: > > - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peak performance is identical but compilation time a bit higher. Investigation ongoing. > - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE. > > --------------- > > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e.
we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSumm...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: rm IR rule that checks multiversioning, rare cases fail due to RCE ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/8480d814..a00b385c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=17-18 Stats: 25 lines in 1 file changed: 9 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Fri Aug 22 13:04:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Aug 2025 13:04:58 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v20] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > TODO work that arose during review process / recent merges with master: > > - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peak performance is identical but compilation time a bit higher. Investigation ongoing. > - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE. > > --------------- > > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate.
> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSumm...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add test for related report for JDK-8359688 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/a00b385c..d718bd3f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=18-19 Stats: 98 lines in 1 file changed: 98 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Fri Aug 22 13:15:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Aug 2025 13:15:27 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v21] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > TODO work that arose during review process / recent merges with master: > > - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peak performance is identical but compilation time is a bit higher. Investigation ongoing. > - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE. > > --------------- > > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e.
we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases; MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSumm...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add test for related report for JDK-8360204 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/d718bd3f..198bff79 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=19-20 Stats: 90 lines in 1 file changed: 90 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Fri Aug 22 13:34:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Aug 2025 13:34:58 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v22] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > TODO work that arose during review process / recent merges with master: > > - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peak performance is identical but compilation time is a bit higher. Investigation ongoing. > - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE. > > --------------- > > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e.
we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases; MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSumm...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add test for related report for JDK-8365982 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24278/files - new: https://git.openjdk.org/jdk/pull/24278/files/198bff79..2cfe1097 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=20-21 Stats: 98 lines in 1 file changed: 98 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Fri Aug 22 13:38:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Aug 2025 13:38:04 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v9] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Thu, 14 Aug 2025 11:17:57 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> moved swapping up, suggested by Manuel > > Thank you for addressing my feedback! This looks good to me now. @mhaessig @vnkozlov Update: I also had to fix the `TestAliasingFuzzer.java`: I can no longer assert that there is no `multiversioning` because there are some edge-cases where we have issues. I filed bugs for those, and already integrated an IR test for each. https://bugs.openjdk.org/browse/JDK-8359688 https://bugs.openjdk.org/browse/JDK-8360204 https://bugs.openjdk.org/browse/JDK-8365982 So if anybody accidentally or intentionally fixes those, we should come back to `TestAliasingFuzzer.java` and tighten the IR rules.
Asserting that there is no `multiversioning` in the IR rules makes sure that we made the runtime check as exact as possible, and do not fail in cases where it would have been safe to keep the predicate, rather than deoptimizing and compiling with multiversioning (more compile time, more code -> just worse). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3214406740 From epeter at openjdk.org Fri Aug 22 14:10:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Aug 2025 14:10:54 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v6] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 09:36:45 GMT, Manuel Hässig wrote: >> This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality. >> >> Testing: >> - [x] Github Actions >> - [x] tier1,tier2 plus some internal testing on Oracle supported platforms > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Improvements prompted by Emanuel Changes requested by epeter (Reviewer). test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 387: > 385: }) > 386: ), > 387: (a, b) -> Stream.concat(a, b)); Wow, that's dense. Maybe a little comment could help here. test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 53: > 51: TestFramework t = new TestFramework(); > 52: t.addCrossProductScenarios(null); > 53: Asserts.fail("Should not have thrown exception"); Should or should not have thrown?
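For readers skimming the thread, the cross-product semantics under discussion (every combination of one flag picked from each set becomes one scenario) can be sketched in a few lines of plain Java. This is an illustrative stand-in, not the `TestFramework::addCrossProductScenarios` implementation, and the flag strings are arbitrary examples:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a cross product over sets of VM flags: each resulting list is
// one "scenario" containing exactly one flag from every input set.
public class FlagCrossProduct {
    public static List<List<String>> cross(List<List<String>> sets) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start with one empty combination
        for (List<String> set : sets) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : result) {
                for (String flag : set) {
                    List<String> combo = new ArrayList<>(prefix);
                    combo.add(flag);
                    next.add(combo);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> scenarios = cross(List.of(
                List.of("-XX:+UseSuperWord", "-XX:-UseSuperWord"),
                List.of("-XX:+TieredCompilation", "-XX:-TieredCompilation")));
        System.out.println(scenarios.size()); // 2 x 2 = 4 scenarios
    }
}
```

The `Stream.concat` reduction quoted in the review achieves the same combinatorics in a more functional style, which is why a clarifying comment was requested.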
------------- PR Review: https://git.openjdk.org/jdk/pull/26762#pullrequestreview-3144643027 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2293833619 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2293836677 From epeter at openjdk.org Fri Aug 22 14:10:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Aug 2025 14:10:55 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v6] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 14:06:40 GMT, Emanuel Peter wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Improvements prompted by Emanuel > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 53: > >> 51: TestFramework t = new TestFramework(); >> 52: t.addCrossProductScenarios(null); >> 53: Asserts.fail("Should not have thrown exception"); > > Should or should not have thrown? I think you copied it wrongly from elsewhere ;) Not the only case, so check below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2293837521 From duke at openjdk.org Fri Aug 22 15:34:03 2025 From: duke at openjdk.org (duke) Date: Fri, 22 Aug 2025 15:34:03 GMT Subject: Withdrawn: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: On Tue, 20 May 2025 19:39:30 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > This PR is split from https://github.com/openjdk/jdk/pull/25341, and contains only the shared code change. > > This patch enables the vectorization of statements like `fd_1 bop fd_2 ? res_1 : res_2` in a loop. > > The current behaviour on other platforms supports vectorization of `fd_1 bop fd_2 ?
res_1 : res_2` in a loop only when `fd` and `res` have the same size, but this constraint does not seem necessary, at least not on riscv, so I relax this constraint on riscv; maybe on other platforms it can be relaxed too, but currently I only made it work on riscv. > Besides this, I also relax the constraint on transforming Op_CMoveI/L to Op_VectorBlend on riscv, which brings some extra benefit when the `res` is not of float or double type. > Both relaxations bring performance benefits via vectorization. > > Compared with other runs (master, master with `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on, patch without flags turned on), the average improvement introduced by the patch with `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on is more than 2.1 times, and in some cases it can bring more than 4 times improvement. > With `-XX:-UseVectorCmov -XX:-UseCMoveUnconditionally` turned off, there is no regression on average. > > Check more details at: https://github.com/openjdk/jdk/pull/25341. > > Thanks This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/25336 From thartmann at openjdk.org Fri Aug 22 15:34:54 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 22 Aug 2025 15:34:54 GMT Subject: RFR: 8360561: PhaseIdealLoop::create_new_if_for_predicate hits "must be a uct if pattern" assert [v2] In-Reply-To: References: Message-ID: <0ddrLPHOeC4kl6zjesl3O_f7gYi_owC_A_pLBBNbiLk=.26d60cf9-3352-4051-8f07-33aa240710bd@github.com> On Mon, 18 Aug 2025 08:41:52 GMT, Marc Chevalier wrote: >> Did you know that ranges can be disjoint and yet not ordered?! Well, in modular arithmetic. >> >> Let's look at a simplistic example: >> >> int x; >> if (?) { >> x = -1; >> } else { >> x = 1; >> } >> >> if (x != 0) { >> return; >> } >> // Unreachable >> >> >> With signed ranges, before the second `if`, `x` is in `[-1, 1]`.
Which is enough to enter the second if, but not enough to prove you have to enter it: it wrongly seems that the code after the second `if` is still reachable. Twaddle! >> >> With unsigned ranges, at this point `x` is in `[1, 2^32-1]`, and then, it is clear that `x != 0`. This information is used to refine the value of `x` in the (missing) else-branch, and so, after the if. This is done with a simple lattice meet (HotSpot's join): in the else-branch, the possible values of `x` are the meet of what it was worth before, and the interval in the guard, that is `[0, 0]`. Thanks to the unsigned range, this is known to be empty (that is bottom, or HotSpot's top). And with a little reduced product, the whole type of `x` is empty as well. Yet, this information is not used to kill control yet. >> >> Here is the center of the problem: we have a situation such as: >> 2 after-CastII >> After node `110 CastII` is idealized, it is found to be Top, and then the uncommon trap at `129` is replaced by `238 Halt` by being value-dead. >> 1 before-CastII >> Since the control is not killed, the node stays there, eventually making some predicate-related assert fail as a trap is expected under a `ParsePredicate`. >> >> And that's what this change proposes: when comparing integers with non-ordered ranges, let's see if the unsigned ranges overlap, by computing the meet. If the intersection is empty, then the values can't be equal, without being able to order them. This is new! Without unsigned information for signed integers, either they overlap, or we can order them. Adding modular arithmetic allows us to have non-overlapping ranges that are also not ordered. >> >> Let's also notice that 0 is special: it is important that bounds are on each side of 0 (or 2^31, the other discontinuity). For instance, if `x` can be 1 or 5, both the signed and unsigned range will agree on `[1, 5]` and not be able to prove it's, let's say, 3. > ...
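Marc's signed-vs-unsigned interval argument can be replayed in plain Java, outside C2's type lattice; `Integer.compareUnsigned` stands in for the unsigned ordering, and the helper names are made up for this sketch:

```java
// Sketch: with x in {-1, 1}, the signed interval [-1, 1] still contains 0,
// but the unsigned view of the same two values is [1, 0xFFFFFFFF], which
// does not contain 0 -- so the unsigned range proves x != 0.
public class UnsignedRangeDemo {
    public static boolean signedContains(int lo, int hi, int v) {
        return lo <= v && v <= hi;
    }

    // Same check, but all three values are compared as unsigned 32-bit ints.
    public static boolean unsignedContains(int lo, int hi, int v) {
        return Integer.compareUnsigned(lo, v) <= 0
            && Integer.compareUnsigned(v, hi) <= 0;
    }

    public static void main(String[] args) {
        // Signed bounds of {-1, 1}: [-1, 1] -> cannot rule out x == 0.
        System.out.println(signedContains(-1, 1, 0));   // true
        // Unsigned bounds of {-1, 1}: [1, 2^32-1] -> proves x != 0.
        System.out.println(unsignedContains(1, -1, 0)); // false
    }
}
```

The meet with the guard interval `[0, 0]` being empty corresponds to the second check returning false.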
> > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Use Warmup(0) instead of Xcomp Nice analysis, Marc! The fix looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26504#pullrequestreview-3144960928 From duke at openjdk.org Fri Aug 22 15:54:10 2025 From: duke at openjdk.org (Tobias Hotz) Date: Fri, 22 Aug 2025 15:54:10 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v4] In-Reply-To: References: Message-ID: > This PR improves the value of integer division nodes. > Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case > We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. > This also cleans up and unifies the code paths for DivINode and DivLNode. > I've added some tests to validate the optimization. Without the changes, some of these tests fail.
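The "four corners" idea from the description can be sketched in plain Java for the easy case where the divisor interval excludes 0 (the `MIN_VALUE / -1` overflow case the PR also handles is left out); `divRange` and the sample intervals are hypothetical, not the C2 code:

```java
// Sketch: for a divisor interval that excludes 0, truncating division is
// monotone in each operand, so the extrema of a / b over two intervals are
// attained at the four corner combinations.
public class DivRangeDemo {
    // Returns {min, max} of a / b for a in [alo, ahi], b in [blo, bhi],
    // assuming 0 is not in [blo, bhi].
    public static long[] divRange(long alo, long ahi, long blo, long bhi) {
        long[] corners = { alo / blo, alo / bhi, ahi / blo, ahi / bhi };
        long min = corners[0], max = corners[0];
        for (long c : corners) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        return new long[] { min, max };
    }

    public static void main(String[] args) {
        long[] range = divRange(-10, 20, 2, 5);
        // Brute-force cross-check over the same intervals.
        long bmin = Long.MAX_VALUE, bmax = Long.MIN_VALUE;
        for (long a = -10; a <= 20; a++) {
            for (long b = 2; b <= 5; b++) {
                bmin = Math.min(bmin, a / b);
                bmax = Math.max(bmax, a / b);
            }
        }
        System.out.println(range[0] == bmin && range[1] == bmax); // true
    }
}
```

A divisor range that crosses zero would be split into its negative and positive halves first, which matches the "check for the negative and positive range once" remark above.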
Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: Simplify the special case path ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26143/files - new: https://git.openjdk.org/jdk/pull/26143/files/eef20ae6..4dc32af9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=02-03 Stats: 26 lines in 1 file changed: 1 ins; 8 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/26143.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26143/head:pull/26143 PR: https://git.openjdk.org/jdk/pull/26143 From duke at openjdk.org Fri Aug 22 16:14:34 2025 From: duke at openjdk.org (Tobias Hotz) Date: Fri, 22 Aug 2025 16:14:34 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v5] In-Reply-To: References: Message-ID: <-PH0VIqmFhoPKD3mHpEwG6sOX8GaVfL66gfd34ZGm8k=.d69a81ff-f5be-4e3b-ba05-5be2c06eaeea@github.com> > This PR improves the value of integer division nodes. > Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case > We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. > This also cleans up and unifies the code paths for DivINode and DivLNode. > I've added some tests to validate the optimization. Without the changes, some of these tests fail.
Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: Fix if condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26143/files - new: https://git.openjdk.org/jdk/pull/26143/files/4dc32af9..2bf7c99d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26143.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26143/head:pull/26143 PR: https://git.openjdk.org/jdk/pull/26143 From kvn at openjdk.org Fri Aug 22 16:17:04 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 22 Aug 2025 16:17:04 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v9] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <_HNodE8lATn8RmFyPRA-PkIx7CL_K7iqrVlgiKrZKzA=.a325c80d-5363-407c-a807-d0f470a0c0e0@github.com> On Fri, 22 Aug 2025 13:34:56 GMT, Emanuel Peter wrote: >> Thank you for addressing my feedback! This looks good to me now. > > @mhaessig @vnkozlov > Update: I also had to fix the `TestAliasingFuzzer.java`: I can no longer assert that there is no `multiversioning` because there are some edge-cases where we have issues. I filed bugs for those, and already integrated an IR test for each. > https://bugs.openjdk.org/browse/JDK-8359688 > https://bugs.openjdk.org/browse/JDK-8360204 > https://bugs.openjdk.org/browse/JDK-8365982 > > So if anybody accidentally, or intentionally fixes those, we should come back to `TestAliasingFuzzer.java` and tighten the IR rules. 
> > Asserting that there is no `multiversioning` in the IR rules makes sure that we made the runtime check as exact as possible, and do not fail in cases where it would have been safe to keep the predicate, rather than deoptimizing and compiling with multiversioning (more compile time, more code -> just worse). > > I also filed an RFE to eventually fix the IR rules in the test `TestAliasingFuzzer.java`: > https://bugs.openjdk.org/browse/JDK-8365985 > > Note: for now, `TestAliasingFuzzer.java` still has some IR rules, but just for the `array` examples, see `generateIRRulesArray`. These should already work well with RCE. We are mostly having issues with long-address MemorySegments currently, see the filed RFEs above. @eme64 I noticed that in the first (no_patch, fastest) assembly we don't have the "strip mining" outer loop, while in other cases we have it. Do you know why? Yes, there could be a lot of reasons we get such a regression. Did you try **reducing** unrolling of the slow path? > Might it be the runtime check and related branch misprediction? Could be, since you added an outer loop in the slow path. > tma_backend_bound: 21.3 vs 24.8 - there seems to be a bottleneck in the backend for patch of 10% This seems to indicate more time spent on data access. Does the main loop start copying from the same offset/element in the no_patch vs patch loops? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3214913908 From kvn at openjdk.org Fri Aug 22 16:21:03 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 22 Aug 2025 16:21:03 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v22] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Fri, 22 Aug 2025 13:34:58 GMT, Emanuel Peter wrote: >> TODO work that arose during review process / recent merges with master: >> >> - Vladimir asked for benchmark where predicate is disabled, only multiversioning.
Show that peak performance is identical but compilation time is a bit higher. Investigation ongoing. >> - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE. >> >> --------------- >> >> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. >> >> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. >> >> -------------------------- >> >> **Where to start reviewing** >> >> - `src/hotspot/share/opto/mempointer.hpp`: >> - Read the class comment for `MemPointerRawSummand`. >> - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks. >> >> - `src/hotspot/share/opto/vectorization.cpp`: >> - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. >> >> - `src/hotspot/share/opto/vtransform.hpp`: >> - Understand the difference between weak and strong edges. >> >> If you need to see some examples, then look at the tests: >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. >> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases; MemorySegment cases have some issues (see comments). >> -------------------------- >> >> **Details** >> >> Most fundamentally: >> - I had to... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add test for related report for JDK-8365982 This looks like a "rabbit hole" :( Maybe file a separate RFE to investigate this behavior later by some other engineer. Most concerning is that it reproduced on different platforms. I agree that we may accept this regression since it happened in a corner case. I assume our benchmarks are not affected by this. Right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3214924449 From kvn at openjdk.org Fri Aug 22 16:33:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 22 Aug 2025 16:33:55 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers [v3] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 09:10:16 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. >> >> Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions. >> >> Testing: GHA > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Make constructor private Thank you for running more testing. ------------- Marked as reviewed by kvn (Reviewer).
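The pointer-to-index refactoring described for `RelocIterator` is the classic cursor pattern: a fixed base plus an integer index, where `-1` naturally encodes "before the first element". A hypothetical Java stand-in, not the HotSpot code:

```java
// Sketch: base + index instead of two moving pointers. The index can start
// at -1 ("iteration not started"), while the base reference alone can still
// serve as the "dead" marker (null here, like _mutable_data == nullptr).
public class IndexIterator {
    private final int[] base;
    private final int len;
    private int current = -1; // -1 = before the first element

    public IndexIterator(int[] data) {
        base = data;
        len = data.length;
    }

    public boolean next() {
        return ++current < len;
    }

    public int value() {
        return base[current];
    }

    public static void main(String[] args) {
        IndexIterator it = new IndexIterator(new int[] { 10, 20, 30 });
        int sum = 0;
        while (it.next()) {
            sum += it.value();
        }
        System.out.println(sum); // 60
    }
}
```

With raw pointers the "before first" state would need a pointer one element before the buffer, which is awkward; an index makes that state representable for free.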
PR Review: https://git.openjdk.org/jdk/pull/26569#pullrequestreview-3145122989 From bulasevich at openjdk.org Fri Aug 22 16:34:04 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 22 Aug 2025 16:34:04 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' Message-ID: This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal. The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually the trigger is the `-XX:+OptoScheduling` option used by the test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run. This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments. The problem is that shift count `n` may be too large here: class Pipeline_Use_Cycle_Mask { protected: uint _mask; .. Pipeline_Use_Cycle_Mask& operator<<=(int n) { _mask <<= n; return *this; } }; The recent change attempted to cap the shift amount at one call site: class Pipeline_Use_Element { protected: ..
cycles : max_shift; } } However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count: // The following two routines assume that the root Pipeline_Use entity // consists of exactly 1 element for each functional unit // start is relative to the current cycle; used for latency-based info uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const { for (uint i = 0; i < pred._count; i++) { const Pipeline_Use_Element *predUse = pred.element(i); if (predUse->_multiple) { uint min_delay = 7; // Multiple possible functional units, choose first unused one for (uint j = predUse->_lb; j <= predUse->_ub; j++) { const Pipeline_Use_Element *currUse = element(j); uint curr_delay = delay; if (predUse->_used & currUse->_used) { Pipeline_Use_Cycle_Mask x = predUse->_mask; Pipeline_Use_Cycle_Mask y = currUse->_mask; for ( y <<= curr_delay; x.overlaps(y); curr_delay++ ) y <<= 1; } if (min_delay > curr_delay) min_delay = curr_delay; } if (delay < min_delay) delay = min_delay; } else { for (uint j = predUse->_lb; j <= predUse->_ub; j++) { const Pipeline_Use_Element *currUse = element(j); if (predUse->_used & currUse->_used) { Pipeline_Use_Cycle_Mask x = predUse->_mask; Pipeline_Use_Cycle_Mask y = currUse->_mask; > for ( y <<= delay; x.overlaps(y); delay++ ) y <<= 1; } } } } return (delay); } **Fix:** cap the shift **inside** `Pipeline_Use_Cycle_Mask::operator<<=` so all call sites are safe: class Pipeline_Use_Cycle_Mask { protected: uint _mask; .. Pipeline_Use_Cycle_Mask& operator<<=(int n) { int max_shift = 8 * sizeof(_mask) - 1; _mask <<= (n < max_shift) ? n : max_shift; return *this; } }; class Pipeline_Use_Element { protected: .. // Mask of specific used cycles Pipeline_Use_Cycle_Mask _mask; .. 
void step(uint cycles) { _used = 0; _mask <<= cycles; } } Note: on platforms where PipelineForm::_maxcycleused > 32 (e.g., ARM32), the Pipeline_Use_Cycle_Mask implementation already handles large shifts, so no additional check is needed: class Pipeline_Use_Cycle_Mask { protected: uint _mask1, _mask2, _mask3; Pipeline_Use_Cycle_Mask& operator<<=(int n) { if (n >= 32) do { _mask3 = _mask2; _mask2 = _mask1; _mask1 = 0; } while ((n -= 32) >= 32); if (n > 0) { uint m = 32 - n; uint mask = (1 << n) - 1; uint temp2 = mask & (_mask1 >> m); _mask1 <<= n; uint temp3 = mask & (_mask2 >> m); _mask2 <<= n; _mask2 |= temp2; _mask3 <<= n; _mask3 |= temp3; } return *this; } } ------------- Commit messages: - 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' Changes: https://git.openjdk.org/jdk/pull/26890/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26890&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338197 Stats: 4 lines in 1 file changed: 1 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26890/head:pull/26890 PR: https://git.openjdk.org/jdk/pull/26890 From kvn at openjdk.org Fri Aug 22 16:38:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 22 Aug 2025 16:38:55 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v6] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: On Fri, 22 Aug 2025 03:21:36 GMT, Igor Veresov wrote: >> This change fixes multiple issues with training data verification. While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown), it does cause problems in the Leyden repo.
This change strengthens verification in the mainline (by adding the shutdown verify call), and fixes the problems that prevent it from working reliably. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > More renames src/hotspot/share/compiler/compilationPolicy.cpp line 192: > 190: > 191: void CompilationPolicy::replay_training_at_init_loop(JavaThread* current) { > 192: while (!CompileBroker::is_compilation_disabled_forever() || AOTVerifyTrainingData) { Will it loop forever with `+ AOTVerifyTrainingData` ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2294183959 From kvn at openjdk.org Fri Aug 22 16:49:02 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 22 Aug 2025 16:49:02 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 00:47:48 GMT, Boris Ulasevich wrote: > This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal. > > The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually the trigger is the `-XX:+OptoScheduling` option used by the test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run. > > This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments. > > The problem is that the shift count `n` may be too large here: > > class Pipeline_Use_Cycle_Mask { > protected: > uint _mask; > .. > Pipeline_Use_Cycle_Mask& operator<<=(int n) { > _mask <<= n; > return *this; > } > }; > > The recent change attempted to cap the shift amount at one call site: > > class Pipeline_Use_Element { > protected: > ..
> // Mask of specific used cycles > Pipeline_Use_Cycle_Mask _mask; > .. > void step(uint cycles) { > _used = 0; > uint max_shift = 8 * sizeof(_mask) - 1; > _mask <<= (cycles < max_shift) ? cycles : max_shift; > } > } > > However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count: > > // The following two routines assume that the root Pipeline_Use entity > // consists of exactly 1 element for each functional unit > // start is relative to the current cycle; used for latency-based info > uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const { > for (uint i = 0; i < pred._count; i++) { > const Pipeline_Use_Element *predUse = pred.element(i); > if (predUse->_multiple) { > uint min_delay = 7; > // Multiple possible functional units, choose first unused one > for (uint j = predUse->_lb; j <= predUse->_ub; j++) { > const Pipeline_Use_Element *currUse = element(j); > uint curr_delay = delay; > if (predUse->_used & currUse->_used) { > Pipeline_Use_Cycle_Mask x = predUse->_mask; > Pipeline_Use_Cycle_Mask y = currUse->_mask; > > for ( y <<= curr_delay; x.overlaps(y); curr_delay++ ) > y <<= 1; > } > if (min_delay > curr_delay) > min_delay = curr_delay; > } > if (delay < min_delay) > delay = min_delay; > } > else { > for (uint j = predUse->_lb; j <= predUse->_ub; j++) { > const Pipeline_Use_Element *currUse = element(j); > if (predUse->_used & currUse->_used) { > ... src/hotspot/share/adlc/output_h.cpp line 774: > 772: fprintf(fp_hpp, " Pipeline_Use_Cycle_Mask& operator<<=(int n) {\n"); > 773: fprintf(fp_hpp, " int max_shift = 8 * sizeof(_mask) - 1;\n"); > 774: fprintf(fp_hpp, " _mask <<= (n < max_shift) ? n : max_shift;\n"); sizeof(_mask) is known - it is sizeof(uint). Lines 760-768 should be cleaned: ` <= 32` checks are redundant because of the check at line 758. This is a leftover from the SPARC code removal (which was not clean).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26890#discussion_r2294201108 From duke at openjdk.org Fri Aug 22 17:40:01 2025 From: duke at openjdk.org (Francesco Andreuzzi) Date: Fri, 22 Aug 2025 17:40:01 GMT Subject: Integrated: 8365829: Multiple definitions of static 'phase_names' In-Reply-To: References: Message-ID: On Wed, 20 Aug 2025 01:12:36 GMT, Francesco Andreuzzi wrote: > - `opto/phasetype.hpp` defines `static const char* phase_names[]` > - `compiler/compilerEvent.cpp` defines `static GrowableArray* phase_names` > > This is not a problem when the two files are compiled as different translation units, but it causes a build failure if any of them is pulled in by a precompiled header: > > > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:59:36: error: redefinition of 'phase_names' with a different type: 'GrowableArray *' vs 'const char *[100]' > 59 | static GrowableArray* phase_names = nullptr; > | ^ > /jdk/src/hotspot/share/opto/phasetype.hpp:147:20: note: previous definition is here > 147 | static const char* phase_names[] = { > | ^ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:67:39: error: member reference base type 'const char *' is not a structure or union > 67 | const u4 nof_entries = phase_names->length(); > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:71:31: error: member reference base type 'const char *' is not a structure or union > 71 | writer.write(phase_names->at(i)); > | ~~~~~~~~~~~^ ~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:77:34: error: member reference base type 'const char *' is not a structure or union > 77 | for (int i = 0; i < phase_names->length(); i++) { > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:78:35: error: member reference base type 'const char *' is not a structure or union > 78 | const char* name = phase_names->at(i); > | ~~~~~~~~~~~^ ~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:91:9: error: comparison of array 
'phase_names' equal to a null pointer is always false [-Werror,-Wtautological-pointer-compare] > 91 | if (phase_names == nullptr) { > | ^~~~~~~~~~~ ~~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:92:19: error: array type 'const char *[100]' is not assignable > 92 | phase_names = new (mtInternal) GrowableArray(100, mtCompiler); > | ~~~~~~~~~~~ ^ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:103:24: error: member reference base type 'const char *' is not a structure or union > 103 | index = phase_names->length(); > | ~~~~~~~~~~~^ ~~~~~~ > /jdk/src/hotspot/share/compiler/compilerEvent.cpp:104:16: error: member reference base type 'const char *' is not a structure or union > 104 | phase_names->append(use_strdup ? os::strdup(phase_name) : phase_name); > | ~~~~~~~~~~~^ ~~~~~~ > 9 errors generated. > > Passes `tier1`. This pull request has now been integrated. Changeset: 19882220 Author: Francesco Andreuzzi Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/19882220ecb3eeaef763ccbb0aa4d7760c906222 Stats: 75 lines in 2 files changed: 50 ins; 19 del; 6 mod 8365829: Multiple definitions of static 'phase_names' Reviewed-by: kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/26851 From mhaessig at openjdk.org Fri Aug 22 18:05:57 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 22 Aug 2025 18:05:57 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v7] In-Reply-To: References: Message-ID: > This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality.
> > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus some internal testing on Oracle supported platforms Manuel Hässig has updated the pull request incrementally with four additional commits since the last revision: - Remove excess newline - Fix indentation - Improve comments - Fix copy pasta mistakes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26762/files - new: https://git.openjdk.org/jdk/pull/26762/files/7bab7759..2e36929f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=05-06 Stats: 23 lines in 2 files changed: 0 ins; 1 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/26762.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26762/head:pull/26762 PR: https://git.openjdk.org/jdk/pull/26762 From mhaessig at openjdk.org Fri Aug 22 18:05:59 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 22 Aug 2025 18:05:59 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v6] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 14:05:20 GMT, Emanuel Peter wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Improvements prompted by Emanuel > > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 387: > >> 385: }) >> 386: ), >> 387: (a, b) -> Stream.concat(a, b)); > > Wow, that's dense. Maybe a little comment could help here. It ended up being a lottle comment, but I think it does the job.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2294342803 From mhaessig at openjdk.org Fri Aug 22 18:06:00 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 22 Aug 2025 18:06:00 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v6] In-Reply-To: References: Message-ID: <4xpV49gPUS6vRPp-8op_V40VG3GddWzOpCOgx_vljjk=.50245728-150d-48be-8758-4e071f1c50cd@github.com> On Fri, 22 Aug 2025 14:07:05 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 53: >> >>> 51: TestFramework t = new TestFramework(); >>> 52: t.addCrossProductScenarios(null); >>> 53: Asserts.fail("Should not have thrown exception"); >> >> Should or should not have thrown? > > I think you copied it wrongly from elsewhere ;) > Not the only case, so check below. Indeed, copy paste without editing. Should be fixed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2294345338 From mhaessig at openjdk.org Fri Aug 22 18:09:14 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 22 Aug 2025 18:09:14 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v8] In-Reply-To: References: Message-ID: > This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus some internal testing on Oracle supported platforms Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 15 additional commits since the last revision: - Merge branch 'master' into JDK-8365262 - Remove excess newline - Fix indentation - Improve comments - Fix copy pasta mistakes - Improvements prompted by Emanuel - Fix test - Better counting in tests - post processing of flags and documentation - Make the test work - ... and 5 more: https://git.openjdk.org/jdk/compare/e9d43624...771924f0 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26762/files - new: https://git.openjdk.org/jdk/pull/26762/files/2e36929f..771924f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=06-07 Stats: 17729 lines in 623 files changed: 9992 ins; 5396 del; 2341 mod Patch: https://git.openjdk.org/jdk/pull/26762.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26762/head:pull/26762 PR: https://git.openjdk.org/jdk/pull/26762 From mhaessig at openjdk.org Fri Aug 22 18:09:55 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 22 Aug 2025 18:09:55 GMT Subject: RFR: 8360561: PhaseIdealLoop::create_new_if_for_predicate hits "must be a uct if pattern" assert [v2] In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 08:41:52 GMT, Marc Chevalier wrote: >> Did you know that ranges can be disjoint and yet not ordered?! Well, in modular arithmetic. >> >> Let's look at a simplistic example: >> >> int x; >> if (?) { >> x = -1; >> } else { >> x = 1; >> } >> >> if (x != 0) { >> return; >> } >> // Unreachable >> >> >> With signed ranges, before the second `if`, `x` is in `[-1, 1]`. Which is enough to enter the second if, but not enough to prove you have to enter it: it wrongly seems that the code after the second `if` is still reachable. Twaddle! >> >> With unsigned ranges, at this point `x` is in `[1, 2^32-1]`, and then, it is clear that `x != 0`. This information is used to refine the value of `x` in the (missing) else-branch, and so, after the if.
This is done with a simple lattice meet (Hotspot's join): in the else-branch, the possible values of `x` are the meet of what it was worth before, and the interval in the guard, that is `[0, 0]`. Thanks to the unsigned range, this is known to be empty (that is bottom, or Hotspot's top). And with a little reduced product, the whole type of `x` is empty as well. Yet, this information is not used to kill control yet. >> >> This is the heart of the problem: we have a situation such as: 2 after-CastII >> After node `110 CastII` is idealized, it is found to be Top, and then the uncommon trap at `129` is replaced by `238 Halt` by being value-dead. 1 before-CastII >> Since the control is not killed, the node stays there, eventually making some predicate-related assert fail as a trap is expected under a `ParsePredicate`. >> >> And that's what this change proposes: when comparing integers with non-ordered ranges, let's see if the unsigned ranges overlap, by computing the meet. If the intersection is empty, then the values can't be equal, without being able to order them. This is new! Without unsigned information for signed integers, either they overlap, or we can order them. Adding modular arithmetic allows us to have non-overlapping ranges that are also not ordered. >> >> Let's also notice that 0 is special: it is important that bounds are on each side of 0 (or 2^31, the other discontinuity). For instance, if `x` can be 1 or 5, both the signed and unsigned range will agree on `[1, 5]` and not be able to prove it's, let's say, 3. > ... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Use Warmup(0) instead of Xcomp Marked as reviewed by mhaessig (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/26504#pullrequestreview-3145395499 From bulasevich at openjdk.org Fri Aug 22 19:18:10 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 22 Aug 2025 19:18:10 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v2] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 16:46:17 GMT, Vladimir Kozlov wrote: >> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: >> >> remove redundant code > > src/hotspot/share/adlc/output_h.cpp line 774: > >> 772: fprintf(fp_hpp, " Pipeline_Use_Cycle_Mask& operator<<=(int n) {\n"); >> 773: fprintf(fp_hpp, " int max_shift = 8 * sizeof(_mask) - 1;\n"); >> 774: fprintf(fp_hpp, " _mask <<= (n < max_shift) ? n : max_shift;\n"); > > sizeof(_mask) is known - it is sizeof(uint). > Lines 760-768 should be cleaned: ` <= 32` checks are redundant because of the check at line 758. This is a leftover from the SPARC code removal (which was not clean). Good point - I removed the redundant code. As for `sizeof(_mask)`, shouldn't it just be `max_shift = 31` or `_mask <<= (n < 32) ? n : 31;`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26890#discussion_r2294482253 From bulasevich at openjdk.org Fri Aug 22 19:18:09 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 22 Aug 2025 19:18:09 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v2] In-Reply-To: References: Message-ID: > This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal. > > The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build.
Actually the trigger is `XX:+OptoScheduling` option used by test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run. > > This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments. > > The problems is that shift count `n` may be too large here: > > class Pipeline_Use_Cycle_Mask { > protected: > uint _mask; > .. > Pipeline_Use_Cycle_Mask& operator<<=(int n) { > _mask <<= n; > return *this; > } > }; > > The recent change attempted to cap the shift amount at one call site: > > class Pipeline_Use_Element { > protected: > .. > // Mask of specific used cycles > Pipeline_Use_Cycle_Mask _mask; > .. > void step(uint cycles) { > _used = 0; > uint max_shift = 8 * sizeof(_mask) - 1; > _mask <<= (cycles < max_shift) ? cycles : max_shift; > } > } > > However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count: > > // The following two routines assume that the root Pipeline_Use entity > // consists of exactly 1 element for each functional unit > // start is relative to the current cycle; used for latency-based info > uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const { > for (uint i = 0; i < pred._count; i++) { > const Pipeline_Use_Element *predUse = pred.element(i); > if (predUse->_multiple) { > uint min_delay = 7; > // Multiple possible functional units, choose first unused one > for (uint j = predUse->_lb; j <= predUse->_ub; j++) { > const Pipeline_Use_Element *currUse = element(j); > uint curr_delay = delay; > if (predUse->_used & currUse->_used) { > Pipeline_Use_Cycle_Mask x = predUse->_mask; > Pipeline_Use_Cycle_Mask y = currUse->_mask; > > for ( y <<= curr_delay; x.overlaps(y); curr_delay++ ) > y <<= 1; > } > if (min_delay > curr_delay) > min_delay = curr_delay; > } > if (delay < min_delay) > delay = min_delay; > } > else { > for (uint j = predUse->_lb; j 
<= predUse->_ub; j++) { > const Pipeline_Use_Element *currUse = element(j); > if (predUse->_used & currUse->_used) { > ... Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: remove redundant code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26890/files - new: https://git.openjdk.org/jdk/pull/26890/files/7e8c282d..389a9dab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26890&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26890&range=00-01 Stats: 7 lines in 1 file changed: 0 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26890/head:pull/26890 PR: https://git.openjdk.org/jdk/pull/26890 From dlong at openjdk.org Fri Aug 22 19:21:52 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Aug 2025 19:21:52 GMT Subject: RFR: 8365909: [REDO] Add a compilation timeout flag to catch long running compilations In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 11:56:17 GMT, Manuel Hässig wrote: > This PR adds a timeout for compilation tasks based on timer signals on Linux debug builds. > > This PR is a redo of #25872 with fixes for the failing test. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus internal testing on all Oracle supported platforms > - [x] tier3,tier4 on linux-x64-debug > - [x] tier1,tier2,tier3,tier4 on linux-x64-debug with `-XX:CompileTaskTimeout=60000` Marked as reviewed by dlong (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/26882#pullrequestreview-3145578364 From dlong at openjdk.org Fri Aug 22 19:27:52 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Aug 2025 19:27:52 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers [v3] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 09:10:16 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. >> >> Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions. >> >> Testing: GHA > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Make constructor private Marked as reviewed by dlong (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/26569#pullrequestreview-3145595926 From iveresov at openjdk.org Fri Aug 22 20:22:51 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 22 Aug 2025 20:22:51 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v6] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: <7zawxaIMLdnM5VraQwvZL3wcj3v8vYtzEvJpWYwQLqg=.eecc2aa6-f47e-44b8-842b-10621e83c2ae@github.com> On Fri, 22 Aug 2025 16:35:48 GMT, Vladimir Kozlov wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> More renames > > src/hotspot/share/compiler/compilationPolicy.cpp line 192: > >> 190: >> 191: void CompilationPolicy::replay_training_at_init_loop(JavaThread* current) { >> 192: while (!CompileBroker::is_compilation_disabled_forever() || AOTVerifyTrainingData) { > > Will it loop forever with `+ AOTVerifyTrainingData` ? Yes, it runs in a dedicated thread. It doesn't need to terminate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2294648538 From iveresov at openjdk.org Fri Aug 22 20:29:10 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 22 Aug 2025 20:29:10 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v7] In-Reply-To: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: > This change fixes multiple issue with training data verification. While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown) it does problems in the leyden repo. 
This change strengthens verification in the mainline (by adding the shutdown verify call), and fixes the problems that prevent it from working reliably. Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: One more nit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26866/files - new: https://git.openjdk.org/jdk/pull/26866/files/f7d6a4e0..c33d94bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26866&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26866&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26866.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26866/head:pull/26866 PR: https://git.openjdk.org/jdk/pull/26866 From chagedorn at openjdk.org Fri Aug 22 21:25:51 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 22 Aug 2025 21:25:51 GMT Subject: RFR: 8365909: [REDO] Add a compilation timeout flag to catch long running compilations In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 11:56:17 GMT, Manuel Hässig wrote: > This PR adds a timeout for compilation tasks based on timer signals on Linux debug builds. > > This PR is a redo of #25872 with fixes for the failing test. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus internal testing on all Oracle supported platforms > - [x] tier3,tier4 on linux-x64-debug > - [x] tier1,tier2,tier3,tier4 on linux-x64-debug with `-XX:CompileTaskTimeout=60000` Looks good to me, too!
src/hotspot/os/linux/compilerThreadTimeout_linux.cpp line 105: > 103: #else > 104: sev._sigev_un._tid = thread->osthread()->thread_id(); > 105: #endif // MUSL_LIBC The `ifdef` should probably also be without indentation like the other `ifdefs`: Suggestion: #ifdef MUSL_LIBC sev.sigev_notify_thread_id = thread->osthread()->thread_id(); #else sev._sigev_un._tid = thread->osthread()->thread_id(); #endif // MUSL_LIBC ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26882#pullrequestreview-3146112274 PR Review Comment: https://git.openjdk.org/jdk/pull/26882#discussion_r2294770931 From dlong at openjdk.org Fri Aug 22 21:40:53 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Aug 2025 21:40:53 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v2] In-Reply-To: References: Message-ID: <2-L6_-hx_L2fYz4i-vuAIT-6qMEoocJhuQgLIRUSOZM=.72a86109-97cc-420b-89f7-87cf9bb83c0b@github.com> On Fri, 22 Aug 2025 19:18:09 GMT, Boris Ulasevich wrote: >> This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal. >> >> The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually the trigger is `XX:+OptoScheduling` option used by test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run. >> >> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments. >> >> The problems is that shift count `n` may be too large here: >> >> class Pipeline_Use_Cycle_Mask { >> protected: >> uint _mask; >> .. 
>> Pipeline_Use_Cycle_Mask& operator<<=(int n) { >> _mask <<= n; >> return *this; >> } >> }; >> >> The recent change attempted to cap the shift amount at one call site: >> >> class Pipeline_Use_Element { >> protected: >> .. >> // Mask of specific used cycles >> Pipeline_Use_Cycle_Mask _mask; >> .. >> void step(uint cycles) { >> _used = 0; >> uint max_shift = 8 * sizeof(_mask) - 1; >> _mask <<= (cycles < max_shift) ? cycles : max_shift; >> } >> } >> >> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count: >> >> // The following two routines assume that the root Pipeline_Use entity >> // consists of exactly 1 element for each functional unit >> // start is relative to the current cycle; used for latency-based info >> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const { >> for (uint i = 0; i < pred._count; i++) { >> const Pipeline_Use_Element *predUse = pred.element(i); >> if (predUse->_multiple) { >> uint min_delay = 7; >> // Multiple possible functional units, choose first unused one >> for (uint j = predUse->_lb; j <= predUse->_ub; j++) { >> const Pipeline_Use_Element *currUse = element(j); >> uint curr_delay = delay; >> if (predUse->_used & currUse->_used) { >> Pipeline_Use_Cycle_Mask x = predUse->_mask; >> Pipeline_Use_Cycle_Mask y = currUse->_mask; >> >> for ( y <<= curr_delay; x.overlaps(y); curr_delay++ ) >> y <<= 1; >> } >> if (min_delay > curr_delay) >> min_delay = curr_delay; >> } >> if (delay < min_delay) >> delay = min_delay; >> } >> else { >> for (uint j = predUse->_lb; j <= pre... > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > remove redundant code I didn't realize we already had code to handle masks for large shifts. So I think the main problem is that _maxcycleused is not being set to the max value of 100. 
There is a secondary problem that we don't really need values that high, if the units are in pipeline stages. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3215730192 From kvn at openjdk.org Fri Aug 22 22:33:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 22 Aug 2025 22:33:51 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v2] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 19:15:08 GMT, Boris Ulasevich wrote: >> src/hotspot/share/adlc/output_h.cpp line 774: >> >>> 772: fprintf(fp_hpp, " Pipeline_Use_Cycle_Mask& operator<<=(int n) {\n"); >>> 773: fprintf(fp_hpp, " int max_shift = 8 * sizeof(_mask) - 1;\n"); >>> 774: fprintf(fp_hpp, " _mask <<= (n < max_shift) ? n : max_shift;\n"); >> >> sizeof(_mask) is known - it is sizeof(uint). >> Lines 760-768 should be cleaned: ` <= 32` checks are redundant because of the check at line 758. This is a leftover from the SPARC code removal (which was not clean). > > Good point - I removed the redundant code. > > As for `sizeof(_mask)`, shouldn't it just be `max_shift = 31` or `_mask <<= (n < 32) ? n : 31;`? Yes, if `sizeof(uint)` is 32 bits on all our platforms. Hmm, maybe we should use `uint32_t` for `_mask` here. Then we can use 32 and 31 without confusion.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26890#discussion_r2294867624 From kvn at openjdk.org Fri Aug 22 22:37:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 22 Aug 2025 22:37:52 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v6] In-Reply-To: <7zawxaIMLdnM5VraQwvZL3wcj3v8vYtzEvJpWYwQLqg=.eecc2aa6-f47e-44b8-842b-10621e83c2ae@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> <7zawxaIMLdnM5VraQwvZL3wcj3v8vYtzEvJpWYwQLqg=.eecc2aa6-f47e-44b8-842b-10621e83c2ae@github.com> Message-ID: On Fri, 22 Aug 2025 20:20:25 GMT, Igor Veresov wrote: >> src/hotspot/share/compiler/compilationPolicy.cpp line 192: >> >>> 190: >>> 191: void CompilationPolicy::replay_training_at_init_loop(JavaThread* current) { >>> 192: while (!CompileBroker::is_compilation_disabled_forever() || AOTVerifyTrainingData) { >> >> Will it loop forever with `+ AOTVerifyTrainingData` ? > > Yes, it runs in a dedicated thread. It doesn't need to terminate. Add comment about this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2294873607 From kvn at openjdk.org Fri Aug 22 22:46:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 22 Aug 2025 22:46:50 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v2] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 22:31:19 GMT, Vladimir Kozlov wrote: >> Good point - I removed the redundant code. >> >> As for `sizeof(_mask)`, shouldn't it just be `max_shift = 31` or `_mask <<= (n < 32) ? n : 31;`? > > Yes, if `sizeof(uint)` is 32 bits on all our platforms. > > Hmm, maybe we should use `uint32_t` for `_mask` here. Then we can use 32 and 31 without confusion. I mean to use `_mask <<= (n < 32) ?
n : 31;` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26890#discussion_r2294888148 From duke at openjdk.org Fri Aug 22 23:35:45 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 22 Aug 2025 23:35:45 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v43] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality > > Additional Testing: > - [x] Linux x64 fastdebug tier 1/2/3/4 > - [x] Linux aarch64 fastdebug tier 1/2/3/4 Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/24c35689..3344a72a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=42 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=41-42 Stats: 21 lines in 3 files changed: 12 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Fri Aug 22 23:35:49 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 22 Aug 2025 23:35:49 GMT Subject: RFR: 8316694: Implement 
relocation of nmethod within CodeCache [v42] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 12:26:25 GMT, Erik Österlund wrote: >> Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: >> >> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final >> - Lock nmethod::relocate behind experimental flag >> - Use CompiledICLocker instead of CompiledIC_lock >> - Fix spacing >> - Update NMethod.java with immutable data changes >> - Rename method to nm >> - Add assert before freeing immutable data >> - Reorder is_relocatable checks >> - Require caller to hold locks >> - Revert is_always_within_branch_range changes >> - ... and 97 more: https://git.openjdk.org/jdk/compare/9593730a...24c35689 > > src/hotspot/share/prims/whitebox.cpp line 1659: > >> 1657: ResourceMark rm(THREAD); >> 1658: CHECK_JNI_EXCEPTION(env); >> 1659: nmethod* code = (nmethod*) addr; > > Hmm, this might corrupt the code heap and cause crashes. The nmethod could have been freed and had something random else allocated across the same memory, and then been cast to nmethod even though there are some random instructions there now. Can't really do that. I added a check to verify that the address points to a valid nmethod ([source](https://github.com/chadrako/jdk/blob/3344a72ab00134b796805ec217f155e26a7c843a/src/hotspot/share/prims/whitebox.cpp#L1656-L1678)) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2294932547 From duke at openjdk.org Sat Aug 23 09:05:10 2025 From: duke at openjdk.org (Tobias Hotz) Date: Sat, 23 Aug 2025 09:05:10 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v6] In-Reply-To: References: Message-ID: > This PR improves the value of integer division nodes. > Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division.
This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. > We also need some special logic to handle ranges that cross zero, but in this case, we just need to check the negative and the positive range once. > This also cleans up and unifies the code paths for DivINode and DivLNode. > I've added some tests to validate the optimization. Without the changes, some of these tests fail. Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: Remove too strict assert from old code path ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26143/files - new: https://git.openjdk.org/jdk/pull/26143/files/2bf7c99d..bb9151e4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26143.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26143/head:pull/26143 PR: https://git.openjdk.org/jdk/pull/26143 From fyang at openjdk.org Mon Aug 25 01:58:01 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 25 Aug 2025 01:58:01 GMT Subject: RFR: 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way [v2] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 12:35:12 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> This is a follow-up of https://github.com/openjdk/jdk/pull/26838, which fixes the vector version in a similar way. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > comments & readability Overall looks fine to me. I have a question about the test change.
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2495: > 2493: __ bind(stub.entry()); > 2494: > 2495: // mul is already set to mf2 in float_to_float16_v. Although not directly related, can you rename `tmp` to `vtmp` and add an assertion about the three vector registers (just like we do in `C2_MacroAssembler::float_to_float16_v`)? And it would help if we add some extra code comment about the `v0` mask register, which indicates which elements are NaNs. Or maybe better to pass `v0` as well? What I mean is something like: assert_different_registers(dst, src, vtmp); // Active elements (NaNs) are marked in v0 mask register // and mul is already set to mf2 in float_to_float16_v. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2517: > 2515: const int fp16_mantissa_bits = 10; > 2516: > 2517: // preserve the sign bit and exponent. Suggestion: `// preserve the sign bit and exponent, clear mantissa.` test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVectorNaN.java line 92: > 90: // Setup > 91: for (int i = 0; i < ARRLEN; i++) { > 92: if (i%3 == 0) { Question: What is this change for? Do you have more details? ------------- PR Review: https://git.openjdk.org/jdk/pull/26883#pullrequestreview-3149218656 PR Review Comment: https://git.openjdk.org/jdk/pull/26883#discussion_r2296547748 PR Review Comment: https://git.openjdk.org/jdk/pull/26883#discussion_r2296550895 PR Review Comment: https://git.openjdk.org/jdk/pull/26883#discussion_r2296660079 From wenanjian at openjdk.org Mon Aug 25 03:51:07 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 25 Aug 2025 03:51:07 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v3] In-Reply-To: References: Message-ID: <_2GD6G4L__UBychjUd_afVU4IYhEQWzCqQB-rPe5jkY=.5187f71e-7865-462c-a3d6-6438c224081a@github.com> > Hi everyone, please help review this patch which implements the _counterMode_AESCrypt with Zvkned.
On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: change some name and format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/d7ddad6e..f3698f37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=01-02 Stats: 39 lines in 1 file changed: 1 ins; 0 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From mchevalier at openjdk.org Mon Aug 25 06:54:06 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 25 Aug 2025 06:54:06 GMT Subject: RFR: 8360561: PhaseIdealLoop::create_new_if_for_predicate hits "must be a uct if pattern" assert [v2] In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 08:41:52 GMT, Marc Chevalier wrote: >> Did you know that ranges can be disjoint and yet not ordered?! Well, in modular arithmetic. >> >> Let's look at a simplistic example: >> >> int x; >> if (?) { >> x = -1; >> } else { >> x = 1; >> } >> >> if (x != 0) { >> return; >> } >> // Unreachable >> >> >> With signed ranges, before the second `if`, `x` is in `[-1, 1]`. Which is enough to enter the second if, but not enough to prove you have to enter it: it wrongly seems that the code after the second `if` is still reachable. Twaddle! >> >> With unsigned ranges, at this point `x` is in `[1, 2^32-1]`, and then, it is clear that `x != 0`. This information is used to refine the value of `x` in the (missing) else-branch, and so, after the if. This is done with simple lattice meet (Hotspot's join): in the else-branch, the possible values of `x` are the meet of what it was worth before, and the interval in the guard, that is `[0, 0]`.
Thanks to the unsigned range, this is known to be empty (that is bottom, or Hotspot's top). And with a little reduced product, the whole type of `x` is empty as well. Yet, this information is not used to kill control yet. >> >> This is here the center of the problem: we have a situation such as: >> 2 after-CastII >> After node `110 CastII` is idealized, it is found to be Top, and then the uncommon trap at `129` is replaced by `238 Halt` by being value-dead. >> 1 before-CastII >> Since the control is not killed, the node stays there, eventually making some predicate-related assert fail as a trap is expected under a `ParsePredicate`. >> >> And that's what this change proposes: when comparing integers with non-ordered ranges, let's see if the unsigned ranges overlap, by computing the meet. If the intersection is empty, then the values can't be equal, without being able to order them. This is new! Without unsigned information for signed integers, either they overlap, or we can order them. Adding modular arithmetic allows us to have non-overlapping ranges that are also not ordered. >> >> Let's also notice that 0 is special: it is important that bounds are on each side of 0 (or 2^31, the other discontinuity). For instance, if `x` can be 1 or 5, both the signed and unsigned range will agree on `[1, 5]` and not be able to rule out, let's say, 3. > ... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Use Warmup(0) instead of Xcomp Thanks all for reviews!
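The signed-versus-unsigned range argument Marc describes can be checked with plain Java arithmetic. A minimal sketch (illustrative only, not HotSpot's TypeInt code; the class and method names are made up):

```java
public class UnsignedRangeDemo {
    // x in {-1, 1}: the signed bounds are [-1, 1], which contains 0, so
    // signed reasoning alone cannot prove x != 0.
    static boolean signedRangeExcludesZero(int lo, int hi) {
        return lo > 0 || hi < 0;
    }

    // Viewed unsigned, -1 is 0xFFFFFFFF, so the same set of values has the
    // unsigned bounds [1, 0xFFFFFFFF]. Since 0 is the smallest unsigned
    // value, 0 lies in [ulo, uhi] iff ulo == 0.
    static boolean unsignedRangeExcludesZero(int ulo, int uhi) {
        return Integer.compareUnsigned(ulo, 0) > 0;
    }

    public static void main(String[] args) {
        // Signed view of {-1, 1} cannot exclude 0 ...
        if (signedRangeExcludesZero(-1, 1)) throw new AssertionError();
        // ... but the unsigned view [1, 0xFFFFFFFF] can (-1 == 0xFFFFFFFF).
        if (!unsignedRangeExcludesZero(1, -1)) throw new AssertionError();
        System.out.println("unsigned range proves x != 0");
    }
}
```

The signed bounds of the set {-1, 1} straddle 0, so only the unsigned view can prove `x != 0` — exactly the extra information the change exploits.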
------------- PR Comment: https://git.openjdk.org/jdk/pull/26504#issuecomment-3219045613 From mchevalier at openjdk.org Mon Aug 25 06:54:08 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 25 Aug 2025 06:54:08 GMT Subject: Integrated: 8360561: PhaseIdealLoop::create_new_if_for_predicate hits "must be a uct if pattern" assert In-Reply-To: References: Message-ID: <7NtlNPuS2Qocpz7OVK8rDtLp9ixtedASUFqTHMlKOF8=.b076a760-5e64-4056-b391-e698a75dd57e@github.com> On Mon, 28 Jul 2025 12:31:49 GMT, Marc Chevalier wrote: > Did you know that ranges can be disjoints and yet not ordered?! Well, in modular arithmetic. > > Let's look at a simplistic example: > > int x; > if (?) { > x = -1; > } else { > x = 1; > } > > if (x != 0) { > return; > } > // Unreachable > > > With signed ranges, before the second `if`, `x` is in `[-1, 1]`. Which is enough to enter to second if, but not enough to prove you have to enter it: it wrongly seems that after the second `if` is still reachable. Twaddle! > > With unsigned ranges, at this point `x` is in `[1, 2^32-1]`, and then, it is clear that `x != 0`. This information is used to refine the value of `x` in the (missing) else-branch, and so, after the if. This is done with simple lattice meet (Hotspot's join): in the else-branch, the possible values of `x` are the meet of what is was worth before, and the interval in the guard, that is `[0, 0]`. Thanks to the unsigned range, this is known to be empty (that is bottom, or Hotspot's top). And with a little reduced product, the whole type of `x` is empty as well. Yet, this information is not used to kill control yet. > > This is here the center of the problem: we have a situation such as: > 2 after-CastII > After node `110 CastII` is idealized, it is found to be Top, and then the uncommon trap at `129` is replaced by `238 Halt` by being value-dead. 
> 1 before-CastII > Since the control is not killed, the node stay there, eventually making some predicate-related assert fail as a trap is expected under a `ParsePredicate`. > > And that's what this change proposes: when comparing integers with non-ordered ranges, let's see if the unsigned ranges overlap, by computing the meet. If the intersection is empty, then the values can't be equals, without being able to order them. This is new! Without unsigned information for signed integer, either they overlap, or we can order them. Adding modular arithmetic allows to have non-overlapping ranges that are also not ordered. > > Let's also notice that 0 is special: it is important bounds are on each side of 0 (or 2^31, the other discontinuity). For instance if `x` can be 1 or 5, for instance, both the signed and unsigned range will agree on `[1, 5]` and not be able to prove it's, let's say, 3. > > What would there be other ways to treat this problem a bit ... This pull request has now been integrated. Changeset: 1f0dfdbc Author: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/1f0dfdbccac4d23c00cab5663324c965141e1b23 Stats: 228 lines in 4 files changed: 228 ins; 0 del; 0 mod 8360561: PhaseIdealLoop::create_new_if_for_predicate hits "must be a uct if pattern" assert Reviewed-by: mhaessig, thartmann, qamai ------------- PR: https://git.openjdk.org/jdk/pull/26504 From galder at openjdk.org Mon Aug 25 07:13:43 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 25 Aug 2025 07:13:43 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v5] In-Reply-To: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: > I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. 
The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. > > Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows: > > > Benchmark (seed) (size) Mode Cnt Base Patch Units Diff > VectorBitConversion.doubleToLongBits 0 2048 thrpt 8 1168.782 1157.717 ops/ms -1% > VectorBitConversion.doubleToRawLongBits 0 2048 thrpt 8 3999.387 7353.936 ops/ms +83% > VectorBitConversion.floatToIntBits 0 2048 thrpt 8 1200.338 1188.206 ops/ms -1% > VectorBitConversion.floatToRawIntBits 0 2048 thrpt 8 4058.248 14792.474 ops/ms +264% > VectorBitConversion.intBitsToFloat 0 2048 thrpt 8 3050.313 14984.246 ops/ms +391% > VectorBitConversion.longBitsToDouble 0 2048 thrpt 8 3022.691 7379.360 ops/ms +144% > > > The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control. > > I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions. Galder Zamarre?o has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: - Merge branch 'master' into topic.fp-bits-vector - Add more IR node positive assertions - Fix source of data for benchmarks - Refactor benchmarks to TypeVectorOperations - Check at the very least that auto vectorization is supported - Avoid VectorReinterpret::implemented - Refactor and add copyright header - Rephrase comment - Removed unnecessary assert methods - Adjust IR test after adding Move* vector support - ... 
and 12 more: https://git.openjdk.org/jdk/compare/88efdd03...e7e4d801 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26457/files - new: https://git.openjdk.org/jdk/pull/26457/files/01fd5ba0..e7e4d801 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26457&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26457&range=03-04 Stats: 58306 lines in 1513 files changed: 32401 ins; 19557 del; 6348 mod Patch: https://git.openjdk.org/jdk/pull/26457.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26457/head:pull/26457 PR: https://git.openjdk.org/jdk/pull/26457 From galder at openjdk.org Mon Aug 25 07:13:45 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 25 Aug 2025 07:13:45 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v4] In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: On Fri, 22 Aug 2025 11:40:10 GMT, Galder Zamarre?o wrote: >> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations. >> >> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. 
On darwin/aarch64 it shows: >> >> >> Benchmark (seed) (size) Mode Cnt Base Patch Units Diff >> VectorBitConversion.doubleToLongBits 0 2048 thrpt 8 1168.782 1157.717 ops/ms -1% >> VectorBitConversion.doubleToRawLongBits 0 2048 thrpt 8 3999.387 7353.936 ops/ms +83% >> VectorBitConversion.floatToIntBits 0 2048 thrpt 8 1200.338 1188.206 ops/ms -1% >> VectorBitConversion.floatToRawIntBits 0 2048 thrpt 8 4058.248 14792.474 ops/ms +264% >> VectorBitConversion.intBitsToFloat 0 2048 thrpt 8 3050.313 14984.246 ops/ms +391% >> VectorBitConversion.longBitsToDouble 0 2048 thrpt 8 3022.691 7379.360 ops/ms +144% >> >> >> The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control. >> >> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions. > > Galder Zamarre?o has updated the pull request incrementally with three additional commits since the last revision: > > - Add more IR node positive assertions > - Fix source of data for benchmarks > - Refactor benchmarks to TypeVectorOperations Merged and pushed latest master changes, all looks good still ------------- PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3219097709 From mhaessig at openjdk.org Mon Aug 25 07:20:45 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 25 Aug 2025 07:20:45 GMT Subject: RFR: 8365909: [REDO] Add a compilation timeout flag to catch long running compilations [v2] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 21:22:11 GMT, Christian Hagedorn wrote: >> Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation > > src/hotspot/os/linux/compilerThreadTimeout_linux.cpp line 105: > >> 103: #else >> 104: sev._sigev_un._tid = thread->osthread()->thread_id(); >> 105: #endif // 
MUSL_LIBC > The `ifdef` should probably also be without indentation like the other `ifdefs`: > Suggestion: > > #ifdef MUSL_LIBC > sev.sigev_notify_thread_id = thread->osthread()->thread_id(); > #else > sev._sigev_un._tid = thread->osthread()->thread_id(); > #endif // MUSL_LIBC Good catch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26882#discussion_r2297309858 From mhaessig at openjdk.org Mon Aug 25 07:20:44 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 25 Aug 2025 07:20:44 GMT Subject: RFR: 8365909: [REDO] Add a compilation timeout flag to catch long running compilations [v2] In-Reply-To: References: Message-ID: > This PR adds a timeout for compilation tasks based on timer signals on Linux debug builds. > > This PR is a redo of #25872 with fixes for the failing test. > > Testing: > - [x] GitHub Actions > - [x] tier1,tier2 plus internal testing on all Oracle supported platforms > - [x] tier3,tier4 on linux-x64-debug > - [x] tier1,tier2,tier3,tier4 on linux-x64-debug with `-XX:CompileTaskTimeout=60000` Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Fix indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26882/files - new: https://git.openjdk.org/jdk/pull/26882/files/f86361c8..647f4933 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26882&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26882&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26882.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26882/head:pull/26882 PR: https://git.openjdk.org/jdk/pull/26882 From epeter at openjdk.org Mon Aug 25 08:37:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Aug 2025 08:37:08 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v22] In-Reply-To: References:
<2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Fri, 22 Aug 2025 16:18:17 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add test for related report for JDK-8365982 > This looks like a "rabbit hole" :( > > Maybe file a separate RFE to investigate this behavior later by some other engineer. Most concerning is that it reproduced on different platforms. > > I agree that we may accept this regression since it happened in a corner case. I assume our benchmarks are not affected by this. Right? @vnkozlov Thanks for having a look! > I noticed in the first (no_patch, fastest) assembler we don't have the "strip mining" outer loop. While in other cases we have it. Do you know why? I've seen that too. I don't know why. The percentages on the "strip mining" outer loop are very low though (about 0.3% of the total block with 96.60%). Maybe it just does not get picked up in one of them? Still a little strange.
The `patch` version has a "strip mined" outer loop for both the fast and slow path: Loop: N0/N0 has_sfpt Loop: N2375/N2377 counted [0,int),+1 (4 iters) pre multiversion_slow Loop: N361/N362 limit_check sfpts={ 364 } Loop: N4208/N359 limit_check counted [int,int),+64 (10966 iters) main multiversion_slow has_sfpt strip_mined Loop: N2240/N2242 limit_check counted [int,int),+1 (4 iters) post multiversion_slow Loop: N639/N651 counted [0,int),+1 (4 iters) pre multiversion_fast Loop: N202/N201 limit_check sfpts={ 204 } Loop: N3244/N178 limit_check counted [int,int),+512 (10966 iters) main vector multiversion_fast has_sfpt strip_mined Loop: N2591/N2594 limit_check counted [int,int),+64 (64 iters) post vector multiversion_fast Loop: N504/N516 limit_check counted [int,int),+1 (4 iters) post multiversion_fast And so does the `not_profitable` version: Loop: N0/N0 has_sfpt Loop: N489/N501 predicated counted [0,int),+1 (4 iters) pre Loop: N213/N212 limit_check sfpts={ 215 } Loop: N1879/N189 limit_check counted [int,int),+64 (10034 iters) main has_sfpt strip_mined Loop: N354/N366 limit_check counted [int,int),+1 (4 iters) post And in debug, `perfasm` also confirms that, it says that the inner main loop is strip mined, but still does not show the assembly for that: ;; B15: # out( B15 B16 ) <- in( B14 B15 ) Loop( B15-B15 inner main of N117 strip mined) Freq: 4.35414e+08 ? 0x00007efee0baad20: vmovd %xmm0,%r11d ? 0x00007efee0baad25: add %esi,%r11d 0.03% ? 0x00007efee0baad28: movslq %r11d,%r11 ? 0x00007efee0baad2b: vmovd %xmm3,%r10d 2.03% ? 0x00007efee0baad30: add %esi,%r10d ? 0x00007efee0baad33: movslq %r10d,%r8 0.13% ? 0x00007efee0baad36: movslq %esi,%r10 ? 0x00007efee0baad39: lea (%rax,%r10,1),%r9 1.20% ? 0x00007efee0baad3d: lea (%r10,%rbp,1),%rbx ? 0x00007efee0baad41: movsbl 0x10(%rdx,%rbx,1),%r10d 1.23% ? 0x00007efee0baad47: mov %r10b,0x10(%rcx,%r9,1) 0.73% ? 0x00007efee0baad4c: movsbl 0x11(%rdx,%rbx,1),%r10d 1.63% ? 0x00007efee0baad52: mov %r10b,0x11(%rcx,%r9,1) 0.17% ? 
0x00007efee0baad57: movsbl 0x12(%rdx,%rbx,1),%r10d > Yes, it could be a lot of reasons we get such regression. Sadly, yes. Hard to chase them all. > Did you tried reduce unrolling of slow path. I can quickly try that for `patch`. It seems that this only makes things worse, the loop overhead gets worse the less we unroll. LoopMaxUnroll=64 -> 3341.382 ns/op (default) LoopMaxUnroll=32 -> 3456.612 ns/op LoopMaxUnroll=16 -> 3711.292 ns/op LoopMaxUnroll=8 -> 3883.523 >> Might it be the runtime check and related branch misprediction? > > Could be since you added outer loop in slow path. I don't think so. The strip-mined loop still happens inside the slow-path. We don't go back to the runtime check. Both the fast and slow path have a PreMainPost loop structure, where the main-loop is strip-mined. It was the simplest solution to just unswitch/multiversion at the single-iteration step, and otherwise keep the loop structures as before. We made that decision back in https://github.com/openjdk/jdk/pull/22016. While in some cases we can see the strip-mined loop in the `perfasm` assembly, we cannot see the runtime check at all. >> tma_backend_bound: 21.3 vs 24.8 - there seems to be a bottleneck in the backend for patch of 10% > > This seems indicate more time spent on data access. Did main-loop starts copying from the same offset/element in no_patch vs patch loops? More time spent on data access -> yes, that is what the number seems to claim. But I don't think there are more data accesses. Rather `not_profitable` just executes more efficiently, and `patch` executes fewer instructions per cycle. In the JMH, they execute for roughly the same time ~ #cycles. But the number of instructions is about 10% different, and so is the `tma_retiring`. > cycles: 18,641,133,534 vs 18,247,472,016 - similar number of cycles > instructions: 42,579,432,139 vs 38,553,272,686 - significant deviation in work per time (10%), but why? 
> tma_retiring: 42.4 vs 37.7 - clearly not_profitable executes code more efficiently Plus: I have a lot of correctness tests, that check that we access the right bytes. I have a lot of examples, and even fuzzing-style tests. > This looks like "rabbit hole" :( > > May be file a separate RFE to investigate this behavior later by some other engineer. Yes, it is quite a rabbit hole. Yes, at this point it could be good to get the benefits out of the door, and see if we can do something later about the edge-case regression. > Most concerning is that reproduced on different platforms. I was hoping that this is not the case. But yes, it reproduces on different platforms - though in slightly different ways and that is strange too. > I agree that we may accept this regression since it happened in corner case. Ok good. And if someone really has an issue with it, they can revert back to the old behavior with the product/diagnostic flag `UseAutoVectorizationSpeculativeAliasingChecks`. > I assume our benchmarks are not affected by this. Right? I ran it a while ago, but need to run it once more now after all the recent integrations. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3219332076 From shade at openjdk.org Mon Aug 25 08:52:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 25 Aug 2025 08:52:54 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: On Fri, 15 Aug 2025 11:54:59 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. >> >> This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). >> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments (Sorry, was on vacation). I am generally good with this patch. 
Address other reviewers feedback on test code, and we are good to go. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26589#pullrequestreview-3150486615 From bmaillard at openjdk.org Mon Aug 25 09:16:54 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 25 Aug 2025 09:16:54 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v6] In-Reply-To: References: Message-ID: On Sat, 23 Aug 2025 09:05:10 GMT, Tobias Hotz wrote: >> This PR improves the value of interger division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Remove too strict assert from old code path Thanks for working on this change, I think this an important optimization opportunity that was previously missing. The code is very clear. I only have one nit. src/hotspot/share/opto/divnode.cpp line 543: > 541: NativeType i2_hi = i2->_hi == 0 ? -1 : i2->_hi; > 542: NativeType min_val = std::numeric_limits::min(); > 543: assert(min_val == min_jint || min_val == min_jlong, "min has to be either min_jint or min_jlong"); I find this assert a little confusing, as its outcome is completely independent from the inputs of the function. 
I would remove it ------------- Marked as reviewed by bmaillard (Author). PR Review: https://git.openjdk.org/jdk/pull/26143#pullrequestreview-3150548871 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2297542349 From mhaessig at openjdk.org Mon Aug 25 09:22:56 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 25 Aug 2025 09:22:56 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v6] In-Reply-To: References: Message-ID: On Mon, 25 Aug 2025 09:08:33 GMT, Beno?t Maillard wrote: >> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove too strict assert from old code path > > src/hotspot/share/opto/divnode.cpp line 543: > >> 541: NativeType i2_hi = i2->_hi == 0 ? -1 : i2->_hi; >> 542: NativeType min_val = std::numeric_limits::min(); >> 543: assert(min_val == min_jint || min_val == min_jlong, "min has to be either min_jint or min_jlong"); > > I find this assert a little confusing, as its outcome is completely independent from the inputs of the function. I would remove it It depends on the template type. I would rather keep it to sanity check that the minimum value of `NativeType` is as we expect. If that does not hold, the optimization below is potentially wrong and has UB. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2297570098 From epeter at openjdk.org Mon Aug 25 10:41:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Aug 2025 10:41:13 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v22] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: <0VPvDeJsnSA6QQSzjHZUMCNkDptq-7VymhP2aURPPNw=.846d7910-8173-45c3-b461-c27fc48b41a9@github.com> On Fri, 22 Aug 2025 13:34:58 GMT, Emanuel Peter wrote: >> TODO work that arose during review process / recent merges with master: >> >> - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peek performance is identical but compilation time a bit higher. Investigation ongoing. >> - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE. >> >> --------------- >> >> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. >> >> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. >> >> -------------------------- >> >> **Where to start reviewing** >> >> - `src/hotspot/share/opto/mempointer.hpp`: >> - Read the class comment for `MemPointerRawSummand`. >> - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. 
>> >> - `src/hotspot/share/opto/vectorization.cpp`: >> - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. >> >> - `src/hotspot/share/opto/vtransform.hpp`: >> - Understand the difference between weak and strong edges. >> >> If you need to see some examples, then look at the tests: >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. >> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). >> -------------------------- >> >> **Details** >> >> Most fundamentally: >> - I had to... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add test for related report for JDK-8365982 I think the results in https://github.com/openjdk/jdk/pull/24278#issuecomment-3213393035 already motivate the 2-staged approach: - First use predicate and only generate vectorized loop - If the predicate deopts, then use multiversioning I expect the real-world cases to look like this: - In most cases, we never have an aliasing case, the predicate never leads to deopt. We don't want to pay the extra compile time for multiversioning. - In a few cases, we will have occasional aliasing cases, and we have to pay the price of deopt/recompile with multiversioning.
While recompilation is a price, it is more than worth it in the long-run, given we can get vectorized performance in most cases now. - In rare cases, we only have aliasing cases. We have to recompile, and could suffer from the regressions mentioned above. Speculative compilation always has a price, but that's ok if it affects only edge cases. Here are some `CITime` numbers, with `-XX:RepeatCompilation=100`: - Never aliasing, aliasing runtime check never fails: - `patch` (only predicate): `3.454` on C2 (`2.427` in IdealLoop, `0.368` in AutoVectorize) - `no_predicate` (directly multiversion): `4.709` in C2 (`3.252` in IdealLoop, `0.425` in AutoVectorize) - With aliasing, runtime check fails: - `patch` (first predicate, then multiversioning): `5.956` on C2 (`4.198` in IdealLoop, `0.620` in AutoVectorize) - `no_predicate` (directly multiversion): `4.633` in C2 (`3.205` in IdealLoop, `0.418` in AutoVectorize) (I used the [example](https://github.com/openjdk/jdk/pull/24278#issuecomment-3210290629) and extended it with a non-aliasing case) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3219728235 From epeter at openjdk.org Mon Aug 25 10:55:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Aug 2025 10:55:52 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v23] In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: > TODO work that arose during review process / recent merges with master: > > - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peak performance is identical but compilation time a bit higher. Investigation ongoing. > - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE.
> > --------------- > > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime.
IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSumm... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 217 commits: - Merge branch 'master' into JDK-8324751-Aliasing-Analysis-RTC - improve tests a little - add test for related report for JDK-8365982 - add test for related report for JDK-8360204 - add test for related report for JDK-8359688 - rm IR rule that checks multiversioning, rare cases fail due to RCE - disable flag if not possible - more documentation for Vladimir - improve benchmark - fix tests after master integration of JDK-8342692 and JDK-8356176 - ... and 207 more: https://git.openjdk.org/jdk/compare/45726a1f...a36e3f7a ------------- Changes: https://git.openjdk.org/jdk/pull/24278/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=22 Stats: 5828 lines in 29 files changed: 5579 ins; 18 del; 231 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Mon Aug 25 11:01:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Aug 2025 11:01:10 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v23] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Mon, 25 Aug 2025 10:55:52 GMT, Emanuel Peter wrote: >> TODO work that arose during review process / recent merges with master: >> >> - Vladimir asked for benchmark where predicate is disabled, only multiversioning. Show that peak performance is identical but compilation time a bit higher. Investigation ongoing.
>> - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE. >> >> --------------- >> >> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. >> >> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. >> >> -------------------------- >> >> **Where to start reviewing** >> >> - `src/hotspot/share/opto/mempointer.hpp`: >> - Read the class comment for `MemPointerRawSummand`. >> - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. >> >> - `src/hotspot/share/opto/vectorization.cpp`: >> - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. >> >> - `src/hotspot/share/opto/vtransform.hpp`: >> - Understand the difference between weak and strong edges. >> >> If you need to see some examples, then look at the tests: >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. >> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex.
Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). >> -------------------------- >> >> **Details** >> >> Most fundamentally: >> - I had to... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 217 commits: > > - Merge branch 'master' into JDK-8324751-Aliasing-Analysis-RTC > - improve tests a little > - add test for related report for JDK-8365982 > - add test for related report for JDK-8360204 > - add test for related report for JDK-8359688 > - rm IR rule that checks multiversioning, rare cases fail due to RCE > - disable flag if not possible > - more documentation for Vladimir > - improve benchmark > - fix tests after master integration of JDK-8342692 and JDK-8356176 > - ... and 207 more: https://git.openjdk.org/jdk/compare/45726a1f...a36e3f7a I just merged with master again, and will run our internal performance testing again, just to be sure. It was all fine a few weeks ago, and I had even reported a performance improvement: [performance comparison image] Let's hope I can reproduce that result! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3219796170 From epeter at openjdk.org Mon Aug 25 12:59:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Aug 2025 12:59:01 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v6] In-Reply-To: References: Message-ID: <586YaR01-YPK1-N4UvVo-WM2HfTGvZz8lz3leP42BTA=.d125635f-069b-43ba-a623-86e26e693bb2@github.com> On Sat, 23 Aug 2025 09:05:10 GMT, Tobias Hotz wrote: >> This PR improves the value of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max.
Some special logic is required for MIN_INT / -1, though, as this is a special case >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Remove too strict assert from old code path Looks like a nice idea, thanks for the work! I started reading through a part of it and have a few questions / suggestions. --------------- A more general suggestion (could also be a future RFE): You could also use the `unsigned` bounds here. And you could also address `UDivI/L`. Now that we have unsigned bounds we could use them: https://github.com/openjdk/jdk/pull/17508 You can read up in `type.hpp`, that an integer type has one or two simple intervals. 708 * 4. Either _lo == jint(_ulo) and _hi == jint(_uhi), or each element of a 709 * TypeInt lies in either interval [_lo, jint(_uhi)] or [jint(_ulo), _hi] 710 * (note that these intervals are disjoint in this case). I think with this, you could do an even more powerful optimization. ----------- Also: wherever I see these optimizations that work on ranges, and not just constants: we should do some more rigorous testing on those resulting ranges. I see that you already have some concrete examples. It would be good to extend those with a completely randomized version. See for inspiration: https://github.com/openjdk/jdk/pull/25254/files#diff-0e3d89ac8cf0548b69d9bdb0859380bc31de0a772fa7ff211f446a4a5abd4197R220-R248 src/hotspot/share/opto/divnode.cpp line 508: > 506: > 507: template > 508: static const IntegerType* compute_generic_div_type(const IntegerType* i1, const IntegerType* i2, int widen) { Do we need the `generic` in the name?
The `template` already suggests that it can be used for different types, right? Also: I'm wondering if we can somehow extend this for `UDivI` and `UDivL`. I suppose you would have to use the `_ulo` and `_uhi` instead of `_lo` and `_hi`. I'm not saying this all has to be done in this PR, but we could at least anticipate the extension to unsigned division. src/hotspot/share/opto/divnode.cpp line 532: > 530: // Case B: divisor range does NOT span zero. > 531: // Here i2 is entirely negative or entirely positive. > 532: // Let d_min and d_max be the nonzero endpoints of i2. Seems you define `d_min` and `d_max` here, but you don't use them anywhere. You should probably use names here that you will use further down. src/hotspot/share/opto/divnode.cpp line 533: > 531: // Here i2 is entirely negative or entirely positive. > 532: // Let d_min and d_max be the nonzero endpoints of i2. > 533: // Then a/b is monotonic in a and in b (when b keeps the same sign). I think you should talk about `i1` and `i2`. You have not defined `a` and `b` up to now. src/hotspot/share/opto/divnode.cpp line 547: > 545: // Special overflow case: min_val / (-1) == min_val (cf. JVMS §6.5 idiv/ldiv) > 546: // We need to be careful that we never run min_val / (-1) in C++ code, as this overflow is UB there > 547: // We also must include min_val in the output if i1->_lo == min_val and i2->_hi. `if i1->_lo == min_val and i2->_hi` I cannot parse this. The `if` suggests that there will be a condition following. The `and` confirms that. Then I see `i1->_lo == min_val` which is a boolean condition. But `i2->_hi` is not. Ah, did you mean the condition from below? Suggestion: // We also must include min_val in the output if i1->_lo == min_val and i2->_hi == -1. src/hotspot/share/opto/divnode.cpp line 552: > 550: NativeType new_lo = min_val; > 551: NativeType new_hi; > 552: // compute new_hi for non-constant divisor and/or dividend. You suggest we only land here in non-constant cases. Is that true?
What if `i1=min_val` and `i2=-1`? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26143#pullrequestreview-3151213349 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2297964702 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2297970256 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2297971528 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2297986773 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2297995321 From epeter at openjdk.org Mon Aug 25 12:59:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Aug 2025 12:59:02 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v6] In-Reply-To: References: Message-ID: On Mon, 25 Aug 2025 09:19:24 GMT, Manuel Hässig wrote: >> src/hotspot/share/opto/divnode.cpp line 543: >> >>> 541: NativeType i2_hi = i2->_hi == 0 ? -1 : i2->_hi; >>> 542: NativeType min_val = std::numeric_limits<NativeType>::min(); >>> 543: assert(min_val == min_jint || min_val == min_jlong, "min has to be either min_jint or min_jlong"); >> >> I find this assert a little confusing, as its outcome is completely independent from the inputs of the function. I would remove it > > It depends on the template type. I would rather keep it to sanity check that the minimum value of `NativeType` is as we expect. If that does not hold, the optimization below is potentially wrong and has UB. We will one day want to use smaller integer types here, just for some exhaustive gtesting. But I suppose we can remove it at that point.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2297978419 From epeter at openjdk.org Mon Aug 25 13:23:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Aug 2025 13:23:02 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v6] In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: On Mon, 25 Aug 2025 13:09:06 GMT, Emanuel Peter wrote: >> Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: >> >> - typos >> - Merge branch 'master' into improve-mod-value >> - Merge branch 'master' into improve-mod-value >> - simplify UB/cpu exception check >> - wording >> - Address more comments >> - Merge branch 'master' into improve-mod-value >> - Add randomized test >> - Use BasicType for shared implementation >> - Update ModL comment >> - ... and 8 more: https://git.openjdk.org/jdk/compare/9e98b6eb...11210414 > > src/hotspot/share/opto/divnode.cpp line 1207: > >> 1205: const Type* t2 = phase->type(in2); >> 1206: if (t1 == Type::TOP) return Type::TOP; >> 1207: if (t2 == Type::TOP) return Type::TOP; > > Suggestion: > > if (t1 == Type::TOP) { return Type::TOP; } > if (t2 == Type::TOP) { return Type::TOP; } > > If we already touch the code, we should also fix the brackets. Please fix it below as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2298064781 From epeter at openjdk.org Mon Aug 25 13:23:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Aug 2025 13:23:01 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v6] In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: On Mon, 11 Aug 2025 07:17:40 GMT, Hannes Greule wrote: >> This change improves the precision of the `Mod(I|L)Node::Value()` functions. >> >> I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early. >> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions. >> >> ### Monotonicity >> >> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range). >> >> ### Testing >> >> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something). >> >> Please review and let me know what you think. >> >> ### Other >> >> The `UMod(I|L)Node`s were adjusted to be more in line with its signed variants. 
This change diverges them again, but similar improvements could be made after #17508. >> >> During experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth looking into further: >> - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, meaning we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement? >> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd. > > Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: > > - typos > - Merge branch 'master' into improve-mod-value > - Merge branch 'master' into improve-mod-value > - simplify UB/cpu exception check > - wording > - Address more comments > - Merge branch 'master' into improve-mod-value > - Add randomized test > - Use BasicType for shared implementation > - Update ModL comment > - ... and 8 more: https://git.openjdk.org/jdk/compare/9e98b6eb...11210414 Sorry, I was away on summer vacation and other travel. Back to reviewing now ;) Looks really good now. I think we can almost integrate now. One thing I'm wondering: could this be extended to `UModI/L`? That can of course be a separate RFE as well. And yet another idea: could we use the known bits? See https://github.com/openjdk/jdk/pull/17508.
src/hotspot/share/opto/divnode.cpp line 1207: > 1205: const Type* t2 = phase->type(in2); > 1206: if (t1 == Type::TOP) return Type::TOP; > 1207: if (t2 == Type::TOP) return Type::TOP; Suggestion: if (t1 == Type::TOP) { return Type::TOP; } if (t2 == Type::TOP) { return Type::TOP; } If we already touch the code, we should also fix the brackets. ------------- PR Review: https://git.openjdk.org/jdk/pull/25254#pullrequestreview-3151393840 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2298064321 From epeter at openjdk.org Mon Aug 25 13:23:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Aug 2025 13:23:02 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v2] In-Reply-To: <3BJWLK3FukQCp2FHGcyBDTZtbc5aS8VreNKYKAaQrdU=.43a7e821-8d56-4161-850a-9137d17d44de@github.com> References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> <3BJWLK3FukQCp2FHGcyBDTZtbc5aS8VreNKYKAaQrdU=.43a7e821-8d56-4161-850a-9137d17d44de@github.com> Message-ID: On Mon, 16 Jun 2025 06:57:16 GMT, Emanuel Peter wrote: >> @SirYwell Thanks for looking into this, that looks promising! >> >> I have two bigger comments: >> - Could we unify the L and I code, either using C++ templating or `BasicType`? It would reduce code duplication. >> - Can we have some tests where the input ranges are random as well, and where we check the output ranges with some comparisons? >> >> ------------------ >> Copied from the code comment: >> >>> Nice work with the examples you already have, and randomizing some of it! >>> >>> I would like to see one more generalized test. >>> - compute `res = lhs % rhs` >>> - Truncate both `lhs` and `rhs` with randomly produced bounds from Generators, like this: `lhs = Math.max(lo, Math.min(hi, lhs))`. >>> - Below, add all sorts of comparisons with random constants, like this: `if (res < CON) { sum += 1; }`. If the output range is wrong, this could wrongly constant fold, and allow us to catch that. 
>>> Then fuzz the generated method a few times with random inputs for `lhs` and `rhs`, and check that the `sum` and `res` value are the same for compiled and interpreted code. >>> >>> I hope that makes sense :) >>> This is currently my best method to check if ranges are correct, and I think it is quite important because often tests are only written with constants in mind, but less so with ranges, and then we mess up the ranges because it is just too tricky. >>> >>> This is an example, where I asked someone to try this out as well: >>> https://github.com/openjdk/jdk/pull/23089/files#diff-12bebea175a260a6ab62c22a3681ccae0c3d9027900d2fdbd8c5e856ae7d1123R404-R422 > >> @eme64 I merged master and hopefully addressed your latest comments. Now that we have #17508 integrated, I could also directly update the unsigned variant, but I'm also fine with doing that separately. WDYT? >> >> I also checked the constant folding part again (or generally whenever the RHS is a constant), these code paths are indeed not used by PhaseGVN directly (but by PhaseCCP and PhaseIdealLoop). That makes it a bit difficult to test that part properly. > > Let's keep the patch as it is. With #17508 we will have to also probably refactor and add more tests, if we want to do any unsigned and known-bit optimizations. > > ---------------- > > @SirYwell Thanks for the updates, I had a few more comments, but we are getting there :) > @eme64 I addressed your latest comments now, please re-review :) > > Regarding my previous observation > > > * If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, meaning we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement? > > should I open a new RFE for that? Or generally, what's your opinion on this? Can you show some examples? Filing an RFE would surely not be wrong.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-3220245882 From bulasevich at openjdk.org Mon Aug 25 14:17:14 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 25 Aug 2025 14:17:14 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v3] In-Reply-To: References: Message-ID: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com> > This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal. > > The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually the trigger is the `-XX:+OptoScheduling` option used by the test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run. > > This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments. > > The problem is that shift count `n` may be too large here: > > class Pipeline_Use_Cycle_Mask { > protected: > uint _mask; > .. > Pipeline_Use_Cycle_Mask& operator<<=(int n) { > _mask <<= n; > return *this; > } > }; > > The recent change attempted to cap the shift amount at one call site: > > class Pipeline_Use_Element { > protected: > .. > // Mask of specific used cycles > Pipeline_Use_Cycle_Mask _mask; > .. > void step(uint cycles) { > _used = 0; > uint max_shift = 8 * sizeof(_mask) - 1; > _mask <<= (cycles < max_shift) ?
cycles : max_shift; > } > } > > However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count: > > // The following two routines assume that the root Pipeline_Use entity > // consists of exactly 1 element for each functional unit > // start is relative to the current cycle; used for latency-based info > uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const { > for (uint i = 0; i < pred._count; i++) { > const Pipeline_Use_Element *predUse = pred.element(i); > if (predUse->_multiple) { > uint min_delay = 7; > // Multiple possible functional units, choose first unused one > for (uint j = predUse->_lb; j <= predUse->_ub; j++) { > const Pipeline_Use_Element *currUse = element(j); > uint curr_delay = delay; > if (predUse->_used & currUse->_used) { > Pipeline_Use_Cycle_Mask x = predUse->_mask; > Pipeline_Use_Cycle_Mask y = currUse->_mask; > > for ( y <<= curr_delay; x.overlaps(y); curr_delay++ ) > y <<= 1; > } > if (min_delay > curr_delay) > min_delay = curr_delay; > } > if (delay < min_delay) > delay = min_delay; > } > else { > for (uint j = predUse->_lb; j <= predUse->_ub; j++) { > const Pipeline_Use_Element *currUse = element(j); > if (predUse->_used & currUse->_used) { > ... 
Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: use uint32_t for _mask ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26890/files - new: https://git.openjdk.org/jdk/pull/26890/files/389a9dab..e3ac8703 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26890&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26890&range=01-02 Stats: 11 lines in 1 file changed: 0 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/26890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26890/head:pull/26890 PR: https://git.openjdk.org/jdk/pull/26890 From bulasevich at openjdk.org Mon Aug 25 14:17:14 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 25 Aug 2025 14:17:14 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v3] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 22:43:46 GMT, Vladimir Kozlov wrote: >> Yes, if `sizeof(uint)` is 32 bits on all our platforms. >> >> Hmm, maybe we should use `uint32_t` for `_mask` here. Then we can use 32 and 31 without confusion. > > I mean to use `_mask <<= (n < 32) ? n : 31;` Good! Let me correct both variants then. The resulting code is: class Pipeline_Use_Cycle_Mask { protected: uint32_t _mask; public: Pipeline_Use_Cycle_Mask() : _mask(0) {} Pipeline_Use_Cycle_Mask(uint32_t mask) : _mask(mask) {} bool overlaps(const Pipeline_Use_Cycle_Mask &in2) const { return ((_mask & in2._mask) != 0); } Pipeline_Use_Cycle_Mask& operator<<=(int n) { _mask <<= (n < 32) ?
n : 31; return *this; } void Or(const Pipeline_Use_Cycle_Mask &in2) { _mask |= in2._mask; } friend Pipeline_Use_Cycle_Mask operator&(const Pipeline_Use_Cycle_Mask &, const Pipeline_Use_Cycle_Mask &); friend Pipeline_Use_Cycle_Mask operator|(const Pipeline_Use_Cycle_Mask &, const Pipeline_Use_Cycle_Mask &); friend class Pipeline_Use; friend class Pipeline_Use_Element; }; // code generated for arm32: class Pipeline_Use_Cycle_Mask { protected: uint32_t _mask1, _mask2, _mask3; public: Pipeline_Use_Cycle_Mask() : _mask1(0), _mask2(0), _mask3(0) {} Pipeline_Use_Cycle_Mask(uint32_t mask1, uint32_t mask2, uint32_t mask3) : _mask1(mask1), _mask2(mask2), _mask3(mask3) {} Pipeline_Use_Cycle_Mask intersect(const Pipeline_Use_Cycle_Mask &in2) { Pipeline_Use_Cycle_Mask out; out._mask1 = _mask1 & in2._mask1; out._mask2 = _mask2 & in2._mask2; out._mask3 = _mask3 & in2._mask3; return out; } bool overlaps(const Pipeline_Use_Cycle_Mask &in2) const { return ((_mask1 & in2._mask1) != 0) || ((_mask2 & in2._mask2) != 0) || ((_mask3 & in2._mask3) != 0); } Pipeline_Use_Cycle_Mask& operator<<=(int n) { if (n >= 32) do { _mask3 = _mask2; _mask2 = _mask1; _mask1 = 0; } while ((n -= 32) >= 32); if (n > 0) { uint m = 32 - n; uint32_t mask = (1 << n) - 1; uint32_t temp2 = mask & (_mask1 >> m); _mask1 <<= n; uint32_t temp3 = mask & (_mask2 >> m); _mask2 <<= n; _mask2 |= temp2; _mask3 <<= n; _mask3 |= temp3; } return *this; } void Or(const Pipeline_Use_Cycle_Mask &); friend Pipeline_Use_Cycle_Mask operator&(const Pipeline_Use_Cycle_Mask &, const Pipeline_Use_Cycle_Mask &); friend Pipeline_Use_Cycle_Mask operator|(const Pipeline_Use_Cycle_Mask &, const Pipeline_Use_Cycle_Mask &); friend class Pipeline_Use; friend class Pipeline_Use_Element; }; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26890#discussion_r2298235985 From roland at openjdk.org Mon Aug 25 14:20:03 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 25 Aug 2025 14:20:03 GMT Subject: RFR: 8361702: C2: 
assert(is_dominator(compute_early_ctrl(limit, limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v4] In-Reply-To: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com> References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com> Message-ID: <6dWR-SxhuKd9-T3q313I6at4vTBcYlufyCBNjGGopv4=.cae3abea-0752-4191-ac08-890476489af3@github.com> > A node in a pre loop only has uses out of the loop dominated by the > loop exit. `PhaseIdealLoop::try_sink_out_of_loop()` sets its control > to the loop exit projection. A range check in the main loop has this > node as input (through a chain of some other nodes). Range check > elimination needs to update the exit condition of the pre loop with an > expression that depends on the node pinned on its exit: that's > impossible and the assert fires. This is a variant of 8314024 (this > one was for a node with uses out of the pre loop on multiple paths). I > propose the same fix: leave the node with control in the pre loop in > this case. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision:

 - Merge branch 'master' into JDK-8361702
 - Update src/hotspot/share/opto/loopopts.cpp
   Co-authored-by: Christian Hagedorn
 - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java
   Co-authored-by: Christian Hagedorn
 - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
   Co-authored-by: Christian Hagedorn
 - Update src/hotspot/share/opto/loopopts.cpp
   Co-authored-by: Christian Hagedorn
 - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
   Co-authored-by: Christian Hagedorn
 - tests
 - fix

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26424/files
  - new: https://git.openjdk.org/jdk/pull/26424/files/1b658c4b..cc64aa6f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26424&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26424&range=02-03

  Stats: 59316 lines in 1541 files changed: 32800 ins; 20025 del; 6491 mod
  Patch: https://git.openjdk.org/jdk/pull/26424.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26424/head:pull/26424

PR: https://git.openjdk.org/jdk/pull/26424

From jkarthikeyan at openjdk.org  Mon Aug 25 14:32:52 2025
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Mon, 25 Aug 2025 14:32:52 GMT
Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v6]
In-Reply-To: 
References: 
Message-ID: <21ZUsLbWEL8wO4hd-Yn7TE47qGFzo3622rXDVoQ0i2Q=.830977fb-3ebb-479a-8f3d-fff57011f4b8@github.com>

On Mon, 25 Aug 2025 12:32:30 GMT, Emanuel Peter wrote:

>> It depends on the template type. I would rather keep it to sanity check that the minimum value of `NativeType` is as we expect. If that does not hold, the optimization below is potentially wrong and has UB.
>
> We will one day want to use smaller integer types here, just for some exhaustive gtesting. But I suppose we can remove it at that point.
Could this be made a `static_assert` instead? That way we can test the condition, but we wouldn't need to pay a runtime cost. The C++ compiler might already do such an optimization, but it'd be good to be more explicit.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2298279235

From duke at openjdk.org  Mon Aug 25 14:32:54 2025
From: duke at openjdk.org (Johannes Graham)
Date: Mon, 25 Aug 2025 14:32:54 GMT
Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v6]
In-Reply-To: 
References: 
Message-ID: 

On Sat, 23 Aug 2025 09:05:10 GMT, Tobias Hotz wrote:

>> This PR improves the Value() of integer division nodes.
>> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case.
>> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once.
>> This also cleans up and unifies the code paths for DivINode and DivLNode.
>> I've added some tests to validate the optimization. Without the changes, some of these tests fail.
>
> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove too strict assert from old code path

test/hotspot/jtreg/compiler/c2/irTests/IntegerDivValueTests.java line 49:

> 47: public int testIntConstantFolding() {
> 48: // All constants available during parsing
> 49: return 50 / 25;

This will be constant-folded by javac, so it won't exercise C2.

test/hotspot/jtreg/compiler/c2/irTests/IntegerDivValueTests.java line 56:

> 54: public int testIntConstantFoldingSpecialCase() {
> 55: // All constants available during parsing
> 56: return Integer.MIN_VALUE / -1;

This will also be folded by javac.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2298271180
PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2298272480

From epeter at openjdk.org  Mon Aug 25 15:00:08 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 25 Aug 2025 15:00:08 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph [v3]
In-Reply-To: 
References: 
Message-ID: <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>

On Thu, 14 Aug 2025 10:43:08 GMT, Marc Chevalier wrote:

>> Some crashes are consequences of earlier misshaped ideal graphs, which could be detected earlier, closer to the source, before the possibly many transformations that lead to the crash.
>>
>> Let's verify that the ideal graph is well-shaped earlier then! I propose here such a feature. This runs after IGVN, because at this point the graph should be cleaned up for any weirdness happening earlier or during IGVN.
>>
>> This feature is enabled with the develop flag `VerifyIdealStructuralInvariants`. Open to renaming. No problem with me! This feature is only available in debug builds, and most of the code is not even compiled in product, since it uses some debug-only functions, such as `Node::dump` or `Node::Name`.
>>
>> For now, only local checks are implemented: they are checks that only look at a node and its neighborhood, wherever it happens in the graph. Typically: under an `If` node, we have an `IfTrue` and an `IfFalse`. To ease development, each check is implemented in its own class, independently of the others. Nevertheless, one always needs to do the same kind of things: checking there is an output of such type, checking there are N inputs, that the k-th input has such type... To ease writing such checks, in a readable way, and in a less error-prone way than a pile of copy-pasted code that manually traverses the graph, I propose a set of compositional helpers to write patterns that can be matched against the ideal graph. Since these patterns are... patterns, so not related to a specific graph, they can be allocated once and forever. When used, one provides the node (called center) around which one wants to check if the pattern holds.
>>
>> On top of making the description of patterns easier, these helpers allow nice printing in case of error, by showing the path from the center to the violating node. For instance (made up for the purpose of showing the formatting), a violation with a path climbing only inputs:
>>
>> 1 failure for node
>> 211 OuterStripMinedLoopEnd === 215 39 [[ 212 198 ]] P=0,948966, C=23799,000000
>> At node
>> 209 CountedLoopEnd === 182 208 [[ 210 197 ]] [lt] P=0,948966, C=23799,000000 !orig=[196] !jvms: StringLatin1::equals @ bci:12 (line 100)
>> From path:
>> [center] 211 OuterStripMinedLoopEnd === 215 39 [[ 212 198 ]] P=0,948966, C=23799,000000
>> <-(0)- 215 SafePoint === 210 1 7 1 1 216 37 54 185 [[ 211 ]] SafePoint !orig=186 !jvms: StringLatin1::equals @ bci:29 (line 100)
>> <-(0)- 210 IfFalse === 209 [[ 21...

>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>
>   Benoît's comments

Wow, very nice work @marc-chevalier! Cool that you tried a pattern-matching approach.
I really do wonder if we could use that more widely?

src/hotspot/share/opto/graphInvariants.cpp line 32:

> 30:
> 31: void LocalGraphInvariant::LazyReachableCFGNodes::fill() {
> 32: precond(live_nodes.size() == 0);

Maybe I missed something here: where do the `precond` and `postcond` come from?

src/hotspot/share/opto/graphInvariants.cpp line 57:

> 55: }
> 56:
> 57: void print_path(const Node_List& steps, const GrowableArray& path, stringStream& ss) {

Totally optional: you could make this a private static method of `GraphInvariantChecker`. Just to group it to where it belongs.

src/hotspot/share/opto/graphInvariants.cpp line 89:

> 87: }
> 88:
> 89: struct Pattern : ResourceObj {

Some comments at the classes could be nice. Especially with `Bind` I did not have any idea what it could be used for.

src/hotspot/share/opto/graphInvariants.cpp line 105:

> 103: return true;
> 104: }
> 105: const Node*& _binding;

Does this need to be public?

src/hotspot/share/opto/graphInvariants.cpp line 133:

> 131: return true;
> 132: }
> 133: GrowableArray _checks;

Does this need to be public?

src/hotspot/share/opto/graphInvariants.cpp line 176:

> 174: }
> 175: const uint _expect_req;
> 176: };

Looks like code duplication. Could you make it more general with a callback? You could still create alias structs that already have the correct callback/lambdas.

src/hotspot/share/opto/graphInvariants.cpp line 207:

> 205: }
> 206: bool (Node::*_type_check)() const;
> 207: };

You could probably generalize this with a callback approach. And then one concrete implementation is the one that does the type check. Just an idea.

src/hotspot/share/opto/graphInvariants.cpp line 270:

> 268: new HasNOutputs(2),
> 269: new AtSingleOutputOfType(&Node::is_IfTrue, new True()),
> 270: new AtSingleOutputOfType(&Node::is_IfFalse, new True()))) {

I would suggest that you append the word `Pattern` to all `Patterns` - at least in most cases this will make it a bit easier to see what you have at the use-site.
I'm looking at `new True()` and wonder what might be passed here... if it was called `TruePattern`, it would be immediately clear.

src/hotspot/share/opto/graphInvariants.cpp line 279:

> 277: return CheckResult::NOT_APPLICABLE;
> 278: }
> 279: CheckResult r = PatternBasedCheck::check(center, reachable_cfg_nodes, steps, path, ss);

Could this not be solved with an `OrPattern`? Or::make( ) Not sure that's worth it...

src/hotspot/share/opto/graphInvariants.cpp line 287:

> 285: }
> 286: }
> 287: return r;

Also this could probably be handled with a pattern wrapping mechanism, right? `FailOnlyForLiveNodes( )`

src/hotspot/share/opto/graphInvariants.cpp line 301:

> 299: And::make(
> 300: new NodeClass(&Node::is_Region),
> 301: new Bind(region_node))))) {

This sort of binding is kinda cool! Never thought of it before. Could be really cool for general pattern matching. We would have to find a solution if there would be multiple bindings though ... I think that's not possible with your patterns, right? Is that a fundamental constraint?

src/hotspot/share/opto/graphInvariants.cpp line 309:

> 307: if (!center->is_Phi()) {
> 308: return CheckResult::NOT_APPLICABLE;
> 309: }

Could do this via `Or`?

src/hotspot/share/opto/graphInvariants.cpp line 319:

> 317: return CheckResult::FAILED;
> 318: }
> 319: return CheckResult::VALID;

Another funky idea: could probably be handled with some callback, some "terminal" check you do on the bound variable. Not sure if worth it.

src/hotspot/share/opto/graphInvariants.cpp line 323:

> 321: };
> 322:
> 323: struct ControlSuccessor : LocalGraphInvariant {

A quick comment above would prevent me from having to reverse-engineer the code below ;)

src/hotspot/share/opto/graphInvariants.cpp line 332:

> 330: }
> 331:
> 332: Node_List ctrl_succ;

Do we need a `ResourceMark` for this?

src/hotspot/share/opto/graphInvariants.cpp line 338:

> 336: if (out->is_CFG()) {
> 337: cfg_out++;
> 338: ctrl_succ.push(out);

Seems you do these in a pair.
So why do you need `cfg_out` at all? Can you not take the length/size of `ctrl_succ`? After all, it counts duplicates too (hope that is intended).

src/hotspot/share/opto/graphInvariants.cpp line 399:

> 397: }
> 398: CheckResult check(const Node* center, LazyReachableCFGNodes& reachable_cfg_nodes, Node_List& steps, GrowableArray& path, stringStream& ss) const override {
> 399: if (!center->is_Region() && !center->is_Start() && !center->is_Root()) {

If you allow non-Regions to play here, then I'd just call it `SelfLoopPattern`.

src/hotspot/share/opto/graphInvariants.cpp line 413:

> 411: ss.print_cr("%s nodes' 0-th input must be itself or nullptr (for a copy Region).", center->Name());
> 412: return CheckResult::FAILED;
> 413: }

Absolutely subjective: checking `self != center` is more about `self`, checking `center != self` is more about `center`. So I would use `self != center` :rofl:

Suggestion:

if (self != center || (center->is_Region() && self == nullptr)) {
  ss.print_cr("%s nodes' 0-th input must be itself or nullptr (for a copy Region).", center->Name());
  return CheckResult::FAILED;
}

src/hotspot/share/opto/graphInvariants.cpp line 417:

> 415: if (self == nullptr) {
> 416: // Must be a copy Region
> 417: Node_List non_null_inputs;

ResourceMark?

src/hotspot/share/opto/graphInvariants.cpp line 447:

> 445: And::make(
> 446: new NodeClass(&Node::is_IfTrue),
> 447: new HasAtLeastNInputs(1),

Can an `IfTrue` have more than 1 input?

src/hotspot/share/opto/graphInvariants.cpp line 452:

> 450: And::make(
> 451: new NodeClass(&Node::is_BaseCountedLoopEnd),
> 452: new Bind(counted_loop))))))) {}

Ah, another check and Bind! Why not allow `Bind`, so we can bind it with the cast?

src/hotspot/share/opto/graphInvariants.cpp line 468:

> 466: }
> 467: assert(counted_loop != nullptr, "sanity");
> 468: if (is_long) {

Why did you cache the value? Seems `is_long` is only used once ... and `center` should not change pointers around.
src/hotspot/share/opto/graphInvariants.cpp line 469: > 467: assert(counted_loop != nullptr, "sanity"); > 468: if (is_long) { > 469: if (counted_loop->is_CountedLoopEnd()) { Sounds like head/tail confusion here. Call it `counted_loop_end`. src/hotspot/share/opto/graphInvariants.cpp line 562: > 560: } > 561: > 562: VectorSet enqueued; I would move the ResourceMark to the beginning of the allocations, and do the fast bail-out first. src/hotspot/share/opto/graphInvariants.cpp line 575: > 573: // For CFG-related errors, we will compute the set of reachable CFG nodes and decide whether to keep > 574: // the issue if the problematic node is reachable. This set of reachable node is thus computed lazily > 575: // (and it seems not to happen often in practice), and shared across checks. Suggestion: // Sometimes, we get weird structures in dead code that will be cleaned up later. It typically happens // when data dies, but control is not cleaned up right away, possibly kept alive by an unreachable loop. // Since we don't want to eagerly traverse the whole graph to remove dead code in IGVN, we can accept // weird structures in dead code. // For CFG-related errors, we will compute the set of reachable CFG nodes and decide whether to keep // the issue if the problematic node is reachable. This set of reachable nodes is thus computed lazily // (and it seems not to happen often in practice), and shared across checks. src/hotspot/share/opto/graphInvariants.cpp line 585: > 583: if (in != nullptr && !enqueued.test_set(in->_idx)) { > 584: worklist.push(in); > 585: } Why not make a `Unique_Node_List`? It would already have a `VectorSet` included, and you could just push without checking if we already pushed the node. Very nice for BFS traversals. You would then not even pop nodes, but just traverse over the worklist, as it grows. 
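As an aside for readers unfamiliar with the growing-worklist idiom suggested above (push with built-in de-duplication, iterate the list by index as it grows, never pop): it is easy to sketch outside HotSpot. The following is a minimal Java analogue; `UniqueList` is a hypothetical stand-in for C2's `Unique_Node_List`, not the real class, and integer node ids stand in for `Node*`:

```java
import java.util.*;

public class WorklistBfs {
    // Minimal stand-in for HotSpot's Unique_Node_List: an append-only list
    // that silently ignores elements it has already seen.
    static final class UniqueList {
        final List<Integer> items = new ArrayList<>();
        final Set<Integer> seen = new HashSet<>();
        void push(int n) { if (seen.add(n)) items.add(n); }
    }

    // BFS over an adjacency map: iterate the worklist by index while it grows,
    // so no element is ever popped and no explicit "already enqueued" check is
    // needed at the push sites.
    public static List<Integer> reachable(Map<Integer, List<Integer>> graph, int root) {
        UniqueList worklist = new UniqueList();
        worklist.push(root);
        for (int i = 0; i < worklist.items.size(); i++) { // list grows as we traverse
            for (int succ : graph.getOrDefault(worklist.items.get(i), List.of())) {
                worklist.push(succ); // duplicates are filtered internally
            }
        }
        return worklist.items;
    }
}
```

The `VectorSet` check at the push site disappears because the set lives inside the worklist, which is exactly the simplification suggested in the comment above.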
src/hotspot/share/opto/graphInvariants.cpp line 611: > 609: center->dump(); > 610: tty->print_cr("%s", ss.base()); > 611: ss.reset(); Do you really want to use the `ttyLocker` here? I thought we were trying to get away from it because it sometimes leads to lock-priority issues / dead-locks. Why not just use yet another `stringStream ss3`, and do it all via that one? src/hotspot/share/opto/graphInvariants.hpp line 35: > 33: class LocalGraphInvariant : public ResourceObj { > 34: public: > 35: static constexpr int OutputStep = -1; Can you please add a quick comment what this is for? After all, it is a public static constant ;) src/hotspot/share/opto/graphInvariants.hpp line 37: > 35: static constexpr int OutputStep = -1; > 36: > 37: struct LazyReachableCFGNodes { You could add a comment here. What I was surprised by: that you do a whole graph traversal the first time we call `is_node_dead`. I thought you would just visit a subgraph every time, and fill out the `live_nodes` gradually. You could also give an explanation why it needs to be lazy. Is it possible that we never call `is_node_dead`? src/hotspot/share/opto/graphInvariants.hpp line 41: > 39: private: > 40: void fill(); > 41: Unique_Node_List live_nodes; I think the hotspot convention is to have fields with an underscore `_live_nodes`. Especially if they are private. src/hotspot/share/opto/graphInvariants.hpp line 56: > 54: * > 55: * If the check fails steps and path must be filled with the path from the center to the failing node (where it's relevant to show). > 56: * Given a list of node Suggestion: * Given a list of nodes src/hotspot/share/opto/graphInvariants.hpp line 64: > 62: * - a non-negative integer p for each step such that N{i-1} has Ni as p-th input (we need to follow an input edge) > 63: * - the OUTPUT_STEP value in case N{i-1} has Ni as an output (we need to follow an output edge) > 64: * The list are reversed to allow to easily fill them lazily on failure. 
Suggestion: * The lists are reversed to allow to easily fill them lazily on failure. src/hotspot/share/opto/graphInvariants.hpp line 70: > 68: * > 69: * The parameter [live_nodes] is used to share the lazily computed set of CFG nodes reachable from root. This is because some > 70: * checks don't apply to dead code, suppress their error if a violation is detected in dead code. Does that mean we only cache the result if it is reachable, but not if it is not reachable? Does that mean we may check reachability for non-reachable nodes many many times? src/hotspot/share/opto/phaseX.hpp line 615: > 613: Node* _verify_window[_verify_window_size]; > 614: void verify_step(Node* n); > 615: GraphInvariantChecker* _invariant_checker; Why do you allocate it separately, and not have it in-place? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26362#pullrequestreview-3151474721 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298159885 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298174539 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298187255 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298212967 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298191417 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298197676 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298203251 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298212389 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298236694 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298239978 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298244614 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298253028 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298257576 PR Review Comment: 
https://git.openjdk.org/jdk/pull/26362#discussion_r2298264410 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298262968 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298271376 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298277114 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298286692 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298289038 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298302608 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298298634 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298309408 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298306579 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298319813 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298324046 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298329951 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298336112 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298115554 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298169221 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298117370 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298122016 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298123512 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298129493 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298149486 From epeter at openjdk.org Mon Aug 25 15:00:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Aug 2025 15:00:09 GMT Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph In-Reply-To: References: Message-ID: 
<4Hn-yZL83w43zpV5WQ-o8XOvqqP5bmpD9RjVJHZMZWw=.3232d367-6d3e-4587-be8f-99465f4dba95@github.com>

On Tue, 29 Jul 2025 16:59:20 GMT, Vladimir Kozlov wrote:

> I am fine with the `VerifyIdealGraph` flag. The main concern is we have tons of `Verify*` flags but I don't think we use them in CI testing. So we are forgetting about them, they will break and a few years later we are removing them like we did with `VerifyOpto`.

Yes. What you need is at least a "Hello World" test, where the flag is enabled. And then we should try to add it to stress and fuzzer tests, so file an RFE for that!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26362#issuecomment-3220631845

From epeter at openjdk.org  Mon Aug 25 15:00:09 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 25 Aug 2025 15:00:09 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph [v3]
In-Reply-To: <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
References: <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
Message-ID: 

On Mon, 25 Aug 2025 14:07:02 GMT, Emanuel Peter wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Benoît's comments
>
> src/hotspot/share/opto/graphInvariants.cpp line 270:
>
>> 268: new HasNOutputs(2),
>> 269: new AtSingleOutputOfType(&Node::is_IfTrue, new True()),
>> 270: new AtSingleOutputOfType(&Node::is_IfFalse, new True()))) {
>
> I would suggest that you append the word `Pattern` to all `Patterns` - at least in most cases this will make it a bit easier to see what you have at the use-site. I'm looking at `new True()` and wonder what might be passed here... if it was called `TruePattern`, it would be immediately clear.

You could leave a comment at `True(Pattern)` that it is (often) used as the terminal pattern, at the end of a branch / search.
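For readers following this combinator discussion (`True` as the terminal pattern, `And::make`, `NodeClass`, `Bind`), a toy sketch may help show how binding composes with matching. This is a hypothetical Java analogue with made-up names over an arbitrary node type, not the proposed C2 API:

```java
import java.util.function.Predicate;

public class PatternSketch {
    // A pattern either matches a node or does not; combinators build bigger
    // patterns out of smaller ones (names are illustrative only).
    public interface Pattern<N> { boolean match(N node); }

    // Terminal pattern: always succeeds; ends a branch of the search.
    public static <N> Pattern<N> truePattern() { return n -> true; }

    // Conjunction of sub-patterns, in the spirit of And::make(...).
    @SafeVarargs
    public static <N> Pattern<N> and(Pattern<N>... subs) {
        return n -> {
            for (Pattern<N> sub : subs) {
                if (!sub.match(n)) return false;
            }
            return true;
        };
    }

    // Predicate check, in the spirit of NodeClass(&Node::is_Region).
    public static <N> Pattern<N> nodeClass(Predicate<N> isClass) { return isClass::test; }

    // Side-effecting binding, in the spirit of Bind(region_node): it always
    // matches and remembers the node it was applied to.
    public static <N> Pattern<N> bind(N[] slot) {
        return n -> { slot[0] = n; return true; };
    }
}
```

Because `and` evaluates sub-patterns left to right, placing `bind` after a class check only captures nodes that passed the check, which is the "check and Bind" idiom discussed in the review.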
> src/hotspot/share/opto/graphInvariants.cpp line 287:
>
>> 285: }
>> 286: }
>> 287: return r;
>
> Also this could probably be handled with a pattern wrapping mechanism, right?
> `FailOnlyForLiveNodes( )`

I'm just suggesting it in case you need to do this sort of special-casing elsewhere too ;)

> src/hotspot/share/opto/graphInvariants.cpp line 301:
>
>> 299: And::make(
>> 300: new NodeClass(&Node::is_Region),
>> 301: new Bind(region_node))))) {
>
> This sort of binding is kinda cool! Never thought of it before. Could be really cool for general pattern matching.
> We would have to find a solution if there would be multiple bindings though ... I think that's not possible with your patterns, right? Is that a fundamental constraint?

What would be extra cool / funky: If we could somehow already cast the `Bind` variable to `Region`. Could be tricky. Doing this `is_Region and bind` could be a very common idiom, so very useful.

> src/hotspot/share/opto/graphInvariants.cpp line 417:
>
>> 415: if (self == nullptr) {
>> 416: // Must be a copy Region
>> 417: Node_List non_null_inputs;
>
> ResourceMark?

Is it worth it to do the allocation, if in most cases we just expect 1 non-null? Why not count non-nulls, and if we find more than one, traverse again over the Region, and filter and dump them?

> src/hotspot/share/opto/graphInvariants.cpp line 452:
>
>> 450: And::make(
>> 451: new NodeClass(&Node::is_BaseCountedLoopEnd),
>> 452: new Bind(counted_loop))))))) {}
>
> Ah, another check and Bind! Why not allow `Bind`, so we can bind it with the cast?

And I would call it `counted_loop_end`.

> src/hotspot/share/opto/graphInvariants.cpp line 469:
>
>> 467: assert(counted_loop != nullptr, "sanity");
>> 468: if (is_long) {
>> 469: if (counted_loop->is_CountedLoopEnd()) {
>
> Sounds like head/tail confusion here. Call it `counted_loop_end`.

Also: I would invert the check to `!counted_loop_end->is_LongCountedLoopEnd()`. Because you expect it to be a long end here. Subjective.
> src/hotspot/share/opto/phaseX.hpp line 615: > >> 613: Node* _verify_window[_verify_window_size]; >> 614: void verify_step(Node* n); >> 615: GraphInvariantChecker* _invariant_checker; > > Why do you allocate it separately, and not have it in-place? Is there only a single PhaseIterGVN per compilation? I forgot. An alternative would be to allocate it at the level of the compilation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298220598 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298241305 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298251176 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298295335 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298305594 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298312279 PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298153277 From epeter at openjdk.org Mon Aug 25 15:05:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Aug 2025 15:05:54 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 15:21:48 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Update comment for constraint casts Testing passed :green_circle: ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26827#pullrequestreview-3151900408

From epeter at openjdk.org  Mon Aug 25 15:11:54 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 25 Aug 2025 15:11:54 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v4]
In-Reply-To: 
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
Message-ID: 

On Mon, 25 Aug 2025 07:10:26 GMT, Galder Zamarreño wrote:

>> Galder Zamarreño has updated the pull request incrementally with three additional commits since the last revision:
>>
>>  - Add more IR node positive assertions
>>  - Fix source of data for benchmarks
>>  - Refactor benchmarks to TypeVectorOperations
>
> Merged and pushed latest master changes, all looks good still

@galderz Looks great! I'm going to run some sanity-testing on our internal infrastructure...

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3220673724

From epeter at openjdk.org  Mon Aug 25 15:16:57 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 25 Aug 2025 15:16:57 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit, limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v4]
In-Reply-To: <6dWR-SxhuKd9-T3q313I6at4vTBcYlufyCBNjGGopv4=.cae3abea-0752-4191-ac08-890476489af3@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
 <6dWR-SxhuKd9-T3q313I6at4vTBcYlufyCBNjGGopv4=.cae3abea-0752-4191-ac08-890476489af3@github.com>
Message-ID: 

On Mon, 25 Aug 2025 14:20:03 GMT, Roland Westrelin wrote:

>> A node in a pre loop only has uses out of the loop dominated by the
>> loop exit. `PhaseIdealLoop::try_sink_out_of_loop()` sets its control
>> to the loop exit projection. A range check in the main loop has this
>> node as input (through a chain of some other nodes). Range check
Range check >> elimination needs to update the exit condition of the pre loop with an >> expression that depends on the node pinned on its exit: that's >> impossible and the assert fires. This is a variant of 8314024 (this >> one was for a node with uses out of the pre loop on multiple paths). I >> propose the same fix: leave the node with control in the pre loop in >> this case. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into JDK-8361702 > - Update src/hotspot/share/opto/loopopts.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopopts.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java > > Co-authored-by: Christian Hagedorn > - tests > - fix test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java line 28: > 26: * @bug 8361702 > 27: * @summary C2: assert(is_dominator(compute_early_ctrl(limit, limit_ctrl), pre_end)) failed: node pinned on loop exit test? > 28: * @requires vm.flavor == "server" Would this test fail without this requires? Or could we remove it, in the hopes of catching something else somewhere else? test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java line 30: > 28: * @requires vm.flavor == "server" > 29: * > 30: * @run main/othervm -XX:-BackgroundCompilation -XX:LoopUnrollLimit=100 -XX:-UseLoopPredicate -XX:-UseProfiledLoopPredicate TestSunkRangeFromPreLoopRCE3 Could we have a run without any flags / fewer flags? 
Just in case it catches something else / related.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26424#discussion_r2298393419
PR Review Comment: https://git.openjdk.org/jdk/pull/26424#discussion_r2298394619

From kvn at openjdk.org  Mon Aug 25 15:19:54 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 25 Aug 2025 15:19:54 GMT
Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v3]
In-Reply-To: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com>
References: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com>
Message-ID: 

On Mon, 25 Aug 2025 14:17:14 GMT, Boris Ulasevich wrote:

>> This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal.
>>
>> The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually the trigger is the `-XX:+OptoScheduling` option used by the test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run.
>>
>> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments.
>>
>> The problem is that shift count `n` may be too large here:
>>
>> class Pipeline_Use_Cycle_Mask {
>>  protected:
>>   uint _mask;
>>   ..
>>   Pipeline_Use_Cycle_Mask& operator<<=(int n) {
>>     _mask <<= n;
>>     return *this;
>>   }
>> };
>>
>> The recent change attempted to cap the shift amount at one call site:
>>
>> class Pipeline_Use_Element {
>>  protected:
>>   ..
>>   // Mask of specific used cycles
>>   Pipeline_Use_Cycle_Mask _mask;
>>   ..
>>   void step(uint cycles) {
>>     _used = 0;
>>     uint max_shift = 8 * sizeof(_mask) - 1;
>>     _mask <<= (cycles < max_shift) ? cycles : max_shift;
>>   }
>> }
>>
>> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count:
>>
>> // The following two routines assume that the root Pipeline_Use entity
>> // consists of exactly 1 element for each functional unit
>> // start is relative to the current cycle; used for latency-based info
>> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const {
>>   for (uint i = 0; i < pred._count; i++) {
>>     const Pipeline_Use_Element *predUse = pred.element(i);
>>     if (predUse->_multiple) {
>>       uint min_delay = 7;
>>       // Multiple possible functional units, choose first unused one
>>       for (uint j = predUse->_lb; j <= predUse->_ub; j++) {
>>         const Pipeline_Use_Element *currUse = element(j);
>>         uint curr_delay = delay;
>>         if (predUse->_used & currUse->_used) {
>>           Pipeline_Use_Cycle_Mask x = predUse->_mask;
>>           Pipeline_Use_Cycle_Mask y = currUse->_mask;
>>
>>           for ( y <<= curr_delay; x.overlaps(y); curr_delay++ )
>>             y <<= 1;
>>         }
>>         if (min_delay > curr_delay)
>>           min_delay = curr_delay;
>>       }
>>       if (delay < min_delay)
>>         delay = min_delay;
>>     }
>>     else {
>>       for (uint j = predUse->_lb; j <= pre...
>
> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   use uint32_t for _mask

Looks good. I will submit testing.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26890#pullrequestreview-3151949738

From epeter at openjdk.org  Mon Aug 25 15:21:58 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 25 Aug 2025 15:21:58 GMT
Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v8]
In-Reply-To: 
References: 
Message-ID: 

On Fri, 22 Aug 2025 18:09:14 GMT, Manuel Hässig wrote:

>> This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of the combination of all flag combinations.
To illustrate its use, I also converted one test to use the new cross product functionality. >> >> Testing: >> - [x] Github Actions >> - [x] tier1,tier2 plus some internal testing on Oracle supported platforms > > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - Merge branch 'master' into JDK-8365262 > - Remove excess newline > - Fix indentation > - Improve comments > - Fix copy pasta mistakes > - Improvements prompted by Emanuel > - Fix test > - Better counting in tests > - post processing of flags and documentation > - Make the test work > - ... and 5 more: https://git.openjdk.org/jdk/compare/d9c76256...771924f0 Looks much better already with a few extra comments :) test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 380: > 378: .reduce( > 379: Stream.of(Collections.emptyList()), // Initialize Stream> acc with a Stream containing an empty list of Strings. > 380: (acc, set) -> // (Stream>, Stream>) -> Stream> You could probably put the types in the argument capture, no? Then it would become actual code. test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 397: > 395: .flatMap(Collection::stream) // Flatten the Stream> into Stream>. > 396: .filter(s -> !s.isEmpty()) // Remove empty string flags. > 397: .distinct() Is this necessary? Could this reorder things? Sometimes order is relevant.
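The fold that the `reduce` above performs can be sketched outside the Streams API. A minimal C++ sketch of the same cross-product accumulation (hypothetical flag strings; not the IR framework's code):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Fold each flag set into the accumulated list of combinations,
// yielding the full cross product in a deterministic order.
std::vector<std::vector<std::string>> cross_product(
    const std::vector<std::vector<std::string>>& flag_sets) {
  std::vector<std::vector<std::string>> acc{{}};  // start with one empty combo
  for (const auto& set : flag_sets) {
    std::vector<std::vector<std::string>> next;
    for (const auto& combo : acc) {
      for (const auto& flag : set) {
        auto extended = combo;        // copy the accumulated combination
        extended.push_back(flag);     // append one flag from this set
        next.push_back(extended);
      }
    }
    acc = std::move(next);
  }
  return acc;  // result size is the product of the input set sizes
}
```

Each input set multiplies the number of accumulated combinations, which is also why a late `distinct()` can matter: it may drop or reorder combinations relative to this deterministic fold.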
------------- PR Comment: https://git.openjdk.org/jdk/pull/26762#issuecomment-3220710336 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2298401628 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2298408374 From jkarthikeyan at openjdk.org Mon Aug 25 15:27:52 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 25 Aug 2025 15:27:52 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 06:20:43 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Update comment for constraint casts > > @jaskarth Thanks for the fix, it looks good to me now :) > I'm just running some internal testing now, please ping me after the weekend :) Thanks a lot for the testing @eme64! I think I need another review to merge it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26827#issuecomment-3220726571 From rehn at openjdk.org Mon Aug 25 16:57:42 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 25 Aug 2025 16:57:42 GMT Subject: RFR: 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way [v2] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 12:35:12 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> This is a follow-up of https://github.com/openjdk/jdk/pull/26838, fixes the vector version in a similar way. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > comments & readability Thanks! Yes, it's unclear to me why it was %39 and i/39 and now %3 and /39. ------------- Marked as reviewed by rehn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26883#pullrequestreview-3152305645 From manc at openjdk.org Mon Aug 25 19:43:53 2025 From: manc at openjdk.org (Man Cao) Date: Mon, 25 Aug 2025 19:43:53 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation Message-ID: Hi, Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause. I will also try adding a test case for DontCompileHugeMethod under -XX:-TieredCompilation. -Man ------------- Commit messages: - 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation Changes: https://git.openjdk.org/jdk/pull/26932/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26932&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8366118 Stats: 21 lines in 1 file changed: 1 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/26932.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26932/head:pull/26932 PR: https://git.openjdk.org/jdk/pull/26932 From jiangli at openjdk.org Mon Aug 25 20:41:38 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Mon, 25 Aug 2025 20:41:38 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation In-Reply-To: References: Message-ID: <-c191lNto9KD8wARJvnyEYjwgQDIRGTUSxMTHdBS980=.eee47999-e36c-4365-9302-4ac96da41c22@github.com> On Mon, 25 Aug 2025 19:38:23 GMT, Man Cao wrote: > Hi, > > Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. 
Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause. > > I will also try adding a test case for DontCompileHugeMethod under -XX:-TieredCompilation. > > -Man I looked at the internal change on JDK 21 and suggested to send the change for review in OpenJDK to get feedback from compiler-dev, as I have a couple of questions for the change. I'll post my questions as review comments to the PR as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26932#issuecomment-3221672961 From jiangli at openjdk.org Mon Aug 25 20:57:34 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Mon, 25 Aug 2025 20:57:34 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation In-Reply-To: References: Message-ID: On Mon, 25 Aug 2025 19:38:23 GMT, Man Cao wrote: > Hi, > > Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause. > > I will also try adding a test case for DontCompileHugeMethod under -XX:-TieredCompilation. > > -Man src/hotspot/share/compiler/compilationPolicy.cpp line 925: > 923: } > 924: > 925: if (!CompilationModeFlag::disable_intermediate()) { AFAICT, the block of code here is intended for handling the case when intermediate is not disabled. Your change subtly alters that. When `TieredCompilation` is disabled, the large method compilation is done via `CompileBroker::compile_method` if `!CompileBroker::compilation_is_in_queue(mh)` is `true`. I confirmed that in lldb, see below. Is there any reason to not do `can_be_compiled` check when calling `CompileBroker::compile_method`? 
Additionally, should `can_be_compiled` check only be done for c2 compilation or if it should also be applied to c1 compilation? (lldb) bt * thread #18, name = 'ApexEnumTest_de', stop reason = step in * frame #0: 0x00007ffff49b70ae libjvm.so`CompileBroker::compile_method(method=0x00007ffff38dba10, osr_bci=-1, comp_level=4, hot_method=0x00007ffff38dba10, hot_count=6784, compile_reason=Reason_Tiered, __the_thread__=0x00001354ff8c9810) at compileBroker.cpp:1347:21 frame #1: 0x00007ffff49981fb libjvm.so`CompilationPolicy::compile(mh=0x00007ffff38dba10, bci=-1, level=CompLevel_full_optimization, __the_thread__=0x00001354ff8c9810) at compilationPolicy.cpp:824:5 frame #2: 0x00007ffff4997baf libjvm.so`CompilationPolicy::method_invocation_event(mh=0x00007ffff38dba10, imh=0x00007ffff38dba10, level=CompLevel_none, nm=0x0000000000000000, __the_thread__=0x00001354ff8c9810) at compilationPolicy.cpp:1160:7 frame #3: 0x00007ffff49979ea libjvm.so`CompilationPolicy::event(method=0x00007ffff38dba10, inlinee=0x00007ffff38dba10, branch_bci=-1, bci=-1, comp_level=CompLevel_none, nm=0x0000000000000000, __the_thread__=0x00001354ff8c9810) at compilationPolicy.cpp:745:5 frame #4: 0x00007ffff4d79dd8 libjvm.so`InterpreterRuntime::frequency_counter_overflow_inner(current=0x00001354ff8c9810, branch_bcp=0x0000000000000000) at interpreterRuntime.cpp:1066:21 frame #5: 0x00007ffff4d79a76 libjvm.so`InterpreterRuntime::frequency_counter_overflow(current=0x00001354ff8c9810, branch_bcp=0x0000000000000000) at interpreterRuntime.cpp:1015:17 frame #6: 0x00007fffe1c0ce41 frame #7: 0x00007fffe1c080a8 frame #8: 0x00007fffe1c00d01 frame #9: 0x00007ffff4d8786d libjvm.so`JavaCalls::call_helper(result=0x00007ffff38dc040, method=0x00007ffff38dbf90, args=0x00007ffff38dbec8, __the_thread__=0x00001354ff8c9810) at javaCalls.cpp:415:7 frame #10: 0x00007ffff522bd74 libjvm.so`os::os_exception_wrapper(f=(libjvm.so`JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*) at 
javaCalls.cpp:332), value=0x00007ffff38dc040, method=0x00007ffff38dbf90, args=0x00007ffff38dbec8, thread=0x00001354ff8c9810) at os_linux.cpp:5190:3 frame #11: 0x00007ffff4d8685a libjvm.so`JavaCalls::call(result=0x00007ffff38dc040, method=0x00007ffff38dbf90, args=0x00007ffff38dbec8, __the_thread__=0x00001354ff8c9810) at javaCalls.cpp:329:3 frame #12: 0x00007ffff4e4a434 libjvm.so`jni_invoke_static(env=0x00001354ff8c9c10, result=0x00007ffff38dc040, receiver=0x0000000000000000, call_type=JNI_STATIC, method_id=0x00001354ffa77218, args=0x00007ffff38dc000, __the_thread__=0x00001354ff8c9810) at jni.cpp:888:3 frame #13: 0x00007ffff4e4d815 libjvm.so`jni_CallStaticVoidMethodV(env=0x00001354ff8c9c10, cls=0x00001354ffdbd412, methodID=0x00001354ffa77218, args=0x00007ffff38dc180) at jni.cpp:1728:3 frame #14: 0x00005555565a0bd0 ApexEnumTest_deploy.jar`JNIEnv_::CallStaticVoidMethod(_jclass*, _jmethodID*, ...) + 154 frame #15: 0x00005555565a0064 ApexEnumTest_deploy.jar`devtools_java_launcher::internal::LauncherMainImpl::JavaMain(JNIEnv_*, std::__u::pair, std::__u::allocator>, devtools_java_launcher::internal::ArgumentEncoding> const&, int, char**, std::__u::basic_string, std::__u::allocator> const&) + 314 frame #16: 0x00005555565a7e01 ApexEnumTest_deploy.jar`jvalue std::__u::__function::__policy_func::__call_func, std::__u::allocator>, devtools_java_launcher::internal::ArgumentEncoding>, int, char**, std::__u::basic_string, std::__u::allocator>>>(std::__u::__function::__policy_storage const*, JNIEnv_*) + 39 frame #17: 0x00005555565b32af ApexEnumTest_deploy.jar`devtools_java_launcher::internal::UserRequest::operator()(thread::Future*) + 413 frame #18: 0x00005555565b8e84 ApexEnumTest_deploy.jar`thread::Future::Impl::RunProducerAndUnref() + 102 frame #19: 0x00005555565ba513 ApexEnumTest_deploy.jar`util::functional::internal::FunctorCallback, void ()>::Run() + 89 frame #20: 0x00005555565b4122 ApexEnumTest_deploy.jar`devtools_java_launcher::internal::LauncherImpl::VmThreadRoutine(void*) 
+ 98 frame #21: 0x00007ffff7e407db libpthread.so.0`start_thread + 187 frame #22: 0x00007ffff7db305f libc.so.6`__clone + 63 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26932#discussion_r2299117559 From manc at openjdk.org Mon Aug 25 22:39:34 2025 From: manc at openjdk.org (Man Cao) Date: Mon, 25 Aug 2025 22:39:34 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation In-Reply-To: References: Message-ID: On Mon, 25 Aug 2025 20:54:27 GMT, Jiangli Zhou wrote: > AFAICT, the block of code here is intended for handling the case when intermediate is not disabled. Your change subtly alters that. > When TieredCompilation is disabled, the large method compilation is done via CompileBroker::compile_method if !CompileBroker::compilation_is_in_queue(mh) is true. I confirmed that in lldb, see below. Is there any reason to not do can_be_compiled check when calling CompileBroker::compile_method? Trying to compile the large method under `-XX:-TieredCompilation` is the bug. The large method should not be compiled under `-XX:+DontCompileHugeMethods`. The bug is caused by erroneously guarding the `!can_be_compiled()` and `!can_be_osr_compiled()` checks behind `!CompilationModeFlag::disable_intermediate()`. The correct behavior is to do the following checks and returns regardless of `TieredCompilation`: if ((bci == InvocationEntryBci && !can_be_compiled(mh, level))) { return; } if ((bci != InvocationEntryBci && !can_be_osr_compiled(mh, level))) { return; } Only the recursive call to `compile(mh, bci, CompLevel_simple, THREAD)` and `osr_nm->make_not_entrant()` need to be guarded under `!disable_intermediate()`. It is possible to add the above two checks for `bci`, `can_be_compiled()` and `!can_be_osr_compiled()` to inside `CompileBroker::compile_method()`, specifically inside `CompileBroker::compilation_is_prohibited()`. If compiler-dev team prefers this way, we could move them. 
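The intended control flow can be distilled into a small sketch (names invented for illustration; this is not the HotSpot code): the bailout checks run unconditionally, while only the C1-fallback path stays behind `disable_intermediate()`.

```cpp
#include <cassert>

// Hypothetical distillation of CompilationPolicy::compile's decision:
// a huge method must bail out regardless of the compilation mode.
enum class Action { Bail, CompileC1Fallback, Compile };

Action choose(bool huge_method, bool disable_intermediate, bool needs_c1_fallback) {
  if (huge_method) {      // stands in for !can_be_compiled()/!can_be_osr_compiled()
    return Action::Bail;  // checked before any mode-specific handling
  }
  if (!disable_intermediate && needs_c1_fallback) {
    return Action::CompileC1Fallback;  // recursive compile at CompLevel_simple
  }
  return Action::Compile;  // hand off to CompileBroker::compile_method
}
```

The bug corresponds to the first check being nested under `!disable_intermediate()`, so with `-XX:-TieredCompilation` a huge method reached the compile path anyway.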
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26932#discussion_r2299288827 From missa at openjdk.org Mon Aug 25 23:43:02 2025 From: missa at openjdk.org (Mohamed Issa) Date: Mon, 25 Aug 2025 23:43:02 GMT Subject: RFR: 8364305: Support AVX10 saturating floating point conversion instructions Message-ID: Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity. Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary register s to store intermediate results. This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. 
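The special-value mapping described above matches Java's narrowing conversion semantics, which the new saturating instructions compute directly. A plain scalar C++ sketch of that behavior (illustration only, not the JIT-emitted code):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Java (int)f semantics: NaN -> 0, out-of-range values clamp to
// INT32_MIN/INT32_MAX, in-range values truncate toward zero.
int32_t f2i_saturating(float f) {
  if (std::isnan(f)) return 0;
  if (f >= 2147483648.0f) {   // 2^31; INT32_MAX itself is not a float value
    return std::numeric_limits<int32_t>::max();
  }
  if (f <= -2147483648.0f) {  // -2^31 is exactly representable
    return std::numeric_limits<int32_t>::min();
  }
  return static_cast<int32_t>(f);  // in range: plain truncation
}
```

Without the AVX10.2 instructions, the extra compare-and-blend sequence described above is what implements these three branches in vector code.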
Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10 ------------- Commit messages: - Enable new AVX 10.2 vector and scalar floating point conversion instructions Changes: https://git.openjdk.org/jdk/pull/26919/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8364305 Stats: 208 lines in 6 files changed: 203 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/26919.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919 PR: https://git.openjdk.org/jdk/pull/26919 From dlong at openjdk.org Tue Aug 26 00:29:34 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 26 Aug 2025 00:29:34 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v3] In-Reply-To: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com> References: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com> Message-ID: On Mon, 25 Aug 2025 14:17:14 GMT, Boris Ulasevich wrote: >> This reworks the recent update 
https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal. >> >> The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually the trigger is `XX:+OptoScheduling` option used by test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run. >> >> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments. >> >> The problems is that shift count `n` may be too large here: >> >> class Pipeline_Use_Cycle_Mask { >> protected: >> uint _mask; >> .. >> Pipeline_Use_Cycle_Mask& operator<<=(int n) { >> _mask <<= n; >> return *this; >> } >> }; >> >> The recent change attempted to cap the shift amount at one call site: >> >> class Pipeline_Use_Element { >> protected: >> .. >> // Mask of specific used cycles >> Pipeline_Use_Cycle_Mask _mask; >> .. >> void step(uint cycles) { >> _used = 0; >> uint max_shift = 8 * sizeof(_mask) - 1; >> _mask <<= (cycles < max_shift) ? 
cycles : max_shift; >> } >> } >> >> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count: >> >> // The following two routines assume that the root Pipeline_Use entity >> // consists of exactly 1 element for each functional unit >> // start is relative to the current cycle; used for latency-based info >> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const { >> for (uint i = 0; i < pred._count; i++) { >> const Pipeline_Use_Element *predUse = pred.element(i); >> if (predUse->_multiple) { >> uint min_delay = 7; >> // Multiple possible functional units, choose first unused one >> for (uint j = predUse->_lb; j <= predUse->_ub; j++) { >> const Pipeline_Use_Element *currUse = element(j); >> uint curr_delay = delay; >> if (predUse->_used & currUse->_used) { >> Pipeline_Use_Cycle_Mask x = predUse->_mask; >> Pipeline_Use_Cycle_Mask y = currUse->_mask; >> >> for ( y <<= curr_delay; x.overlaps(y); curr_delay++ ) >> y <<= 1; >> } >> if (min_delay > curr_delay) >> min_delay = curr_delay; >> } >> if (delay < min_delay) >> delay = min_delay; >> } >> else { >> for (uint j = predUse->_lb; j <= pre... > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > use uint32_t for _mask diff --git a/src/hotspot/share/adlc/adlparse.cpp b/src/hotspot/share/adlc/adlparse.cpp index 033e8d26ca7..ca6e8b7ed5e 100644 --- a/src/hotspot/share/adlc/adlparse.cpp +++ b/src/hotspot/share/adlc/adlparse.cpp @@ -1770,6 +1770,10 @@ void ADLParser::pipe_class_parse(PipelineForm &pipeline) { return; } + if (pipeline._maxcycleused < fixed_latency) { + pipeline._maxcycleused = fixed_latency; + } + pipe_class->setFixedLatency(fixed_latency); next_char(); skipws(); continue; I think this also solves the problem, because the 100 is coming from a `fixed_latency(100)` statement. 
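For reference, the undefined behavior is easy to state: in C++, shifting a 32-bit integer by 32 or more is undefined, so `_mask <<= 100` must never execute. A minimal well-defined helper (illustration only; the fixes discussed here instead bound the shift count at the call sites or cap `fixed_latency` via `_maxcycleused`):

```cpp
#include <cassert>
#include <cstdint>

// Shifting a uint32_t by n >= 32 is UB; semantically all bits are
// shifted out, so return 0 in that case instead.
uint32_t shl_checked(uint32_t mask, unsigned n) {
  return n < 32 ? (mask << n) : 0u;
}
```

Note this differs from the earlier `step()` workaround, which clamps the count to `max_shift` (31) and therefore keeps the top bit set rather than producing 0.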
------------- PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3222137660 From kvn at openjdk.org Tue Aug 26 00:50:32 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 26 Aug 2025 00:50:32 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v3] In-Reply-To: References: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com> Message-ID: On Tue, 26 Aug 2025 00:27:10 GMT, Dean Long wrote: > I think this also solves the problem, because the 100 is coming from a fixed_latency(100) statement. Or we can fix `pipe_slow()`to use reasonable `fixed_latency` instead of arbitrary 100. It is used for float point instructions mostly and, I think, came from time when we used FPU instead of current SSE/AVX instructions. But I think code in `output_h.cpp` should be fixed, as proposed, regardless what we do with `fixed_latency`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3222174693 From dlong at openjdk.org Tue Aug 26 01:11:35 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 26 Aug 2025 01:11:35 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v3] In-Reply-To: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com> References: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com> Message-ID: On Mon, 25 Aug 2025 14:17:14 GMT, Boris Ulasevich wrote: >> This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal. >> >> The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. 
Actually the trigger is `XX:+OptoScheduling` option used by test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run. >> >> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments. >> >> The problems is that shift count `n` may be too large here: >> >> class Pipeline_Use_Cycle_Mask { >> protected: >> uint _mask; >> .. >> Pipeline_Use_Cycle_Mask& operator<<=(int n) { >> _mask <<= n; >> return *this; >> } >> }; >> >> The recent change attempted to cap the shift amount at one call site: >> >> class Pipeline_Use_Element { >> protected: >> .. >> // Mask of specific used cycles >> Pipeline_Use_Cycle_Mask _mask; >> .. >> void step(uint cycles) { >> _used = 0; >> uint max_shift = 8 * sizeof(_mask) - 1; >> _mask <<= (cycles < max_shift) ? cycles : max_shift; >> } >> } >> >> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count: >> >> // The following two routines assume that the root Pipeline_Use entity >> // consists of exactly 1 element for each functional unit >> // start is relative to the current cycle; used for latency-based info >> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const { >> for (uint i = 0; i < pred._count; i++) { >> const Pipeline_Use_Element *predUse = pred.element(i); >> if (predUse->_multiple) { >> uint min_delay = 7; >> // Multiple possible functional units, choose first unused one >> for (uint j = predUse->_lb; j <= predUse->_ub; j++) { >> const Pipeline_Use_Element *currUse = element(j); >> uint curr_delay = delay; >> if (predUse->_used & currUse->_used) { >> Pipeline_Use_Cycle_Mask x = predUse->_mask; >> Pipeline_Use_Cycle_Mask y = currUse->_mask; >> >> for ( y <<= curr_delay; x.overlaps(y); curr_delay++ ) >> y <<= 1; >> } >> if (min_delay > curr_delay) >> min_delay = curr_delay; >> } >> if (delay < min_delay) >> delay = 
min_delay; >> } >> else { >> for (uint j = predUse->_lb; j <= pre... > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > use uint32_t for _mask Also note that the min_delay logic in Pipeline_Use::full_latency() initializes min_delay to _maxcycleused+1, so it does seem to expect _maxcycleused to be set to the max value. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3222210698 From dlong at openjdk.org Tue Aug 26 02:04:33 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 26 Aug 2025 02:04:33 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v3] In-Reply-To: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com> References: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com> Message-ID: On Mon, 25 Aug 2025 14:17:14 GMT, Boris Ulasevich wrote: >> This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal. >> >> The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually the trigger is `XX:+OptoScheduling` option used by test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run. >> >> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments. >> >> The problems is that shift count `n` may be too large here: >> >> class Pipeline_Use_Cycle_Mask { >> protected: >> uint _mask; >> .. >> Pipeline_Use_Cycle_Mask& operator<<=(int n) { >> _mask <<= n; >> return *this; >> } >> }; >> >> The recent change attempted to cap the shift amount at one call site: >> >> class Pipeline_Use_Element { >> protected: >> .. 
>> // Mask of specific used cycles >> Pipeline_Use_Cycle_Mask _mask; >> .. >> void step(uint cycles) { >> _used = 0; >> uint max_shift = 8 * sizeof(_mask) - 1; >> _mask <<= (cycles < max_shift) ? cycles : max_shift; >> } >> } >> >> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count: >> >> // The following two routines assume that the root Pipeline_Use entity >> // consists of exactly 1 element for each functional unit >> // start is relative to the current cycle; used for latency-based info >> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const { >> for (uint i = 0; i < pred._count; i++) { >> const Pipeline_Use_Element *predUse = pred.element(i); >> if (predUse->_multiple) { >> uint min_delay = 7; >> // Multiple possible functional units, choose first unused one >> for (uint j = predUse->_lb; j <= predUse->_ub; j++) { >> const Pipeline_Use_Element *currUse = element(j); >> uint curr_delay = delay; >> if (predUse->_used & currUse->_used) { >> Pipeline_Use_Cycle_Mask x = predUse->_mask; >> Pipeline_Use_Cycle_Mask y = currUse->_mask; >> >> for ( y <<= curr_delay; x.overlaps(y); curr_delay++ ) >> y <<= 1; >> } >> if (min_delay > curr_delay) >> min_delay = curr_delay; >> } >> if (delay < min_delay) >> delay = min_delay; >> } >> else { >> for (uint j = predUse->_lb; j <= pre... > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > use uint32_t for _mask Yes, I think we should fix both, output_h.cpp and fixed_latency(100) on all platforms, then we can get rid of the workarounds and arm32-specific logic. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3222304657 From dzhang at openjdk.org Tue Aug 26 03:27:30 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 26 Aug 2025 03:27:30 GMT Subject: RFR: 8366127: RISC-V: compiler/intrinsics/TestVerifyIntrinsicChecks.java fails when running without RVV Message-ID: <02s9vetJYJkfb2e6CyDOGSJcYvIcJhYAL5DxpUUCnV0=.5100e1a2-e6df-4bdc-ba9a-d3f884fd4470@github.com> Hi, Can you help to review this patch? Thanks! We noticed that compiler/intrinsics/TestVerifyIntrinsicChecks.java fails when running on sg2042. The error is caused by the intrinsic `EncodeISOArray` corresponding to `encodeAsciiArray0` requiring RVV on riscv. (See `encode_iso_array_v` in `c2_MacroAssembler_riscv.cpp`) ### Test (fastdebug) - [x] Run compiler/intrinsics/TestVerifyIntrinsicChecks.java on k1 and sg2042 ------------- Commit messages: - 8366127: RISC-V: compiler/intrinsics/TestVerifyIntrinsicChecks.java fails when running without RVV Changes: https://git.openjdk.org/jdk/pull/26935/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26935&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8366127 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26935.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26935/head:pull/26935 PR: https://git.openjdk.org/jdk/pull/26935 From dholmes at openjdk.org Tue Aug 26 06:22:44 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 26 Aug 2025 06:22:44 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v7] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: On Fri, 22 Aug 2025 20:29:10 GMT, Igor Veresov wrote: >> This change fixes multiple issue with training data verification. 
While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown) it causes problems in the leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call), and fixes the problems that prevent it from working reliably. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > One more nit Thanks for updates re TRAPS etc. ------------- PR Review: https://git.openjdk.org/jdk/pull/26866#pullrequestreview-3154091489 From thartmann at openjdk.org Tue Aug 26 06:54:36 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 26 Aug 2025 06:54:36 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation In-Reply-To: References: Message-ID: On Mon, 25 Aug 2025 19:38:23 GMT, Man Cao wrote: > Hi, > > Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause. > > I will also try adding a test case for DontCompileHugeMethod under -XX:-TieredCompilation. > > -Man But setting `-XX:-DontCompileHugeMethods` will still crash the VM, right? Do you have a regression test for triggering the crash? I think we should fix the root cause of the stack overflow (separately).
------------- PR Comment: https://git.openjdk.org/jdk/pull/26932#issuecomment-3222858620 From fyang at openjdk.org Tue Aug 26 07:02:34 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 26 Aug 2025 07:02:34 GMT Subject: RFR: 8366127: RISC-V: compiler/intrinsics/TestVerifyIntrinsicChecks.java fails when running without RVV In-Reply-To: <02s9vetJYJkfb2e6CyDOGSJcYvIcJhYAL5DxpUUCnV0=.5100e1a2-e6df-4bdc-ba9a-d3f884fd4470@github.com> References: <02s9vetJYJkfb2e6CyDOGSJcYvIcJhYAL5DxpUUCnV0=.5100e1a2-e6df-4bdc-ba9a-d3f884fd4470@github.com> Message-ID: <5O_83VCZCvvS-20dR0ellkcfSTExHMysA0s5wnJ-T8Y=.35ca20c1-8973-4568-a312-135f2b0fc571@github.com> On Tue, 26 Aug 2025 03:19:09 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > We noticed that compiler/intrinsics/TestVerifyIntrinsicChecks.java fails when running on sg2042. > The error is caused by the intrinsic `EncodeISOArray` corresponding to `encodeAsciiArray0` requiring RVV on riscv. (See `encode_iso_array_v` in `c2_MacroAssembler_riscv.cpp`) > > ### Test (fastdebug) > - [x] Run compiler/intrinsics/TestVerifyIntrinsicChecks.java on k1 and sg2042 LGTM. Thanks for finding this! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26935#pullrequestreview-3154219419 From fjiang at openjdk.org Tue Aug 26 07:09:40 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 26 Aug 2025 07:09:40 GMT Subject: RFR: 8366127: RISC-V: compiler/intrinsics/TestVerifyIntrinsicChecks.java fails when running without RVV In-Reply-To: <02s9vetJYJkfb2e6CyDOGSJcYvIcJhYAL5DxpUUCnV0=.5100e1a2-e6df-4bdc-ba9a-d3f884fd4470@github.com> References: <02s9vetJYJkfb2e6CyDOGSJcYvIcJhYAL5DxpUUCnV0=.5100e1a2-e6df-4bdc-ba9a-d3f884fd4470@github.com> Message-ID: On Tue, 26 Aug 2025 03:19:09 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > We noticed that compiler/intrinsics/TestVerifyIntrinsicChecks.java fails when running on sg2042. 
> The error is caused by the intrinsic `EncodeISOArray` corresponding to `encodeAsciiArray0` requiring RVV on riscv. (See `encode_iso_array_v` in `c2_MacroAssembler_riscv.cpp`) > > ### Test (fastdebug) > - [x] Run compiler/intrinsics/TestVerifyIntrinsicChecks.java on k1 and sg2042 Thanks for finding this! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/26935#pullrequestreview-3154240096 From eosterlund at openjdk.org Tue Aug 26 07:52:51 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 26 Aug 2025 07:52:51 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v43] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 23:35:45 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer Looks good. Ship it! ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23573#pullrequestreview-3154398292 From manc at openjdk.org Tue Aug 26 08:06:37 2025 From: manc at openjdk.org (Man Cao) Date: Tue, 26 Aug 2025 08:06:37 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation In-Reply-To: References: Message-ID: On Tue, 26 Aug 2025 06:52:22 GMT, Tobias Hartmann wrote: > But setting `-XX:-DontCompileHugeMethods` will still crash the VM, right? Do you have a regression test for triggering the crash? > > I think we should fix the root cause of the stack overflow (separately). You are right. Created https://bugs.openjdk.org/browse/JDK-8366138 for the stack overflow itself. We have not found a local repro yet. I'll keep looking and try providing a repro for JDK-8366138 later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26932#issuecomment-3223067080 From thartmann at openjdk.org Tue Aug 26 08:16:36 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 26 Aug 2025 08:16:36 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation In-Reply-To: References: Message-ID: On Mon, 25 Aug 2025 19:38:23 GMT, Man Cao wrote: > Hi, > > Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause. > > I will also try adding a test case for DontCompileHugeMethod under -XX:-TieredCompilation. > > -Man Sounds good, thank you! @veresov Might want to look at this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26932#issuecomment-3223097261 From roland at openjdk.org Tue Aug 26 09:29:53 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 26 Aug 2025 09:29:53 GMT Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit, limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v5] In-Reply-To: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com> References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com> Message-ID: > A node in a pre loop only has uses out of the loop dominated by the > loop exit. `PhaseIdealLoop::try_sink_out_of_loop()` sets its control > to the loop exit projection. A range check in the main loop has this > node as input (through a chain of some other nodes). Range check > elimination needs to update the exit condition of the pre loop with an > expression that depends on the node pinned on its exit: that's > impossible and the assert fires. This is a variant of 8314024 (this > one was for a node with uses out of the pre loop on multiple paths). I > propose the same fix: leave the node with control in the pre loop in > this case. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26424/files - new: https://git.openjdk.org/jdk/pull/26424/files/cc64aa6f..6da75e9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26424&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26424&range=03-04 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26424.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26424/head:pull/26424 PR: https://git.openjdk.org/jdk/pull/26424 From roland at openjdk.org Tue Aug 26 09:29:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 26 Aug 2025 09:29:56 GMT Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit, limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v4] In-Reply-To: References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com> <6dWR-SxhuKd9-T3q313I6at4vTBcYlufyCBNjGGopv4=.cae3abea-0752-4191-ac08-890476489af3@github.com> Message-ID: On Mon, 25 Aug 2025 15:12:55 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8361702 >> - Update src/hotspot/share/opto/loopopts.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java >> >> Co-authored-by: Christian Hagedorn >> - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopopts.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java >> >> Co-authored-by: Christian Hagedorn >> - tests >> - fix > > test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java line 28: > >> 26: * @bug 8361702 >> 27: * @summary C2: assert(is_dominator(compute_early_ctrl(limit, limit_ctrl), pre_end)) failed: node pinned on loop exit test? >> 28: * @requires vm.flavor == "server" > > Would this test fail without this requires? Or could we remove it, in the hopes of catching something else somewhere else? The `@requires` is there because the test run needs command line options that are c2 specific. > test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java line 30: > >> 28: * @requires vm.flavor == "server" >> 29: * >> 30: * @run main/othervm -XX:-BackgroundCompilation -XX:LoopUnrollLimit=100 -XX:-UseLoopPredicate -XX:-UseProfiledLoopPredicate TestSunkRangeFromPreLoopRCE3 > > Could we have a run without any flags / fewer flags? Just in case it catches something else / related. Done in new commit. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26424#discussion_r2300383396 PR Review Comment: https://git.openjdk.org/jdk/pull/26424#discussion_r2300383800 From mli at openjdk.org Tue Aug 26 09:57:44 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 26 Aug 2025 09:57:44 GMT Subject: RFR: 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way [v3] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > > This is a follow-up of https://github.com/openjdk/jdk/pull/26838, fixes the vector version in a similar way. > > Thanks! Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: minor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26883/files - new: https://git.openjdk.org/jdk/pull/26883/files/fa107180..858ea664 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26883&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26883&range=01-02 Stats: 14 lines in 1 file changed: 3 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/26883.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26883/head:pull/26883 PR: https://git.openjdk.org/jdk/pull/26883 From mli at openjdk.org Tue Aug 26 09:57:46 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 26 Aug 2025 09:57:46 GMT Subject: RFR: 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way [v2] In-Reply-To: References: Message-ID: On Sun, 24 Aug 2025 07:30:03 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> comments & readability > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2495: > >> 2493: __ bind(stub.entry()); >> 2494: >> 2495: // mul is already set to mf2 in float_to_float16_v. 
> > Although not directly related, can you rename `tmp` to `vtmp` and add an assertion about the three vector registers (just like we do in `C2_MacroAssembler::float_to_float16_v`)? And it would help if we add some extra code comment about the `v0` mask register which indicates which elements are NaNs. Or maybe better to pass `v0` as well? > > What I mean is something like: > > assert_different_registers(dst, src, vtmp); > > // Active elements (NaNs) are marked in v0 mask register > // and mul is already set to mf2 in float_to_float16_v. I'll add the comment you suggested. But it seems passing `v0` is unnecessary and would look odd here, as we don't use `v0` directly in the code. > test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVectorNaN.java line 92: > >> 90: // Setup >> 91: for (int i = 0; i < ARRLEN; i++) { >> 92: if (i%3 == 0) { > > Question: What is this change for? Do you have more details? `39` was used to distribute the NaNs sparsely in the array. Since we are chasing a NaN calculation issue, the change tests NaNs more frequently, with a random shift. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26883#discussion_r2300455908 PR Review Comment: https://git.openjdk.org/jdk/pull/26883#discussion_r2300456375 From adinn at openjdk.org Tue Aug 26 10:56:34 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 26 Aug 2025 10:56:34 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v3] In-Reply-To: References: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com> Message-ID: On Tue, 26 Aug 2025 02:02:10 GMT, Dean Long wrote: > Yes, I think we should fix both, output_h.cpp and fixed_latency(100) on all platforms, then we can get rid of the workarounds and arm32-specific logic.
When I looked into this earlier I thought the obvious thing needed to fix this was to reassign all the latencies so they represented a realizable pipeline delay. A proper fix would sensibly require each latency to be less than the pipeline length declared in the CPU model -- which for most arches is much less than 32. However, I didn't suggest such a rationalization because I believed (perhaps wrongly) that the latencies were also used to pick a preferred choice when we have alternative instruction/operand rule matches. The selection process involves comparing the cumulative latencies for subgraph nodes against the latency of each node defined by a match rule for the subgraph and picking the lowest latency result. After looking at some of the rules I was not sure that it would be easy to reduce all current latencies so they lie in the range 0-31 and still guarantee the current selection order. It would be even harder when the range was correctly reduced to 0 - lengthof(pipeline). I don't even think most rule authors understand that the latencies are used by the pipeline model and instead they simply use latency as a weight to enforce orderings. That's certainly how I understood it until I ran into this issue. If so then perhaps we would be better sticking with the de facto use and fixing the shift issue with a maximum shift bound. The mask tests which rely on this shift count may help with deriving scheduling delays for some instructions with small latencies but I don't believe it is very reliable even in cases where the accumulated shifts lie within the 32 bit range. If we are to change anything here then I think we need a review of the accuracy of pipeline models and their current or potential value before doing so. 
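To make the "maximum shift bound" option mentioned above concrete: shifting a 32-bit value by 32 or more is undefined behaviour in C++ (which is exactly what ubsan reports for the `fixed_latency(100)` case), whereas Java silently reduces the shift distance mod 32. The following is a hedged sketch with illustrative names, not the generated ad-file code:

```java
// Sketch of a saturating shift guard. In Java, (1 << 100) silently wraps
// (the shift distance is taken mod 32), which mirrors why the C++ UB is easy
// to miss; the guard below makes the out-of-range case explicit instead.
public class StageBit {
    static int stageBit(int latency) {
        // A latency at or beyond the mask width contributes no bit at all.
        return (latency >= 32 || latency < 0) ? 0 : (1 << latency);
    }

    public static void main(String[] args) {
        if (stageBit(0) != 1) throw new AssertionError();
        if (stageBit(31) != Integer.MIN_VALUE) throw new AssertionError();
        if (stageBit(100) != 0) throw new AssertionError();      // the fixed_latency(100) case
        if ((1 << 100) != (1 << 4)) throw new AssertionError();  // Java wraps: 100 % 32 == 4
        System.out.println("ok");
    }
}
```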
------------- PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3223654432 From fyang at openjdk.org Tue Aug 26 12:24:44 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 26 Aug 2025 12:24:44 GMT Subject: RFR: 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way [v3] In-Reply-To: References: Message-ID: <9YetQdVKkcpVZJpkE_YZva4uVbNTXWDvCm-r4ux8Zro=.016173bb-a4f3-4c00-ad46-e6989e18a312@github.com> On Tue, 26 Aug 2025 09:57:44 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> This is a follow-up of https://github.com/openjdk/jdk/pull/26838, fixes the vector version in a similar way. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor Looks better to me. Thanks for the update. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26883#pullrequestreview-3155389163 From hgreule at openjdk.org Tue Aug 26 12:46:31 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 26 Aug 2025 12:46:31 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v7] In-Reply-To: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: > This change improves the precision of the `Mod(I|L)Node::Value()` functions. > > I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early. > The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions. 
> > ### Monotonicity > > Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. We use `Type(Int|Long)::ZERO` instead (zero is always contained in the resulting value if we cover a range). > > ### Testing > > I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something). > > Please review and let me know what you think. > > ### Other > > The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508. > > During experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth looking into further: > - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement? > - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
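As a back-of-the-envelope check of the bounds reasoning above, here is a deliberately coarse re-implementation of interval bounds for `a % b` under Java/C2 semantics (the result takes the sign of the dividend, and `|a % b| < max|b|`). It is only a sketch of the idea, not the C2 code, and it returns `[0, 0]` for an all-zero divisor range, mirroring the `ZERO` choice described above.

```java
// Coarse interval bounds for a % b with a in [alo, ahi], b in [blo, bhi].
// Sketch only: the actual ModINode::Value computes tighter bounds than this.
public class ModBounds {
    static long[] bounds(long alo, long ahi, long blo, long bhi) {
        long maxAbsB = Math.max(Math.abs(blo), Math.abs(bhi));
        if (maxAbsB == 0) {
            return new long[] {0, 0};   // divisor can only be 0: mirror the ZERO type
        }
        long m = maxAbsB - 1;           // |a % b| <= max|b| - 1 for b != 0
        long lo = (alo >= 0) ? 0 : Math.max(alo, -m);  // result sign follows the dividend
        long hi = (ahi <= 0) ? 0 : Math.min(ahi, m);
        return new long[] {lo, hi};
    }

    public static void main(String[] args) {
        // Exhaustively check containment for one concrete pair of ranges.
        long[] r = bounds(-7, 10, 3, 5);
        for (long a = -7; a <= 10; a++) {
            for (long b = 3; b <= 5; b++) {
                long v = a % b;
                if (v < r[0] || v > r[1]) throw new AssertionError(a + " % " + b + " = " + v);
            }
        }
        System.out.println("ok");
    }
}
```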
Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25254/files - new: https://git.openjdk.org/jdk/pull/25254/files/11210414..5c74919a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25254/head:pull/25254 PR: https://git.openjdk.org/jdk/pull/25254 From epeter at openjdk.org Tue Aug 26 12:50:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 26 Aug 2025 12:50:37 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Thu, 21 Aug 2025 10:30:40 GMT, Fei Gao wrote: >> Thanks for the updates. I gave it a quick scan and proposed some changes. I can look at it again once you respond to these :) >> (we currently have lots of reviews, so I need to do a little round-robin here 🙂) > Hi @eme64 , I've addressed some corner case failures and refined parts of the code in the new commit. Would you like to review it? Thanks! @fg1417 Yes, I'd love to review! I'll try to have a look in the next days. It would be good if you re-ran the benchmarks. It seems you last ran them in December 2024. We should see that we have various benchmarks, both for array and MemorySegment.
You could look at the array benchmarks from here: https://github.com/openjdk/jdk/pull/22070 ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3224041030 From hgreule at openjdk.org Tue Aug 26 12:59:35 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 26 Aug 2025 12:59:35 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v6] In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: On Mon, 25 Aug 2025 13:18:28 GMT, Emanuel Peter wrote: > Looks really good now. I think we can almost integrate now. Thanks for the review :) > One thing I'm wondering: could this be extended to `UModI/L`? That can of course be a separate RFE as well. And yet another idea: could we use the known bits? See #17508. Yes, `UModI/L` could be done now in a similar fashion using unsigned ranges. I can open an RFE later. I'm not sure if we can get more precise bitwise information than what the canonicalization already does. I don't see anything obvious there at least. > Can you show some examples? Filing an RFE would surely not be wrong. https://gist.github.com/SirYwell/151a48c90d12593bf500028389bdd07c this is an example. (Currently, we don't detect patterns like `Math.floorMod(...)`, so I'm just casting to char to get a nonnegative value). In the patched version, I added a bailout in `transform_int_divide` to delay the transformation to IGVN. This way, we actually run `ModI::Value()` and get a type that lets us eliminate the CmpU. There are probably better ways to achieve that :) I wonder if there are more such scenarios, and if it's worth calculating some initial type before `Ideal()`...
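The gist linked above is not reproduced here, but the general shape of the pattern is easy to sketch (illustrative code, not the gist's contents): with a `char` cast the dividend is known to be nonnegative, so a sufficiently precise `ModI::Value()` bounds the index to `[0, a.length - 1]` and the bounds check on the array access becomes redundant. Whether the JIT actually removes it depends on the compiler version and flags.

```java
// With idx = (char) i % a.length, the value range of idx is [0, a.length - 1]
// whenever a.length > 0, so the implicit bounds check on a[idx] is provably
// redundant. Whether C2 proves that depends on ModI's computed type; this
// snippet only shows the shape of the pattern being discussed.
public class ModIndex {
    static int pick(int[] a, int i) {
        int idx = ((char) i) % a.length;  // (char) i is in [0, 65535], so idx >= 0
        return a[idx];
    }

    public static void main(String[] args) {
        int[] a = {10, 20, 30};
        if (pick(a, 7) != 20) throw new AssertionError();   // 7 % 3 == 1
        if (pick(a, -1) != 10) throw new AssertionError();  // (char) -1 == 65535; 65535 % 3 == 0
        System.out.println("ok");
    }
}
```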
------------- PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-3224071958 From bkilambi at openjdk.org Tue Aug 26 13:07:21 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 26 Aug 2025 13:07:21 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v8] In-Reply-To: References: Message-ID: > After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - > `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - > > > public void vectorAddConstInputFloat16() { > for (int i = 0; i < LEN; ++i) { > output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); > } > } > > > > > > The current code in the JDK results in the generation of the sve_dup instruction for every 16-bit immediate, while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. > > This patch allows the generation of the sve_dup instruction only for those 16-bit values which are within the limits specified above. For values which are out of range, the immediate half-float value is loaded from the constant pool into a register (the "loadConH" mach node), which is then replicated or broadcast to an SVE register (the "replicateHF" mach node). > > Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on a 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk (tier 1-3) pass on the same machine.
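The immediate ranges quoted in the description above can be captured in a small predicate. This is a sketch based solely on the bounds stated in the PR text ([-128, 127], or a multiple of 256 within [-127 << 8, 128 << 8]), not on the aarch64 backend code itself:

```java
// Sketch of the sve_dup immediate-encodability check described above.
// The bounds are taken verbatim from the PR text; the real check lives in
// the aarch64 backend. Values failing this predicate would take the
// constant-pool load + replicate path instead.
public class SveDupImm {
    static boolean encodable(int imm16) {
        if (imm16 >= -128 && imm16 <= 127) {
            return true;                       // 8-bit immediate form
        }
        // 16-bit form: a multiple of 256 within [-127 << 8, 128 << 8]
        return (imm16 % 256 == 0) && imm16 >= (-127 << 8) && imm16 <= (128 << 8);
    }

    public static void main(String[] args) {
        if (!encodable(100)) throw new AssertionError();
        if (!encodable(-128)) throw new AssertionError();
        if (!encodable(512)) throw new AssertionError();   // 2 << 8, a multiple of 256
        if (encodable(513)) throw new AssertionError();    // needs the constant-pool path
        System.out.println("ok");
    }
}
```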
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Modified JTREG testcase to address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26589/files - new: https://git.openjdk.org/jdk/pull/26589/files/278ada47..c0bc9a51 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26589&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26589&range=06-07 Stats: 32 lines in 1 file changed: 18 ins; 6 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/26589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26589/head:pull/26589 PR: https://git.openjdk.org/jdk/pull/26589 From bkilambi at openjdk.org Tue Aug 26 13:07:21 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 26 Aug 2025 13:07:21 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: <0RKz6D0V5DMA_HAnDKHVOFt-JDxWOBCTu4TTG29MfmI=.e2599d8f-74e2-4a76-9f75-38a6cba2f5ca@github.com> On Fri, 15 Aug 2025 11:54:59 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. 
>> >> This patch allows the generation of the sve_dup instruction only for those 16-bit values which are within the limits specified above. For values which are out of range, the immediate half-float value is loaded from the constant pool into a register (the "loadConH" mach node), which is then replicated or broadcast to an SVE register (the "replicateHF" mach node). >> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on a 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk (tier 1-3) pass on the same machine. > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments Hi @eme64 Can you please review the new patch? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3224095429 From bkilambi at openjdk.org Tue Aug 26 13:07:21 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 26 Aug 2025 13:07:21 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: <04_IpSYiBu9iLViEV2V5opYFqN7OzNewgUEOLSs_Cwc=.a8c693cd-900d-4602-9b88-76dd55f9a844@github.com> Message-ID: On Thu, 21 Aug 2025 06:02:45 GMT, Emanuel Peter wrote: >> It fails to match the IR nodes. I think it happened when I used a smaller `Warmup`. With the `Warmup` I am using, it seems to be working fine. I will add that case as well. > Ok, so then it is probably a profile issue. Thanks for adding both runs! Added both the runs in the new patch. The test passes on x86 and graviton3.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2300925200 From bkilambi at openjdk.org Tue Aug 26 13:07:22 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 26 Aug 2025 13:07:22 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 07:49:50 GMT, Jatin Bhateja wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments > test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 91: > >> 89: if (expected != output[i]) { >> 90: throw new AssertionError("Result Mismatch!, input = " + input[i] + " constant = " + FP16_IN_RANGE + " actual = " + output[i] + " expected = " + expected); >> 91: } > Prefer using Verify.check* > https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/verify/Verify.java Done > test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 121: > >> 119: if (expected != output[i]) { >> 120: throw new AssertionError("Result Mismatch!, input = " + input[i] + " constant = " + FP16_OUT_OF_RANGE + " actual = " + output[i] + " expected = " + expected); >> 121: } > As above, please use Verify.check* API. Done. thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2300923979 PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2300923665 From chagedorn at openjdk.org Tue Aug 26 13:23:37 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 26 Aug 2025 13:23:37 GMT Subject: RFR: 8365909: [REDO] Add a compilation timeout flag to catch long running compilations [v2] In-Reply-To: References: Message-ID: On Mon, 25 Aug 2025 07:20:44 GMT, Manuel Hässig wrote: >> This PR adds a timeout for compilation tasks based on timer signals on Linux debug builds. >> >> This PR is a redo of #25872 with fixes for the failing test.
>> >> Testing: >> - [x] Github Actions >> - [x] tier1,tier2 plus internal testing on all Oracle supported platforms >> - [x] tier3,tier4 on linux-x64-debug >> - [x] tier1,tier2,tier3,tier4 on linux-x64-debug with `-XX:CompileTaskTimeout=60000` > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation Still good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26882#pullrequestreview-3155604388 From mhaessig at openjdk.org Tue Aug 26 13:23:57 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 26 Aug 2025 13:23:57 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v9] In-Reply-To: References: Message-ID: > This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality.
> > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus some internal testing on Oracle supported platforms Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26762/files - new: https://git.openjdk.org/jdk/pull/26762/files/771924f0..a047ba39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26762&range=07-08 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26762.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26762/head:pull/26762 PR: https://git.openjdk.org/jdk/pull/26762 From mhaessig at openjdk.org Tue Aug 26 13:24:00 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 26 Aug 2025 13:24:00 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v8] In-Reply-To: References: Message-ID: On Mon, 25 Aug 2025 15:16:41 GMT, Emanuel Peter wrote: >> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8365262 >> - Remove excess newline >> - Fix indentation >> - Improve comments >> - Fix copy pasta mistakes >> - Improvements prompted by Emanuel >> - Fix test >> - Better counting in tests >> - post processing of flags and documentation >> - Make the test work >> - ... and 5 more: https://git.openjdk.org/jdk/compare/c1fb5caa...771924f0 > > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 380: > >> 378: .reduce( >> 379: Stream.of(Collections.emptyList()), // Initialize Stream<List<String>> acc with a Stream containing an empty list of Strings.
>> 380: (acc, set) -> // (Stream>, Stream>) -> Stream> > > You could probably put the types in the argument capture, no? Then it would become actual code. Good point. This made me realize I wrote down the wrong types... > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 397: > >> 395: .flatMap(Collection::stream) // Flatten the Stream> into Stream>. >> 396: .filter(s -> !s.isEmpty()) // Remove empty string flags. >> 397: .distinct() > > Is this necessary? Could this reorder things? Sometimes order is relevant. It is not. I removed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2300966957 PR Review Comment: https://git.openjdk.org/jdk/pull/26762#discussion_r2300967457 From mhaessig at openjdk.org Tue Aug 26 13:26:44 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 26 Aug 2025 13:26:44 GMT Subject: RFR: 8365909: [REDO] Add a compilation timeout flag to catch long running compilations In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 20:56:42 GMT, Dean Long wrote: >> This PR adds a timeout for compilation tasks based on timer signals on Linux debug builds. >> >> This PR is a redo of #25872 with fixes for the failing test. >> >> Testing: >> - [x] Github Actions >> - [x] tier1,tier2 plus internal testing on all Oracle supported platforms >> - [x] tier3,tier4 on linux-x64-debug >> - [x] tier1,tier2,tier3,tier4 on linux-x64-debug with `-XX:CompileTaskTimeout=60000` > > Please explain the test fix. Thank you for your reviews, @dean-long and @chhagedorn.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26882#issuecomment-3224167084 From mhaessig at openjdk.org Tue Aug 26 13:26:46 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 26 Aug 2025 13:26:46 GMT Subject: Integrated: 8365909: [REDO] Add a compilation timeout flag to catch long running compilations In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 11:56:17 GMT, Manuel Hässig wrote: > This PR adds a timeout for compilation tasks based on timer signals on Linux debug builds. > > This PR is a redo of #25872 with fixes for the failing test. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus internal testing on all Oracle supported platforms > - [x] tier3,tier4 on linux-x64-debug > - [x] tier1,tier2,tier3,tier4 on linux-x64-debug with `-XX:CompileTaskTimeout=60000` This pull request has now been integrated. Changeset: aae13af0 Author: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/aae13af04bda541a80f74adff5dbf65f44c8271a Stats: 281 lines in 8 files changed: 278 ins; 0 del; 3 mod 8365909: [REDO] Add a compilation timeout flag to catch long running compilations Co-authored-by: Dean Long Reviewed-by: chagedorn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/26882 From bkilambi at openjdk.org Tue Aug 26 13:30:39 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 26 Aug 2025 13:30:39 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: <0q89hNzCz5i1sosjpLNbxNZE5uyBYGmRZQ5c2c78bl0=.8895c287-d548-4d83-b414-5459bd4826f9@github.com> Message-ID: On Thu, 21 Aug 2025 08:58:42 GMT, Bhavana Kilambi wrote: >> Ok, in general it's advisable to use Generators for any initialization; another suggestion: you can also generate constants dynamically through @Stable arrays, here is an example >> >> >> >> import jdk.internal.vm.annotation.Stable; >> import java.util.concurrent.ThreadLocalRandom; >> >> public class random_constants { >> public
static final int idx = ThreadLocalRandom.current().nextInt(1023); >> >> @Stable >> public static int [] arr; >> >> public static void init() { >> arr = new int[1024]; >> for (int i = 0; i < 1024; i++) { >> arr[i] = ThreadLocalRandom.current().nextInt(); >> } >> } >> >> public static int yeild_number() { >> return arr[idx] + 10; >> } >> >> public static void main(String [] args) { >> int res = 0; >> init(); >> for (int i = 0; i < 100000; i++) { >> res += yeild_number(); >> } >> System.out.println("[res] " + res); >> } >> } >> >> PROMPT>java --add-exports=java.base/jdk.internal.vm.annotation=ALL-UNNAMED -Xbatch -XX:-TieredCompilation -Xbootclasspath/a:. -XX:CompileCommand=PrintIdealPhase,random_constants::yeild_number,BEFORE_MATCHING -cp . random_constants >> CompileCommand: PrintIdealPhase random_constants.yeild_number const char* PrintIdealPhase = 'BEFORE_MATCHING' >> AFTER: BEFORE_MATCHING >> 0 Root === 0 32 [[ 0 1 3 31 ]] inner >> 3 Start === 3 0 [[ 3 5 6 7 8 9 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address} >> 5 Parm === 3 [[ 32 ]] Control !jvms: random_constants::yeild_number @ bci:-1 (line 19) >> 6 Parm === 3 [[ 32 ]] I_O !jvms: random_constants::yeild_number @ bci:-1 (line 19) >> 7 Parm === 3 [[ 32 ]] Memory Memory: @BotPTR *+bot, idx=Bot; !jvms: random_constants::yeild_number @ bci:-1 (line 19) >> 8 Parm === 3 [[ 32 ]] FramePtr !jvms: random_constants::yeild_number @ bci:-1 (line 19) >> 9 Parm === 3 [[ 32 ]] ReturnAdr !jvms: random_constants::yeild_number @ bci:-1 (line 19) >> 31 ConI === 0 [[ 32 ]] #int:-753356878 >> 32 Return === 5 6 7 8 9 returns 31 [[ 0 ]] >> [res] -1961428160 >> >> >> You can directly pass arr[idx] as constant argument to relevant Float16 APIs. > > Thanks for sharing. This looks interesting. Thanks a lot for the suggestion but I couldn't find any extra benefit of using `@Stable` for a constant variable. 
I feel it's more useful in case of arrays (maybe I could use an array of the two final fields in the testcase but I felt the current version is more understandable). I will keep this in mind going forward but just to not complicate a simple testcase, I have just kept the `final` keyword for the constants instead. Could you please review the new patch? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2300997957 From bulasevich at openjdk.org Tue Aug 26 14:20:35 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 26 Aug 2025 14:20:35 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v3] In-Reply-To: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com> References: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com> Message-ID: On Mon, 25 Aug 2025 14:17:14 GMT, Boris Ulasevich wrote: >> This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal. >> >> The issue reproduces with the HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually the trigger is the `-XX:+OptoScheduling` option used by the test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run. >> >> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments. >> >> The problem is that the shift count `n` may be too large here: >> >> class Pipeline_Use_Cycle_Mask { >> protected: >> uint _mask; >> ..
>> // Mask of specific used cycles >> Pipeline_Use_Cycle_Mask _mask; >> .. >> void step(uint cycles) { >> _used = 0; >> uint max_shift = 8 * sizeof(_mask) - 1; >> _mask <<= (cycles < max_shift) ? cycles : max_shift; >> } >> } >> >> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count: >> >> // The following two routines assume that the root Pipeline_Use entity >> // consists of exactly 1 element for each functional unit >> // start is relative to the current cycle; used for latency-based info >> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const { >> for (uint i = 0; i < pred._count; i++) { >> const Pipeline_Use_Element *predUse = pred.element(i); >> if (predUse->_multiple) { >> uint min_delay = 7; >> // Multiple possible functional units, choose first unused one >> for (uint j = predUse->_lb; j <= predUse->_ub; j++) { >> const Pipeline_Use_Element *currUse = element(j); >> uint curr_delay = delay; >> if (predUse->_used & currUse->_used) { >> Pipeline_Use_Cycle_Mask x = predUse->_mask; >> Pipeline_Use_Cycle_Mask y = currUse->_mask; >> >> for ( y <<= curr_delay; x.overlaps(y); curr_delay++ ) >> y <<= 1; >> } >> if (min_delay > curr_delay) >> min_delay = curr_delay; >> } >> if (delay < min_delay) >> delay = min_delay; >> } >> else { >> for (uint j = predUse->_lb; j <= pre... > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > use uint32_t for _mask > ``` > + if (pipeline._maxcycleused < fixed_latency) { > + pipeline._maxcycleused = fixed_latency; > + } > + > ``` > I think this also solves the problem, because the 100 is coming from a `fixed_latency(100)` statement. @dean-long Right! I checked that, it makes ubsan quiet. Please note: 100 isn't the only triggering value.
With an extra trace on macosx-aarch64 I see: printf("%i -> %i\n", pipeline._maxcycleused, fixed_latency); 6 -> 8 8 -> 16 16 -> 100 If we resolve it at parse stage, I think we should do the opposite: limit the user-specified value to maxcycleused. diff --git a/src/hotspot/share/adlc/adlparse.cpp b/src/hotspot/share/adlc/adlparse.cpp index 033e8d26ca7..1060f7b18ab 100644 --- a/src/hotspot/share/adlc/adlparse.cpp +++ b/src/hotspot/share/adlc/adlparse.cpp @@ -1770,7 +1770,7 @@ void ADLParser::pipe_class_parse(PipelineForm &pipeline) { return; } - pipe_class->setFixedLatency(fixed_latency); + pipe_class->setFixedLatency(fixed_latency <= pipeline._maxcycleused ? fixed_latency : pipeline._maxcycleused); next_char(); skipws(); continue; } ------------- PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3224369904 From kxu at openjdk.org Tue Aug 26 14:47:00 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 26 Aug 2025 14:47:00 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v8] In-Reply-To: References: Message-ID: <7qgNsgKbFFtzVwuDG2yM_vIczHbzMj6ZUKh_7sz1qow=.d9aeab55-f647-43bc-af2a-48f23d5bbcca@github.com> > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 24 commits: - Merge branch 'openjdk:master' into counted-loop-refactor - Merge remote-tracking branch 'origin/master' into counted-loop-refactor # Conflicts: # src/hotspot/share/opto/loopnode.cpp # src/hotspot/share/opto/loopnode.hpp - Merge branch 'master' into counted-loop-refactor # Conflicts: # src/hotspot/share/opto/loopnode.cpp # src/hotspot/share/opto/loopnode.hpp # src/hotspot/share/opto/loopopts.cpp - Merge remote-tracking branch 'origin/master' into counted-loop-refactor - further refactor is_counted_loop() by extracting functions - WIP: refactor is_counted_loop() - WIP: refactor is_counted_loop() - WIP: review followups - reviewer suggested changes - line break - ... and 14 more: https://git.openjdk.org/jdk/compare/173dedfb...763adeda ------------- Changes: https://git.openjdk.org/jdk/pull/24458/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=07 Stats: 927 lines in 3 files changed: 423 ins; 211 del; 293 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From kxu at openjdk.org Tue Aug 26 14:47:31 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 26 Aug 2025 14:47:31 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v18] In-Reply-To: References: Message-ID: <53Ado9oN1yU5hgOPU2feecxsArD5yoycn09ZWPNK4AQ=.69035bde-9bec-442e-8dc2-ddd268df9d07@github.com> > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495), which was [first merged](https://git.openjdk.org/jdk/pull/20754) and then backed out due to a regression. This patch redoes the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR.
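The int/long discrepancy this redo fixes follows directly from Java's shift-count masking (JLS 15.19: an int shift uses only the low 5 bits of the count, a long shift only the low 6), which a standalone snippet can confirm:

```java
public class ShiftMasking {
    public static void main(String[] args) {
        // For int shifts the count is taken mod 32, so 1 << 32 behaves like 1 << 0.
        System.out.println(1 << 32);          // prints 1
        // For long shifts the count is taken mod 64, so 1L << 32 really shifts;
        // narrowing to int then keeps only the (all-zero) low 32 bits.
        System.out.println((int) (1L << 32)); // prints 0
    }
}
```

So when a multiplier derived from a shift amount is widened to long, a shift count of exactly 32 produces a different value than the int computation would have.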
> > When constantizing multiplications (possibly in forms of `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 67 commits: - Merge branch 'openjdk:master' into arithmetic-canonicalization - Merge remote-tracking branch 'origin/master' into arithmetic-canonicalization - Allow swapping LHS/RHS in case not matched - Merge branch 'refs/heads/master' into arithmetic-canonicalization - improve comment readability and struct helper functions - remove asserts, add more documentation - fix typo: lhs->rhs - update comments - use java_add to avoid cpp overflow UB - add assertion for MulLNode too - ...
and 57 more: https://git.openjdk.org/jdk/compare/173dedfb...7bb7e645 ------------- Changes: https://git.openjdk.org/jdk/pull/23506/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=17 Stats: 849 lines in 6 files changed: 848 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From kxu at openjdk.org Tue Aug 26 14:47:33 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 26 Aug 2025 14:47:33 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 11:50:38 GMT, Emanuel Peter wrote: >> Ping @eme64 again for awareness. :) > > @tabjy > >> I could, at very least, try to swap LHS and RHS if no match is found > > I think that would be a good idea, and not very hard. You can just have a function `add_pattern(lhs, rhs)`, and then run it also with `add_pattern(rhs, lhs)` for **swapping**. > > Personally, I would have preferred a recursive algorithm, but that could have some compile time overhead. @chhagedorn Was a little more skeptical about the recursive algorithm. > > It seems the motivation for this change is the benchmark from here: > ArithmeticCanonicalizationBenchmark > https://ionutbalosin.com/2024/02/jvm-performance-comparison-for-jdk-21/#jit-compiler > > This benchmark is of course somewhat arbitrary, and so are now all of your added patterns. Having a most general solution would be nice, but maybe the recursive algorithm is too much, I'm not 100% sure. Of course we now still have cases that do not optimize/canonicalize, and so someone could write a benchmark for those cases still.. oh well. > > What I would like to see for **testing**: add some more patterns with IR rules. More that now optimize, and also a few that do not optimize, just so we have a bit of a sense what we are still missing. 
> > @rwestrel Filed this issue. I wonder: what do you think we should do here? How general should the optimization/canonicalization be? Hello @eme64. e1fd025b26e3d54e6455f63577a4649986864ffc updated the PR to allow swapping LHS and RHS. It also added more test patterns with IR verification on cases that should (and should not) pass. I've left comments indicating some patterns could be potentially optimized in the future. I don't see a conflict with the main branch. Thanks for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-3224472795 From kvn at openjdk.org Tue Aug 26 16:19:35 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 26 Aug 2025 16:19:35 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v3] In-Reply-To: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com> References: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com> Message-ID: <4LxygRPwS2LCH7ZJK-H4-Tq9k6T9GmIfSZPFyz0gNoM=.0dd1f18c-01e5-4320-9316-901d477c2ae2@github.com> On Mon, 25 Aug 2025 14:17:14 GMT, Boris Ulasevich wrote: >> This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal. >> >> The issue reproduces with the HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually the trigger is the `-XX:+OptoScheduling` option used by the test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run. >> >> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments. >> >> The problem is that the shift count `n` may be too large here: >> >> class Pipeline_Use_Cycle_Mask { >> protected: >> uint _mask; >> ..
>> Pipeline_Use_Cycle_Mask& operator<<=(int n) { >> _mask <<= n; >> return *this; >> } >> }; >> >> The recent change attempted to cap the shift amount at one call site: >> >> class Pipeline_Use_Element { >> protected: >> .. >> // Mask of specific used cycles >> Pipeline_Use_Cycle_Mask _mask; >> .. >> void step(uint cycles) { >> _used = 0; >> uint max_shift = 8 * sizeof(_mask) - 1; >> _mask <<= (cycles < max_shift) ? cycles : max_shift; >> } >> } >> >> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count: >> >> // The following two routines assume that the root Pipeline_Use entity >> // consists of exactly 1 element for each functional unit >> // start is relative to the current cycle; used for latency-based info >> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const { >> for (uint i = 0; i < pred._count; i++) { >> const Pipeline_Use_Element *predUse = pred.element(i); >> if (predUse->_multiple) { >> uint min_delay = 7; >> // Multiple possible functional units, choose first unused one >> for (uint j = predUse->_lb; j <= predUse->_ub; j++) { >> const Pipeline_Use_Element *currUse = element(j); >> uint curr_delay = delay; >> if (predUse->_used & currUse->_used) { >> Pipeline_Use_Cycle_Mask x = predUse->_mask; >> Pipeline_Use_Cycle_Mask y = currUse->_mask; >> >> for ( y <<= curr_delay; x.overlaps(y); curr_delay++ ) >> y <<= 1; >> } >> if (min_delay > curr_delay) >> min_delay = curr_delay; >> } >> if (delay < min_delay) >> delay = min_delay; >> } >> else { >> for (uint j = predUse->_lb; j <= pre... > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > use uint32_t for _mask My testing passed for version V02. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3224868213 From rehn at openjdk.org Tue Aug 26 16:27:00 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 26 Aug 2025 16:27:00 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v43] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 23:35:45 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality. >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer A side comment, which I don't find discussed in the JEP or in the issues (maybe I just missed it): there can also be a significant performance improvement from using direct jumps versus indirect jumps, and reduced memory pressure. E.g. a direct BL vs a BL to LDR + BR + <8 byte address>. Hence it would be good to place hot methods within the hot area in "call sequences", as an application may have many hot methods totally unrelated to each other. This also means you really would like to have e.g. the vtable stub in reach of BL in the above case to get the most out of it.
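For scale on the direct-call reach mentioned above (a sketch assuming standard A64 encoding, not code from the PR): AArch64 `BL` carries a 26-bit signed word offset, so a direct call reaches roughly ±128 MiB of the call site:

```java
public class BranchReach {
    // A64 BL: 26-bit signed immediate, scaled by 4 (every instruction is 4 bytes),
    // so the reachable byte offset range is [-2^27, 2^27) = +/-128 MiB.
    static final long BL_REACH = (1L << 25) * 4;

    static boolean inDirectReach(long callSite, long callee) {
        long offset = callee - callSite;
        return offset >= -BL_REACH && offset < BL_REACH;
    }

    public static void main(String[] args) {
        System.out.println(BL_REACH >> 20);               // 128 (MiB)
        System.out.println(inDirectReach(0, 100L << 20)); // 100 MiB away: true
        System.out.println(inDirectReach(0, 200L << 20)); // 200 MiB away: false
    }
}
```

A callee outside that window needs the indirect stub sequence (LDR + BR plus an 8-byte literal), which is exactly the cost that clustering hot methods avoids.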
------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3223500970 From vlivanov at openjdk.org Tue Aug 26 17:55:45 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 26 Aug 2025 17:55:45 GMT Subject: RFR: 8358751: C2: Recursive inlining check for compiled lambda forms is broken Message-ID: Recursive inlining checks are relaxed for compiled LambdaForms. Since LambdaForms are heavily reused, the check is performed on `MethodHandle` receivers instead. Unfortunately, the current implementation is broken. JVMState doesn't guarantee presence of receivers for caller frames. An attempt to fetch pruned receiver reports unrelated info, but, in the worst case, it ends up as an out-of-bounds access into node's input array and crashes the JVM. Proposed fix captures receiver information as part of inlining and preserves it on `JVMState` for every compiled LambdaForm frame, so it can be reliably recovered during subsequent inlining attempts. Testing: hs-tier1 - hs-tier8 (Special thanks to @mroth23 who prepared a reproducer of the bug.) ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/26891/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26891&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358751 Stats: 76 lines in 4 files changed: 42 ins; 1 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/26891.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26891/head:pull/26891 PR: https://git.openjdk.org/jdk/pull/26891 From jbhateja at openjdk.org Tue Aug 26 18:31:39 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 26 Aug 2025 18:31:39 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: <0q89hNzCz5i1sosjpLNbxNZE5uyBYGmRZQ5c2c78bl0=.8895c287-d548-4d83-b414-5459bd4826f9@github.com> Message-ID: On Tue, 26 Aug 2025 13:27:54 GMT, Bhavana Kilambi wrote: >> Thanks for sharing. This looks interesting. 
> > Thanks a lot for the suggestion but I couldn't find any extra benefit of using `@Stable` for a constant variable. I feel it's more useful in case of arrays (maybe I could use an array of the two final fields in the testcase but I felt the current version is more understandable). I will keep this in mind going forward but just to not complicate a simple testcase, I have just kept the `final` keyword for the constants instead. Could you please review the new patch? Looks reasonable to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2301798934 From jbhateja at openjdk.org Tue Aug 26 18:56:39 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 26 Aug 2025 18:56:39 GMT Subject: RFR: 8364305: Support AVX10 saturating floating point conversion instructions In-Reply-To: References: Message-ID: On Mon, 25 Aug 2025 05:20:23 GMT, Mohamed Issa wrote: > Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity. > > Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type.
Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results. > > This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11). > > 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` > 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` > 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` > 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` > 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` > 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` > 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` > 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` > > [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-int... src/hotspot/cpu/x86/assembler_x86.cpp line 2406: > 2404: } > 2405: > 2406: void Assembler::evcvttpd2qqs(XMMRegister dst, XMMRegister src, int vector_len) { Please also add memory operand flavour of these assembler routines. src/hotspot/cpu/x86/x86.ad line 7776: > 7774: %} > 7775: > 7776: instruct cast2DtoX_reg_evex(vec dst, vec src, rFlagsReg cr) %{ Vector instructions do not affect the EFLAGS register.
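The special-value mapping described in this PR matches Java's primitive narrowing conversion (JLS 5.1.3), which is the behavior the instruction selection must preserve; it can be checked from plain Java:

```java
public class NarrowingCasts {
    public static void main(String[] args) {
        System.out.println((int) Double.NaN);               // NaN          -> 0
        System.out.println((int) Double.NEGATIVE_INFINITY); // -Infinity    -> Integer.MIN_VALUE
        System.out.println((int) Double.POSITIVE_INFINITY); // +Infinity    -> Integer.MAX_VALUE
        System.out.println((int) 1.0e18);                   // out of range -> Integer.MAX_VALUE (saturates)
    }
}
```

The AVX10.2 saturating conversions implement exactly this mapping in hardware, which is why the multi-instruction fixup sequence becomes unnecessary.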
src/hotspot/cpu/x86/x86.ad line 7776: > 7774: %} > 7775: > 7776: instruct cast2DtoX_reg_evex(vec dst, vec src, rFlagsReg cr) %{ How about adding a CISC flavour of these patterns? Now that we have a single instruction covering the entire conversion semantics, memory operand patterns will be useful. src/hotspot/cpu/x86/x86.ad line 7780: > 7778: is_integral_type(Matcher::vector_element_basic_type(n))); > 7779: match(Set dst (VectorCastD2X src)); > 7780: effect(KILL cr); Remove effect. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2301851424 PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2301845691 PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2301848959 PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2301846054 From dlong at openjdk.org Tue Aug 26 19:14:35 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 26 Aug 2025 19:14:35 GMT Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v3] In-Reply-To: References: <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com> Message-ID: On Tue, 26 Aug 2025 10:51:43 GMT, Andrew Dinn wrote: > If we are to change anything here then I think we need a review of the accuracy of pipeline models and their current or potential value before doing so. That's a good point. While looking into this, I discovered that the initial masks generated by pipeline_res_mask_initializer() appear wrong. For example, the mask for stage 0 with 1 cycle is computed as 0x80000001, not the 0x1 that I would expect. Stage 2 with 1 cycle is 0x2, not 0x4, etc. I guess if all the masks are wrong in the same way, the problems might mostly cancel out, but it does shed doubt on the usefulness of this code. We could preserve the large latencies for now, and let them trigger the _maxcycleused > 32 code for more platforms.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3225410594 From iveresov at openjdk.org Tue Aug 26 22:59:54 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 26 Aug 2025 22:59:54 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v8] In-Reply-To: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: > This change fixes multiple issues with training data verification. While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown), it does cause problems in the leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call), and fixes the problems that prevent it from working reliably. Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Relax verification invariant ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26866/files - new: https://git.openjdk.org/jdk/pull/26866/files/c33d94bc..3b6dc806 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26866&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26866&range=06-07 Stats: 32 lines in 4 files changed: 1 ins; 26 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/26866.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26866/head:pull/26866 PR: https://git.openjdk.org/jdk/pull/26866 From iveresov at openjdk.org Tue Aug 26 23:00:44 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 26 Aug 2025 23:00:44 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v6] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com>
<7zawxaIMLdnM5VraQwvZL3wcj3v8vYtzEvJpWYwQLqg=.eecc2aa6-f47e-44b8-842b-10621e83c2ae@github.com> Message-ID: On Fri, 22 Aug 2025 22:35:08 GMT, Vladimir Kozlov wrote: >> Yes, it runs in a dedicated thread. It doesn't need to terminate. > > Add comment about this. I removed this with the last update, so it's not necessary. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2302328711 From iveresov at openjdk.org Tue Aug 26 23:00:44 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 26 Aug 2025 23:00:44 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v7] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: On Fri, 22 Aug 2025 20:29:10 GMT, Igor Veresov wrote: >> This change fixes multiple issue with training data verification. While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown) it does problems in the leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call), and fixes the problems that prevent it from working reliably. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > One more nit I decided to relax the verification invariant a bit since it's very hard to ensure that the only thread running is the thread doing the vm exit. Hopefully that's the last change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26866#issuecomment-3225988539 From dlong at openjdk.org Tue Aug 26 23:23:40 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 26 Aug 2025 23:23:40 GMT Subject: RFR: 8358751: C2: Recursive inlining check for compiled lambda forms is broken In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 01:24:52 GMT, Vladimir Ivanov wrote: > Recursive inlining checks are relaxed for compiled LambdaForms. 
Since LambdaForms are heavily reused, the check is performed on `MethodHandle` receivers instead. > > Unfortunately, the current implementation is broken. JVMState doesn't guarantee presence of receivers for caller frames. > An attempt to fetch pruned receiver reports unrelated info, but, in the worst case, it ends up as an out-of-bounds access into node's input array and crashes the JVM. > > Proposed fix captures receiver information as part of inlining and preserves it on `JVMState` for every compiled LambdaForm frame, so it can be reliably recovered during subsequent inlining attempts. > > Testing: hs-tier1 - hs-tier8 > > (Special thanks to @mroth23 who prepared a reproducer of the bug.) Can you explain why a MH invoker needs to be handled as a special case? Also, it seems like we should be saving the receiver info as a snapshot of arg0/local0 in the callee JVMState, rather than changing it in the caller JVMState for every call site. Don't we already save the receiver somewhere, so that late inlining works correctly? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26891#issuecomment-3226046238 From sviswanathan at openjdk.org Tue Aug 26 23:39:43 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 26 Aug 2025 23:39:43 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions In-Reply-To: References: Message-ID: On Mon, 14 Jul 2025 02:36:24 GMT, Jatin Bhateja wrote: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. 
> > With the Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of the NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin @jatin-bhateja Thanks for looking into this. src/hotspot/share/opto/chaitin.cpp line 1461: > 1459: OptoReg::Name PhaseChaitin::select_bias_lrg_color(LRG& lrg, uint bias_lrg, int chunk) { > 1460: if (bias_lrg != 0) { > 1461: // If first bias lrg has a color.
There is no first or second here; I think you meant the comment to be: // If bias lrg has a color src/hotspot/share/opto/chaitin.cpp line 1655: > 1653: }; > 1654: > 1655: if (X86_ONLY(UseAPX) NOT_X86(false)) { The change looks to be generically applicable and not APX or X86 specific. ------------- PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3157634419 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2302382839 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2302389647 From duke at openjdk.org Wed Aug 27 01:40:18 2025 From: duke at openjdk.org (erifan) Date: Wed, 27 Aug 2025 01:40:18 GMT Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative floats Message-ID: The sve_cpy instruction is not correctly implemented for negative floating-point values. The issues include: 1. When a negative floating-point number (e.g. `-1.0`) is passed, the `checked_cast(pack(d))` check fails. For example, assume `d = -1.0`: - `pack(-1.0)` returns an unsigned int with the 7th bit set, i.e., `0xf0`. - `checked_cast(0xf0)` casts `0xf0` to an int8_t value, which is `-16`. - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`. - The check compares `0xf0` to `0xfffffff0`, which obviously fails. 2. Additionally, the encoding of the negative floating-point number is incorrect: - The imm8 field can fall outside the valid range of **[-128, 127]**. - Bit **13** should be encoded as **0** for floating-point numbers. This PR fixes these issues and renames floating-point `sve_cpy` to `sve_fcpy`. Some test cases are added to aarch64-asmtest.py, and all tests passed.
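The failing round-trip in issue 1 can be reproduced outside of HotSpot. Below is a minimal Java sketch of the same narrow-then-widen sign-extension mismatch; Java's `byte` plays the role of `int8_t`, and the class name is made up for illustration (this is not the HotSpot `checked_cast` code):

```java
public class Fp8RoundTrip {
    public static void main(String[] args) {
        int imm8 = 0xf0;             // pack(-1.0f): FP8 immediate with the sign bit (bit 7) set
        byte narrowed = (byte) imm8; // narrowing to a signed 8-bit type yields -16
        int widened = narrowed;      // widening sign-extends, giving 0xfffffff0
        System.out.println(Integer.toHexString(widened)); // fffffff0
        System.out.println(widened == imm8);              // false: the round-trip check fails
    }
}
```

Any immediate with bit 7 set, which includes every negative floating-point constant, trips the same comparison.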
------------- Commit messages: - 8365911: AArch64: Fix encoding error in sve_cpy for negative floats Changes: https://git.openjdk.org/jdk/pull/26951/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26951&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365911 Stats: 146 lines in 4 files changed: 10 ins; 6 del; 130 mod Patch: https://git.openjdk.org/jdk/pull/26951.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26951/head:pull/26951 PR: https://git.openjdk.org/jdk/pull/26951 From dzhang at openjdk.org Wed Aug 27 02:07:50 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 27 Aug 2025 02:07:50 GMT Subject: RFR: 8366127: RISC-V: compiler/intrinsics/TestVerifyIntrinsicChecks.java fails when running without RVV In-Reply-To: <02s9vetJYJkfb2e6CyDOGSJcYvIcJhYAL5DxpUUCnV0=.5100e1a2-e6df-4bdc-ba9a-d3f884fd4470@github.com> References: <02s9vetJYJkfb2e6CyDOGSJcYvIcJhYAL5DxpUUCnV0=.5100e1a2-e6df-4bdc-ba9a-d3f884fd4470@github.com> Message-ID: On Tue, 26 Aug 2025 03:19:09 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > We noticed that compiler/intrinsics/TestVerifyIntrinsicChecks.java fails when running on sg2042. > The error is caused by the intrinsic `EncodeISOArray` corresponding to `encodeAsciiArray0` requiring RVV on riscv. (See `encode_iso_array_v` in `c2_MacroAssembler_riscv.cpp`) > > ### Test (fastdebug) > - [x] Run compiler/intrinsics/TestVerifyIntrinsicChecks.java on k1 and sg2042 Thanks all for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26935#issuecomment-3226492318 From duke at openjdk.org Wed Aug 27 02:07:50 2025 From: duke at openjdk.org (duke) Date: Wed, 27 Aug 2025 02:07:50 GMT Subject: RFR: 8366127: RISC-V: compiler/intrinsics/TestVerifyIntrinsicChecks.java fails when running without RVV In-Reply-To: <02s9vetJYJkfb2e6CyDOGSJcYvIcJhYAL5DxpUUCnV0=.5100e1a2-e6df-4bdc-ba9a-d3f884fd4470@github.com> References: <02s9vetJYJkfb2e6CyDOGSJcYvIcJhYAL5DxpUUCnV0=.5100e1a2-e6df-4bdc-ba9a-d3f884fd4470@github.com> Message-ID: On Tue, 26 Aug 2025 03:19:09 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > We noticed that compiler/intrinsics/TestVerifyIntrinsicChecks.java fails when running on sg2042. > The error is caused by the intrinsic `EncodeISOArray` corresponding to `encodeAsciiArray0` requiring RVV on riscv. (See `encode_iso_array_v` in `c2_MacroAssembler_riscv.cpp`) > > ### Test (fastdebug) > - [x] Run compiler/intrinsics/TestVerifyIntrinsicChecks.java on k1 and sg2042 @DingliZhang Your change (at version 24c829f12ecf2fd9321cd7e18fb9046754ab510b) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26935#issuecomment-3226493922 From dzhang at openjdk.org Wed Aug 27 02:17:47 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 27 Aug 2025 02:17:47 GMT Subject: Integrated: 8366127: RISC-V: compiler/intrinsics/TestVerifyIntrinsicChecks.java fails when running without RVV In-Reply-To: <02s9vetJYJkfb2e6CyDOGSJcYvIcJhYAL5DxpUUCnV0=.5100e1a2-e6df-4bdc-ba9a-d3f884fd4470@github.com> References: <02s9vetJYJkfb2e6CyDOGSJcYvIcJhYAL5DxpUUCnV0=.5100e1a2-e6df-4bdc-ba9a-d3f884fd4470@github.com> Message-ID: On Tue, 26 Aug 2025 03:19:09 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > We noticed that compiler/intrinsics/TestVerifyIntrinsicChecks.java fails when running on sg2042. 
> The error is caused by the intrinsic `EncodeISOArray` corresponding to `encodeAsciiArray0` requiring RVV on riscv. (See `encode_iso_array_v` in `c2_MacroAssembler_riscv.cpp`) > > ### Test (fastdebug) > - [x] Run compiler/intrinsics/TestVerifyIntrinsicChecks.java on k1 and sg2042 This pull request has now been integrated. Changeset: 0d543293 Author: Dingli Zhang Committer: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/0d543293045d0037791774a1414ef279a1f6768b Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8366127: RISC-V: compiler/intrinsics/TestVerifyIntrinsicChecks.java fails when running without RVV Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/26935 From xgong at openjdk.org Wed Aug 27 03:21:47 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 27 Aug 2025 03:21:47 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v5] In-Reply-To: References: Message-ID: On Mon, 4 Aug 2025 02:31:08 GMT, Xiaohong Gong wrote: >> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform. >> >> ### Background >> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register. >> >> ### Implementation >> >> #### Challenges >> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints. 
>> >> For a 512-bit SVE machine, loading a `byte` vector with different vector species requires different approaches: >> - SPECIES_64: Single operation with mask (8 elements, 256-bit) >> - SPECIES_128: Single operation, full register (16 elements, 512-bit) >> - SPECIES_256: Two operations + merge (32 elements, 1024-bit) >> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit) >> >> Use `ByteVector.SPECIES_512` as an example: >> - It contains 64 elements. So the index vector size should be `64 * 32` bits, which is 4 times the SVE vector register size. >> - It requires 4 vector gather-loads to finish the whole operation. >> >> >> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...] >> int[] idx = [0, 1, 2, 3, ..., 63, ...] >> >> 4 gather-loads: >> idx_v1 = [15 14 13 ... 1 0] gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa] >> idx_v2 = [31 30 29 ... 17 16] gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb] >> idx_v3 = [47 46 45 ... 33 32] gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc] >> idx_v4 = [63 62 61 ... 49 48] gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd] >> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa] >> >> >> #### Solution >> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end. >> >> Here are the main changes: >> - Enhanced IR generation with architecture-specific patterns based on the `gather_scatter_needs_vector_index()` matcher. >> - Added `VectorSliceNode` for result mer... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains six commits: > > - Merge 'jdk:master' into JDK-8351623-sve > - Address review comments > - Refine IR pattern and clean backend rules > - Fix indentation issue and move the helper matcher method to header files > - Merge branch jdk:master into JDK-8351623-sve > - 8351623: VectorAPI: Add SVE implementation of subword gather load operation Hi, could anyone please help take a look at this PR? Thanks a lot in advance! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3226604847 From epeter at openjdk.org Wed Aug 27 06:42:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 27 Aug 2025 06:42:56 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v22] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Fri, 22 Aug 2025 16:18:17 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add test for related report for JDK-8365982 > > This looks like "rabbit hole" :( > > May be file a separate RFE to investigate this behavior later by some other engineer. Most concerning is that reproduced on different platforms. > > I agree that we may accept this regression since it happened in corner case. I assume our benchmarks are not affected by this. Right? @vnkozlov The internal benchmark testing is complete, there are no significant regressions, and there are some improvements, see the last message. I also ran the TestAliasingFuzzer.java a few extra times, to ensure it does not fail on integration. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3226951021 From epeter at openjdk.org Wed Aug 27 06:47:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 27 Aug 2025 06:47:54 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v22] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Fri, 22 Aug 2025 16:18:17 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add test for related report for JDK-8365982 > > This looks like "rabbit hole" :( > > May be file a separate RFE to investigate this behavior later by some other engineer. Most concerning is that reproduced on different platforms. > > I agree that we may accept this regression since it happened in corner case. I assume our benchmarks are not affected by this. Right? @vnkozlov I think the patch is now stable, and you can review again. For the edge-case regressions [here](https://github.com/openjdk/jdk/pull/24278#issuecomment-3213393035): should I file a bug or RFE? At least that way we are already tracking it, and can say we are aware of it if it should ever come up. And we can also provide some possible work-arounds. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3226964447 From jsjolen at openjdk.org Wed Aug 27 07:58:51 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 27 Aug 2025 07:58:51 GMT Subject: RFR: 8365256: RelocIterator should use indexes instead of pointers [v3] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 09:10:16 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`.
>> >> Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions. >> >> Testing: GHA > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > Make constructor private Thank you for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/26569#issuecomment-3227184307 From jsjolen at openjdk.org Wed Aug 27 07:58:52 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 27 Aug 2025 07:58:52 GMT Subject: Integrated: 8365256: RelocIterator should use indexes instead of pointers In-Reply-To: References: Message-ID: <34FJmtODq8wr-nPwHl3AWhhAjCEskscFwtW_fmGeXUo=.f6e0bc33-6d2b-4785-be24-82cdbd7e4607@github.com> On Thu, 31 Jul 2025 06:17:24 GMT, Johan Sjölen wrote: > Hi, > > This PR replaces the `current` and `end` pointers with a `base` pointer alongside a `current` index and a `len`. This allows us to have `-1` as the initial value for current, while retaining `nullptr` as the 'dead' value for `_mutable_data`. > > Performance testing shows no difference/performance improvements on DaCapo Linux x64. I don't think that these are actual improvements, but at least there are no clear regressions. > > Testing: GHA This pull request has now been integrated.
Changeset: 88c39793 Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/88c39793670f2d36490530993feb60e138f43a70 Stats: 88 lines in 4 files changed: 19 ins; 24 del; 45 mod 8365256: RelocIterator should use indexes instead of pointers Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/26569 From manc at openjdk.org Wed Aug 27 08:46:06 2025 From: manc at openjdk.org (Man Cao) Date: Wed, 27 Aug 2025 08:46:06 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation [v2] In-Reply-To: References: Message-ID: > Hi, > > Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause. > > I will also try adding a test case for DontCompileHugeMethod under -XX:-TieredCompilation.
> > -Man Man Cao has updated the pull request incrementally with one additional commit since the last revision: Add a jtreg test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26932/files - new: https://git.openjdk.org/jdk/pull/26932/files/75067a49..6048e04e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26932&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26932&range=00-01 Stats: 121 lines in 1 file changed: 121 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26932.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26932/head:pull/26932 PR: https://git.openjdk.org/jdk/pull/26932 From manc at openjdk.org Wed Aug 27 09:01:28 2025 From: manc at openjdk.org (Man Cao) Date: Wed, 27 Aug 2025 09:01:28 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation [v3] In-Reply-To: References: Message-ID: > Hi, > > Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause. > > I will also try adding a test case for DontCompileHugeMethod under -XX:-TieredCompilation. 
> > -Man Man Cao has updated the pull request incrementally with one additional commit since the last revision: Use List.of in test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26932/files - new: https://git.openjdk.org/jdk/pull/26932/files/6048e04e..12cd9c29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26932&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26932&range=01-02 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/26932.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26932/head:pull/26932 PR: https://git.openjdk.org/jdk/pull/26932 From epeter at openjdk.org Wed Aug 27 09:12:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 27 Aug 2025 09:12:58 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v23] In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 14:31:24 GMT, Daniel Lund?n wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. 
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Add clarifying comments at definitions of register mask sizes > For reference, here is now the changeset adding an IFG bailout: #26118 Since that is now integrated: do we need to make any changes to the patch here? I thought the goal was to use the bailouts instead of increasing `MaxNodeLimit`. 
Because looking at the discussions above: we were worried that there could be compile-time regressions - even if quite rare. But they were in the range of 40s which is quite scary. Are these now gone? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3227403634 From epeter at openjdk.org Wed Aug 27 09:41:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 27 Aug 2025 09:41:46 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: <0RKz6D0V5DMA_HAnDKHVOFt-JDxWOBCTu4TTG29MfmI=.e2599d8f-74e2-4a76-9f75-38a6cba2f5ca@github.com> References: <0RKz6D0V5DMA_HAnDKHVOFt-JDxWOBCTu4TTG29MfmI=.e2599d8f-74e2-4a76-9f75-38a6cba2f5ca@github.com> Message-ID: On Tue, 26 Aug 2025 13:04:04 GMT, Bhavana Kilambi wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments > > Hi @eme64 Can you please review the new patch? Thanks! @Bhavana-Kilambi I scanned through the code and I can rubber-stamp it given others already approved. I'm running some internal testing now... ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3227496181 From epeter at openjdk.org Wed Aug 27 09:51:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 27 Aug 2025 09:51:52 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v9] In-Reply-To: References: Message-ID: <-vujofKVP9wtFfy_oJYaraU_TwzYR3cIIivB6Zvy3Rc=.1fee3b31-9162-416c-8d93-0acc043bf95b@github.com> On Tue, 26 Aug 2025 13:23:57 GMT, Manuel H?ssig wrote: >> This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of the combination of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality. 
>> >> Testing: >> - [x] Github Actions >> - [x] tier1,tier2 plus some internal testing on Oracle supported platforms > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Review Thanks for all the updates! It's going to make the tests just a little nicer ? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26762#pullrequestreview-3159116771 From epeter at openjdk.org Wed Aug 27 09:58:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 27 Aug 2025 09:58:45 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v4] In-Reply-To: References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> Message-ID: <0VA9QnuPSb55PbioO1XWtSmrAC-sQet0hb_ldRgKdFQ=.95f56a0b-3b08-4654-8f1e-7217cd9bcabe@github.com> On Mon, 25 Aug 2025 07:10:26 GMT, Galder Zamarre?o wrote: >> Galder Zamarre?o has updated the pull request incrementally with three additional commits since the last revision: >> >> - Add more IR node positive assertions >> - Fix source of data for benchmarks >> - Refactor benchmarks to TypeVectorOperations > > Merged and pushed latest master changes, all looks good still @galderz I got a failure in out testing: With VM flag: `-XX:UseAVX=1`. 
Failed IR Rules (2) of Methods (2) ---------------------------------- 1) Method "static java.lang.Object[] compiler.loopopts.superword.TestCompatibleUseDefTypeSize.test6(int[],float[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true", "rvv", "true"}, counts={"_#V#LOAD_VECTOR_F#_", "> 0", "_#STORE_VECTOR#_", "> 0", "_#VECTOR_REINTERPRET#_", "> 0"}, applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 2) Method "static java.lang.Object[] compiler.loopopts.superword.TestCompatibleUseDefTypeSize.test9(long[],double[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true", "rvv", "true"}, counts={"_#V#LOAD_VECTOR_D#_", "> 0", "_#STORE_VECTOR#_", "> 0", "_#VECTOR_REINTERPRET#_", "> 0"}, applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! I suspect that `test6` with `floatToRawIntBits` and `test9` with `doubleToRawLongBits` are only supported with `AVX2`. Question is if that is really supposed to be like that, or if we should even file an RFE to extend support for `AVX1` and lower. Can you find out why we don't vectorize with `AVX1` here? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3227556451 From dlunden at openjdk.org Wed Aug 27 10:05:00 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 27 Aug 2025 10:05:00 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v23] In-Reply-To: References: Message-ID: <98ilBSNBxoNkz-ncLYUdpoV1y7_FnYvq_lQi9b53CIc=.147584fa-258c-4e0e-bc2f-65b69f631e1f@github.com> On Wed, 27 Aug 2025 09:08:09 GMT, Emanuel Peter wrote: > Since that is now integrated: do we need to make any changes to the patch here? I thought the goal was to use the bailouts instead of increasing `MaxNodeLimit`. > > Because looking at the discussions above: we were worried that there could be compile-time regressions - even if quite rare. But they were in the range of 40s which is quite scary. Are these now gone? Yes, we can just reset all the `java/lang/invoke` tests (i.e., not modify them at all in this changeset) and the bailouts will ensure we do not try to compile the degenerate cases even without specifying `MaxNodeLimit`. I did check this and thought I had pushed the test reset change, but apparently not. I'll double check and then push the change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3227572806 From mli at openjdk.org Wed Aug 27 10:17:48 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Aug 2025 10:17:48 GMT Subject: RFR: 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way [v3] In-Reply-To: References: Message-ID: <5ZL1py7DIH_GawDSKmVndRqLNZxG2HVCC9mnVy8vHAk=.12bdffd7-fa19-47c8-a83d-c984526831bd@github.com> On Tue, 26 Aug 2025 09:57:44 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> This is a follow-up of https://github.com/openjdk/jdk/pull/26838, fixes the vector version in a similar way. >> >> Thanks! 
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor Thank you for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26883#issuecomment-3227608592 From mli at openjdk.org Wed Aug 27 10:17:49 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Aug 2025 10:17:49 GMT Subject: Integrated: 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 13:17:13 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > This is a follow-up of https://github.com/openjdk/jdk/pull/26838, fixes the vector version in a similar way. > > Thanks! This pull request has now been integrated. Changeset: 32df2d17 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/32df2d17f3c0407ad7e90eacfdc0fd7a65f67551 Stats: 66 lines in 3 files changed: 46 ins; 0 del; 20 mod 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way Reviewed-by: fyang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/26883 From epeter at openjdk.org Wed Aug 27 10:25:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 27 Aug 2025 10:25:57 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v14] In-Reply-To: References: Message-ID: On Thu, 31 Jul 2025 03:37:47 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. 
Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine: >> >> >> Baseline Patch >> Benchmark (SIZE) Mode Cnt Score Error Units Score Error Units Improvement >> VectorSubword.intToByte 1024 avgt 12 200.049 ± 19.787 ns/op 56.228 ± 3.535 ns/op (3.56x) >> VectorSubword.intToShort 1024 avgt 12 179.826 ± 1.539 ns/op 43.332 ± 1.166 ns/op (4.15x) >> VectorSubword.shortToByte 1024 avgt 12 245.580 ± 6.150 ns/op 29.757 ± 1.055 ns/op (8.25x) >> >> >> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated! > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - Update tests, cleanup logic > - Merge branch 'master' into vectorize-subword > - Check for AVX2 for byte/long conversions > - Whitespace and benchmark tweak > - Address more comments, make test and benchmark more exhaustive > - Merge from master > - Fix copyright after merge > - Fix copyright > - Merge > - Implement patch with VectorCastNode::implemented > - ... and 6 more: https://git.openjdk.org/jdk/compare/8fcbb110...aabaafba I have a few more comments. This is really exciting that these cases could soon work! Thanks for working on it ? src/hotspot/share/opto/superword.cpp line 2422: > 2420: // Opcode is only required to disambiguate half float, so we pass -1 as it can't be encountered here. > 2421: return (is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack_size, def_bt, use_bt); > 2422: } Not sure if we discussed this before: should we not move this to `VectorCastNode`, rather than having it in `SuperWord`?
src/hotspot/share/opto/superwordVTransformBuilder.cpp line 197: > 195: > 196: // If the use and def types are different, emit a cast node > 197: if (use_bt != def_bt && !p0->is_Convert() && SuperWord::is_supported_subword_cast(def_bt, use_bt, pack->size())) { Is `SuperWord::is_supported_subword_cast(def_bt, use_bt, pack->size())` really a true condition that you need to check here (and if false we can continue in the "else"), or should it be rather an assert? test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java line 513: > 511: @Test > 512: @IR(applyIfCPUFeature = { "avx", "true" }, > 513: applyIfOr = {"AlignVector", "false", "UseCompactObjectHeaders", "false"}, Do you think these would be supported with `asimd` as well? If you just cannot test with it feel free to file an RFE and then I can find someone to take care of it (e.g. as a starter bug). test/hotspot/jtreg/compiler/vectorization/TestSubwordTruncation.java line 76: > 74: > 75: @Test > 76: @IR(applyIfCPUFeature = { "avx2", "true" }, counts = { IRNode.VECTOR_CAST_I2S, IRNode.VECTOR_SIZE_ANY, ">0" }) Do you think we can make the vector size more precise here? ------------- Changes requested by epeter (Reviewer). 
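[Editor's note] For readers new to this area, a standalone example of the loop shape the subword-cast patch targets might look like the sketch below. It is a hypothetical illustration (not code from the PR): an implicit int-to-byte narrowing that, with this change, SuperWord could keep as a pack and lower to a vector cast instead of bailing out.

```java
// Hypothetical illustration of a subword-narrowing loop.
// The (byte) cast below is the kind of int -> byte conversion that the
// autovectorizer could now turn into a VectorCastX2Y-style vector node.
public class SubwordNarrowExample {
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i]; // narrowing cast: candidate for a vector cast
        }
    }

    public static void main(String[] args) {
        int[] src = new int[1024];
        byte[] dst = new byte[1024];
        for (int i = 0; i < src.length; i++) {
            src[i] = i;
        }
        intToByte(src, dst);
        // Narrowing truncates: 300 = 0x12C keeps only the low byte 0x2C = 44.
        System.out.println(dst[5] + " " + dst[300]); // prints "5 44"
    }
}
```

Note that the result is a plain truncation of the high bits, which is why only narrowing casts could reuse the existing `VectorCastX2Y` lowering in the initial version.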
PR Review: https://git.openjdk.org/jdk/pull/23413#pullrequestreview-3159195606 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2303500398 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2303503806 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2303508579 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2303511623 From galder at openjdk.org Wed Aug 27 11:27:45 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 27 Aug 2025 11:27:45 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors [v4] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 07:49:31 GMT, Daniel Skantz wrote: >> test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsMany.java line 28: >> >>> 26: * @bug 8357105 >>> 27: * @summary Test that repeated stacked string concatenations do not >>> 28: * consume too many compilation resources. >> >> Is there a reasonable way to enhance the test to validate excessive resources? I'm not sure if the following example would work, but I'm wondering if there is something that can be measured deterministically. E.g. before with the given test there would be ~N IR nodes produced but now it would be a max of ~M, assuming that M is deterministically smaller than N. > > What do you think, @galderz ? Thanks! Sorry I had forgotten to ?
before, all good with your reply ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2303643835 From mhaessig at openjdk.org Wed Aug 27 11:43:04 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 27 Aug 2025 11:43:04 GMT Subject: RFR: 8366225: Linux Alpine (fast)debug build fails after JDK-8365909 In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 11:33:53 GMT, Manuel Hässig wrote: > The integration of #26882 broke debug builds on Alpine Linux (and probably other distributions using musl libc), because the typedef `sigevent_t` does not exist in musl libc. This PR fixes this by using `struct sigevent` as the type. > > Testing: > - [ ] Github Actions > - [ ] tier1,tier2 Linux aarch64 and x64 fastdebug > - [x] `make test TEST=compiler/arguments/TestCompileTaskTimeout.java` in Alpine Linux docker container > - [ ] tier1,tier2 on Alpine Linux fastdebug @MBaesken, could you please verify this PR on Alpine Linux? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26956#issuecomment-3227837827 From mhaessig at openjdk.org Wed Aug 27 11:43:04 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 27 Aug 2025 11:43:04 GMT Subject: RFR: 8366225: Linux Alpine (fast)debug build fails after JDK-8365909 Message-ID: The integration of #26882 broke debug builds on Alpine Linux (and probably other distributions using musl libc), because the typedef `sigevent_t` does not exist in musl libc. This PR fixes this by using `struct sigevent` as the type.
Testing: - [ ] Github Actions - [ ] tier1,tier2 Linux aarch64 and x64 fastdebug - [x] `make test TEST=compiler/arguments/TestCompileTaskTimeout.java` in Alpine Linux docker container - [ ] tier1,tier2 on Alpine Linux fastdebug ------------- Commit messages: - Fix alpine build Changes: https://git.openjdk.org/jdk/pull/26956/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26956&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8366225 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26956.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26956/head:pull/26956 PR: https://git.openjdk.org/jdk/pull/26956 From mli at openjdk.org Wed Aug 27 12:33:27 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Aug 2025 12:33:27 GMT Subject: [jdk25] RFR: 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way Message-ID: 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way ------------- Commit messages: - Backport 32df2d17f3c0407ad7e90eacfdc0fd7a65f67551 Changes: https://git.openjdk.org/jdk/pull/26959/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26959&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365772 Stats: 66 lines in 3 files changed: 46 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/26959.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26959/head:pull/26959 PR: https://git.openjdk.org/jdk/pull/26959 From mbaesken at openjdk.org Wed Aug 27 12:38:44 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 27 Aug 2025 12:38:44 GMT Subject: RFR: 8366225: Linux Alpine (fast)debug build fails after JDK-8365909 In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 11:33:53 GMT, Manuel Hässig wrote: > The integration of #26882 broke debug builds on Alpine Linux (and probably other distributions using musl libc), because the typedef `sigevent_t` does not exist in musl libc.
This PR fixes this by using `struct sigevent` as the type. > > Testing: > - [ ] Github Actions > - [ ] tier1,tier2 Linux aarch64 and x64 fastdebug > - [x] `make test TEST=compiler/arguments/TestCompileTaskTimeout.java` in Alpine Linux docker container > - [ ] tier1,tier2 on Alpine Linux fastdebug This fixes the Linux Alpine fastdebug build. ------------- Marked as reviewed by mbaesken (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26956#pullrequestreview-3159607284 From thartmann at openjdk.org Wed Aug 27 13:00:43 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 27 Aug 2025 13:00:43 GMT Subject: RFR: 8366225: Linux Alpine (fast)debug build fails after JDK-8365909 In-Reply-To: References: Message-ID: <8l_0xGoMM2d4hZ4mf3whv96hLHf5EYNwwF0RXgJLLZc=.d3fc0f37-e0f5-4546-89ee-3f2dafdd2085@github.com> On Wed, 27 Aug 2025 11:33:53 GMT, Manuel Hässig wrote: > The integration of #26882 broke debug builds on Alpine Linux (and probably other distributions using musl libc), because the typedef `sigevent_t` does not exist in musl libc. This PR fixes this by using `struct sigevent` as the type. > > Testing: > - [ ] Github Actions > - [x] tier1,tier2 Linux aarch64 and x64 fastdebug > - [x] `make test TEST=compiler/arguments/TestCompileTaskTimeout.java` in Alpine Linux docker container > - [x] tier1,tier2 on Alpine Linux fastdebug Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26956#pullrequestreview-3159684466 From thartmann at openjdk.org Wed Aug 27 13:01:42 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 27 Aug 2025 13:01:42 GMT Subject: [jdk25] RFR: 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 12:25:27 GMT, Hamlin Li wrote: > 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way @Hamlin-Li We are already in the release candidate phase for JDK 25, so this should go to JDK 25u instead, see https://openjdk.org/projects/jdk/25/#Schedule ------------- PR Comment: https://git.openjdk.org/jdk/pull/26959#issuecomment-3228113253 From mli at openjdk.org Wed Aug 27 13:07:53 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Aug 2025 13:07:53 GMT Subject: [jdk25] Withdrawn: 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 12:25:27 GMT, Hamlin Li wrote: > 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/26959 From mli at openjdk.org Wed Aug 27 13:07:52 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Aug 2025 13:07:52 GMT Subject: [jdk25] RFR: 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way In-Reply-To: References: Message-ID: <9Cw9BWTtlXlYYBC7t3VeHX8kSGpE-8fur9OC7hq4_S0=.d5e14358-3d7e-4a4a-becf-af669067d29c@github.com> On Wed, 27 Aug 2025 12:59:29 GMT, Tobias Hartmann wrote: >> 8365772: RISC-V: correctly prereserve NaN payload when converting from float to float16 in vector way > @Hamlin-Li We are already in the release candidate phase for JDK 25, so this should go to JDK 25u instead, see https://openjdk.org/projects/jdk/25/#Schedule @TobiHartmann Thank you for reminding! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26959#issuecomment-3228132477 From thartmann at openjdk.org Wed Aug 27 14:51:50 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 27 Aug 2025 14:51:50 GMT Subject: RFR: 8366225: Linux Alpine (fast)debug build fails after JDK-8365909 In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 11:33:53 GMT, Manuel Hässig wrote: > The integration of #26882 broke debug builds on Alpine Linux (and probably other distributions using musl libc), because the typedef `sigevent_t` does not exist in musl libc. This PR fixes this by using `struct sigevent` as the type. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 Linux aarch64 and x64 fastdebug > - [x] `make test TEST=compiler/arguments/TestCompileTaskTimeout.java` in Alpine Linux docker container > - [x] tier1,tier2 on Alpine Linux fastdebug FTR: I think this is trivial.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26956#issuecomment-3228515512 From mhaessig at openjdk.org Wed Aug 27 14:51:51 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 27 Aug 2025 14:51:51 GMT Subject: RFR: 8366225: Linux Alpine (fast)debug build fails after JDK-8365909 In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 12:35:45 GMT, Matthias Baesken wrote: >> The integration of #26882 broke debug builds on Alpine Linux (and probably other distributions using musl libc), because the typedef `sigevent_t` does not exist in musl libc. This PR fixes this by using `struct sigevent` as the type. >> >> Testing: >> - [x] Github Actions >> - [x] tier1,tier2 Linux aarch64 and x64 fastdebug >> - [x] `make test TEST=compiler/arguments/TestCompileTaskTimeout.java` in Alpine Linux docker container >> - [x] tier1,tier2 on Alpine Linux fastdebug > > This fixes the Linux Alpine fastdebug build. Thank you for your reviews, @MBaesken and @TobiHartmann! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26956#issuecomment-3228519292 From mhaessig at openjdk.org Wed Aug 27 14:51:52 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 27 Aug 2025 14:51:52 GMT Subject: Integrated: 8366225: Linux Alpine (fast)debug build fails after JDK-8365909 In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 11:33:53 GMT, Manuel Hässig wrote: > The integration of #26882 broke debug builds on Alpine Linux (and probably other distributions using musl libc), because the typedef `sigevent_t` does not exist in musl libc. This PR fixes this by using `struct sigevent` as the type. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 Linux aarch64 and x64 fastdebug > - [x] `make test TEST=compiler/arguments/TestCompileTaskTimeout.java` in Alpine Linux docker container > - [x] tier1,tier2 on Alpine Linux fastdebug This pull request has now been integrated.
Changeset: b43c2c66 Author: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/b43c2c663567e59f8b5c84b1b45536078190605b Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8366225: Linux Alpine (fast)debug build fails after JDK-8365909 Reviewed-by: mbaesken, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/26956 From mhaessig at openjdk.org Wed Aug 27 15:08:06 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 27 Aug 2025 15:08:06 GMT Subject: RFR: 8366222: TestCompileTaskTimeout causes asserts after JDK-8365909 Message-ID: This PR increases the timeout of the positive test case in `compiler/arguments/TestCompileTaskTimeout.java`, because it was too low, such that the test case failed on some systems. The new timeout of 2s should be large enough for all systems. Testing: - [ ] Github Actions - [ ] tier1,tier2 Linux fastdebug x64, aarch64 ------------- Commit messages: - Increase timeout for positive test in TestCompileTaskTimeout Changes: https://git.openjdk.org/jdk/pull/26963/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26963&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8366222 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26963.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26963/head:pull/26963 PR: https://git.openjdk.org/jdk/pull/26963 From mhaessig at openjdk.org Wed Aug 27 15:08:06 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 27 Aug 2025 15:08:06 GMT Subject: RFR: 8366222: TestCompileTaskTimeout causes asserts after JDK-8365909 In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 15:02:18 GMT, Manuel Hässig wrote: > This PR increases the timeout of the positive test case in `compiler/arguments/TestCompileTaskTimeout.java`, because it was too low, such that the test case failed on some systems. The new timeout of 2s should be large enough for all systems.
> > Testing: > - [ ] Github Actions > - [ ] tier1,tier2 Linux fastdebug x64, aarch64 @MBaesken, could you please verify this PR on your side? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26963#issuecomment-3228581441 From vlivanov at openjdk.org Wed Aug 27 15:10:44 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 27 Aug 2025 15:10:44 GMT Subject: RFR: 8358751: C2: Recursive inlining check for compiled lambda forms is broken In-Reply-To: References: Message-ID: On Tue, 26 Aug 2025 23:21:34 GMT, Dean Long wrote: > Can you explain why a MH invoker needs to be handled as a special case? MH invokers don't have a receiver. They are linked to indy/MH.invoke/MH.invokeExact call sites and there's no dispatching happening when they are invoked (compared to other cases when MH.invokeBasic is used). > ... it seems like we should be saving the receiver info as a snapshot of arg0/local0 in the callee JVMState, ... That's what the patch does. There's a number of places where `receiver_info` is simply copied when `JVMState` is cloned, but each compiled lambda form frame in `JVMState` should have its constant receiver (`MethodHandle` instance) recorded as `receiver_info`. (Constant receiver is a pre-requisite for inlining through `MH.invokeBasic()` to happen.) > Don't we already save the receiver somewhere, so that late inlining works correctly? For late inlining the situation is different: corresponding call site keeps the receiver as an argument. It's not the case with ancestor frames in deep inlined call chains. There are no guarantees that receivers from ancestor frames are still alive when a call site is being considered for inlining during parsing. (Effectively dead locals are aggressively pruned from JVMState during parsing.) 
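[Editor's note] As background for the receiver discussion above, a small self-contained example of a call site with a constant `MethodHandle` receiver — the situation in which the JIT can see through the handle and inline the target — might look like the sketch below. This is a hypothetical illustration; the class and field names are made up, not taken from the PR.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch of a constant-MethodHandle call site.
// Because ADD is static final, the compiler sees a constant receiver at the
// invokeExact call site, which is the precondition for inlining through the
// underlying lambda forms discussed in this thread.
public class ConstantReceiverExample {
    static final MethodHandle ADD;

    static {
        try {
            ADD = MethodHandles.lookup().findStatic(
                    Math.class, "addExact",
                    MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int add(int a, int b) throws Throwable {
        return (int) ADD.invokeExact(a, b); // constant receiver: inlinable
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

If `ADD` were instead loaded from a mutable field, the receiver would not be a compile-time constant and the inlining discussed above could not take place.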
------------- PR Comment: https://git.openjdk.org/jdk/pull/26891#issuecomment-3228598940 From roland at openjdk.org Wed Aug 27 15:50:42 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 27 Aug 2025 15:50:42 GMT Subject: RFR: 8358751: C2: Recursive inlining check for compiled lambda forms is broken In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 01:24:52 GMT, Vladimir Ivanov wrote: > Recursive inlining checks are relaxed for compiled LambdaForms. Since LambdaForms are heavily reused, the check is performed on `MethodHandle` receivers instead. > > Unfortunately, the current implementation is broken. JVMState doesn't guarantee presence of receivers for caller frames. > An attempt to fetch pruned receiver reports unrelated info, but, in the worst case, it ends up as an out-of-bounds access into node's input array and crashes the JVM. > > Proposed fix captures receiver information as part of inlining and preserves it on `JVMState` for every compiled LambdaForm frame, so it can be reliably recovered during subsequent inlining attempts. > > Testing: hs-tier1 - hs-tier8 > > (Special thanks to @mroth23 who prepared a reproducer of the bug.) src/hotspot/share/opto/bytecodeInfo.cpp line 442: > 440: { > 441: const bool is_compiled_lambda_form = callee_method->is_compiled_lambda_form(); > 442: const bool is_method_handle_invoker = is_compiled_lambda_form && !jvms->method()->is_compiled_lambda_form(); Ignoring the bug you're fixing, is that logic expected to compute the same `inline_level` that the current logic computes? You changed it a bit (iterate from the current frame rather than the caller, the extra test for `is_method_handle_invoker`, and the extra test for `lform_caller_recv == nullptr` in the loop), so I'm not sure what the answer to that question is.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26891#discussion_r2304428938 From kvn at openjdk.org Wed Aug 27 16:08:03 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 27 Aug 2025 16:08:03 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v22] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Fri, 22 Aug 2025 16:18:17 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add test for related report for JDK-8365982 > > This looks like "rabbit hole" :( > > May be file a separate RFE to investigate this behavior later by some other engineer. Most concerning is that reproduced on different platforms. > > I agree that we may accept this regression since it happened in corner case. I assume our benchmarks are not affected by this. Right? > @vnkozlov You I think the patch is now stable, and you can review again. > > For the edge-case regressions [here](https://github.com/openjdk/jdk/pull/24278#issuecomment-3213393035): should I file a bug or RFE? At least that way we already are tracking it, and can say we are aware of it if it should ever come up. And we can also provide some possible work-arounds. I think it should be RFE - to improve this implementation for this corner case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3228808267 From kvn at openjdk.org Wed Aug 27 16:08:02 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 27 Aug 2025 16:08:02 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v23] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Mon, 25 Aug 2025 10:55:52 GMT, Emanuel Peter wrote: >> This is a big patch, but about 3.5k lines are tests. 
And a large part of the VM changes is comments / proofs. >> >> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: >> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. >> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. >> >> -------------------------- >> >> **Where to start reviewing** >> >> - `src/hotspot/share/opto/mempointer.hpp`: >> - Read the class comment for `MemPointerRawSummand`. >> - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. >> >> - `src/hotspot/share/opto/vectorization.cpp`: >> - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. >> >> - `src/hotspot/share/opto/vtransform.hpp`: >> - Understand the difference between weak and strong edges. >> >> If you need to see some examples, then look at the tests: >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in somecases if we used multiversioning. >> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the miro-benchmarks I show below. Simple array cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. >> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather compliex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). 
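[Editor's note] To make the aliasing scenario concrete, below is a small hypothetical loop of the kind the runtime aliasing check (or the multiversioned `slow_loop`) has to guard. It is an illustration only, not code from the patch: when the two array parameters refer to the same array, the store in one iteration feeds the load of the next, so the vectorized `fast_loop` is only safe when the no-aliasing check passes.

```java
// Hypothetical example of a loop whose vectorization needs an aliasing check.
// If a and b are the same array, there is a cross-iteration dependency
// (b[i + 1] written here is read as a[i] in the next iteration), so the
// loop must fall back to scalar execution when the runtime check fails.
public class AliasingExample {
    static void addOne(int[] a, int[] b) {
        for (int i = 0; i < a.length - 1; i++) {
            b[i + 1] = a[i] + 1; // safe to vectorize only if a and b do not alias
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 1, 1, 1};
        int[] b = new int[4];
        addOne(a, b); // no aliasing: independent iterations
        addOne(a, a); // aliasing: each store feeds the next load
        System.out.println(java.util.Arrays.toString(a)); // prints [1, 2, 3, 4]
    }
}
```

In the aliased call the scalar semantics produce a running chain (1, 2, 3, 4), which is exactly the result a blindly vectorized loop would get wrong; this is the dependency the speculative check or multiversioning protects against.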
>> -------------------------- >> >> **Details** >> >> Most fundamentally: >> - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. >> - These raw summands us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. >> - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` >> - With "regular" summands, this gets simplified to `p = base + 4L +ConvI2L(x) + Conv... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 217 commits: > > - Merge branch 'master' into JDK-8324751-Aliasing-Analysis-RTC > - improve tests a little > - add test for related report for JDK-8365982 > - add test for related report for JDK-8360204 > - add test for related report for JDK-8359688 > - rm IR rule that checks multiversioning, rare cases fail due to RCE > - disable flag if not possible > - more documentation for Vladimir > - improve benchmark > - fix tests after master integration of JDK-8342692 and JDK-8356176 > - ... and 207 more: https://git.openjdk.org/jdk/compare/45726a1f...a36e3f7a Looks good. Thank you for doing additional experiments and testing. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24278#pullrequestreview-3160569782 From vlivanov at openjdk.org Wed Aug 27 16:20:43 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 27 Aug 2025 16:20:43 GMT Subject: RFR: 8358751: C2: Recursive inlining check for compiled lambda forms is broken In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 15:48:34 GMT, Roland Westrelin wrote: >> Recursive inlining checks are relaxed for compiled LambdaForms. Since LambdaForms are heavily reused, the check is performed on `MethodHandle` receivers instead. >> >> Unfortunately, the current implementation is broken. JVMState doesn't guarantee presence of receivers for caller frames. 
>> An attempt to fetch pruned receiver reports unrelated info, but, in the worst case, it ends up as an out-of-bounds access into node's input array and crashes the JVM. >> >> Proposed fix captures receiver information as part of inlining and preserves it on `JVMState` for every compiled LambdaForm frame, so it can be reliably recovered during subsequent inlining attempts. >> >> Testing: hs-tier1 - hs-tier8 >> >> (Special thanks to @mroth23 who prepared a reproducer of the bug.) > > src/hotspot/share/opto/bytecodeInfo.cpp line 442: > >> 440: { >> 441: const bool is_compiled_lambda_form = callee_method->is_compiled_lambda_form(); >> 442: const bool is_method_handle_invoker = is_compiled_lambda_form && !jvms->method()->is_compiled_lambda_form(); > > Ignoring the bug you're fixing, is that logic expected to compute the same `inline_level` that the current logic computes? You changed it a bit (iterate from the current frame rather than the caller, the extra test for `is_method_handle_invoker` and the extra test for `lform_caller_recv == nullptr` in the loop that I'm not sure what the answer to that question is. I don't see a compelling reason to treat immediate caller specially here. Since current logic is broken, I decided to make the unification along with the fix. I can separate it into an RFE, if you have any concerns. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26891#discussion_r2304545357 From iveresov at openjdk.org Wed Aug 27 16:29:45 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 27 Aug 2025 16:29:45 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation [v3] In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 09:01:28 GMT, Man Cao wrote: >> Hi, >> >> Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. 
Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause. >> >> I will also try adding a test case for DontCompileHugeMethod under -XX:-TieredCompilation. >> >> -Man > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use List.of in test Seems reasonable ------------- Marked as reviewed by iveresov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26932#pullrequestreview-3160685337 From iveresov at openjdk.org Wed Aug 27 17:10:12 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 27 Aug 2025 17:10:12 GMT Subject: RFR: 8365726: Test crashed with assert in C1 thread: Possible safepoint reached by thread that does not allow it Message-ID: `TrainingData_lock` guards a non-thread safe container and is only locked for a short time. Allow it to skip the safepoint check. ------------- Commit messages: - Don't check for safepoint Changes: https://git.openjdk.org/jdk/pull/26964/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26964&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365726 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26964.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26964/head:pull/26964 PR: https://git.openjdk.org/jdk/pull/26964 From iveresov at openjdk.org Wed Aug 27 18:04:43 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 27 Aug 2025 18:04:43 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v8] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: On Tue, 26 Aug 2025 22:59:54 GMT, Igor Veresov wrote: >> This change fixes multiple issue with training data verification. 
While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown) it does problems in the leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call), and fixes the problems that prevent it from working reliably. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Relax verification invariant May I please get approvals for this? @vnkozlov @dholmes-ora ------------- PR Comment: https://git.openjdk.org/jdk/pull/26866#issuecomment-3229203388 From duke at openjdk.org Wed Aug 27 18:04:54 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 27 Aug 2025 18:04:54 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v44] In-Reply-To: References: Message-ID: <-1ZEATIUSOf-ArW2v7P5a7YbshB53kb5mVPw9ihkLXA=.8b526e80-0a6d-4d0b-ad31-443c0e0c066a@github.com> > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality > > Additional Testing: > - [x] Linux x64 fastdebug tier 1/2/3/4 > - [x] Linux aarch64 fastdebug tier 1/2/3/4 Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 109 commits: - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Lock nmethod::relocate behind experimental flag - Use CompiledICLocker instead of CompiledIC_lock - Fix spacing - Update NMethod.java with immutable data changes - Rename method to nm - Add assert before freeing immutable data - Reorder is_relocatable checks - ... and 99 more: https://git.openjdk.org/jdk/compare/bd4c0f4a...668eb4ae ------------- Changes: https://git.openjdk.org/jdk/pull/23573/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=43 Stats: 1678 lines in 28 files changed: 1611 ins; 2 del; 65 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Wed Aug 27 18:04:54 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 27 Aug 2025 18:04:54 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v43] In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 23:35:45 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. 
New tests were created to test the new functionality. >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer Merged master due to merge conflict caused by [JDK-8365256](https://bugs.openjdk.org/browse/JDK-8365256) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3229190200 From kvn at openjdk.org Wed Aug 27 18:12:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 27 Aug 2025 18:12:46 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v8] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: On Tue, 26 Aug 2025 22:59:54 GMT, Igor Veresov wrote: >> This change fixes multiple issues with training data verification. While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown) it does cause problems in the leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call), and fixes the problems that prevent it from working reliably. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Relax verification invariant Good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26866#pullrequestreview-3161190214 From manc at openjdk.org Wed Aug 27 19:06:45 2025 From: manc at openjdk.org (Man Cao) Date: Wed, 27 Aug 2025 19:06:45 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation [v3] In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 09:01:28 GMT, Man Cao wrote: >> Hi, >> >> Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause. >> >> I will also try adding a test case for DontCompileHugeMethod under -XX:-TieredCompilation. >> >> -Man > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use List.of in test Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26932#issuecomment-3229421658 From manc at openjdk.org Wed Aug 27 19:06:46 2025 From: manc at openjdk.org (Man Cao) Date: Wed, 27 Aug 2025 19:06:46 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation [v3] In-Reply-To: References: Message-ID: On Mon, 25 Aug 2025 22:35:47 GMT, Man Cao wrote: >> src/hotspot/share/compiler/compilationPolicy.cpp line 925: >> >>> 923: } >>> 924: >>> 925: if (!CompilationModeFlag::disable_intermediate()) { >> >> AFAICT, the block of code here is intended for handling the case when intermediate is not disabled. Your change subtly alters that. >> >> When `TieredCompilation` is disabled, the large method compilation is done via `CompileBroker::compile_method` if `!CompileBroker::compilation_is_in_queue(mh)` is `true`. I confirmed that in lldb, see below. 
Is there any reason to not do `can_be_compiled` check when calling `CompileBroker::compile_method`? >> >> Additionally, should `can_be_compiled` check only be done for c2 compilation or if it should also be applied to c1 compilation? >> >> >> (lldb) bt >> * thread #18, name = 'ApexEnumTest_de', stop reason = step in >> * frame #0: 0x00007ffff49b70ae libjvm.so`CompileBroker::compile_method(method=0x00007ffff38dba10, osr_bci=-1, comp_level=4, hot_method=0x00007ffff38dba10, hot_count=6784, compile_reason=Reason_Tiered, __the_thread__=0x00001354ff8c9810) at compileBroker.cpp:1347:21 >> frame #1: 0x00007ffff49981fb libjvm.so`CompilationPolicy::compile(mh=0x00007ffff38dba10, bci=-1, level=CompLevel_full_optimization, __the_thread__=0x00001354ff8c9810) at compilationPolicy.cpp:824:5 >> frame #2: 0x00007ffff4997baf libjvm.so`CompilationPolicy::method_invocation_event(mh=0x00007ffff38dba10, imh=0x00007ffff38dba10, level=CompLevel_none, nm=0x0000000000000000, __the_thread__=0x00001354ff8c9810) at compilationPolicy.cpp:1160:7 >> frame #3: 0x00007ffff49979ea libjvm.so`CompilationPolicy::event(method=0x00007ffff38dba10, inlinee=0x00007ffff38dba10, branch_bci=-1, bci=-1, comp_level=CompLevel_none, nm=0x0000000000000000, __the_thread__=0x00001354ff8c9810) at compilationPolicy.cpp:745:5 >> frame #4: 0x00007ffff4d79dd8 libjvm.so`InterpreterRuntime::frequency_counter_overflow_inner(current=0x00001354ff8c9810, branch_bcp=0x0000000000000000) at interpreterRuntime.cpp:1066:21 >> frame #5: 0x00007ffff4d79a76 libjvm.so`InterpreterRuntime::frequency_counter_overflow(current=0x00001354ff8c9810, branch_bcp=0x0000000000000000) at interpreterRuntime.cpp:1015:17 >> frame #6: 0x00007fffe1c0ce41 >> frame #7: 0x00007fffe1c080a8 >> frame #8: 0x00007fffe1c00d01 >> frame #9: 0x00007ffff4d8786d libjvm.so`JavaCalls::call_helper(result=0x00007ffff38dc040, method=0x00007ffff38dbf90, args=0x00007ffff38dbec8, __the_thread__=0x00001354ff8c9810) at javaCalls.cpp:415:7 >> frame #10... 
> >> AFAICT, the block of code here is intended for handling the case when intermediate is not disabled. Your change subtly alters that. >> When TieredCompilation is disabled, the large method compilation is done via CompileBroker::compile_method if !CompileBroker::compilation_is_in_queue(mh) is true. I confirmed that in lldb, see below. Is there any reason to not do can_be_compiled check when calling CompileBroker::compile_method? > > Trying to compile the large method under `-XX:-TieredCompilation` is the bug. The large method should not be compiled under `-XX:+DontCompileHugeMethods`. > > The bug is caused by erroneously guarding the `!can_be_compiled()` and `!can_be_osr_compiled()` checks behind `!CompilationModeFlag::disable_intermediate()`. The correct behavior is to do the following checks and returns regardless of `TieredCompilation`: > > if ((bci == InvocationEntryBci && !can_be_compiled(mh, level))) { > return; > } > if ((bci != InvocationEntryBci && !can_be_osr_compiled(mh, level))) { > return; > } > > Only the recursive call to `compile(mh, bci, CompLevel_simple, THREAD)` and `osr_nm->make_not_entrant()` need to be guarded under `!disable_intermediate()`. > > It is possible to add the above two checks for `bci`, `can_be_compiled()` and `!can_be_osr_compiled()` to inside `CompileBroker::compile_method()`, specifically inside `CompileBroker::compilation_is_prohibited()`. If compiler-dev team prefers this way, we could move them. To answer more directly: > Additionally, should can_be_compiled check only be done for c2 compilation or if it should also be applied to c1 compilation? `can_be_compiled()` and `can_be_osr_compiled()` should be applied to both C1 and C2 compilation. > Is there any reason to not do `can_be_compiled` check when calling `CompileBroker::compile_method`? My rationale is to keep the code similar prior to [JDK-8251462](https://bugs.openjdk.org/browse/JDK-8251462). 
As mentioned above, it is possible to add those checks to `CompileBroker::compile_method()` or `CompileBroker::compilation_is_prohibited()`. I could do that if there's a strong preference. A potential issue with that approach is that the code here in `CompilationPolicy::compile()` is still confusing: why do those two `if (...) { return;}` checks only apply to `!CompilationModeFlag::disable_intermediate()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26932#discussion_r2305060279 From dlong at openjdk.org Wed Aug 27 19:36:42 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 27 Aug 2025 19:36:42 GMT Subject: RFR: 8358751: C2: Recursive inlining check for compiled lambda forms is broken In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 01:24:52 GMT, Vladimir Ivanov wrote: > Recursive inlining checks are relaxed for compiled LambdaForms. Since LambdaForms are heavily reused, the check is performed on `MethodHandle` receivers instead. > > Unfortunately, the current implementation is broken. JVMState doesn't guarantee presence of receivers for caller frames. > An attempt to fetch pruned receiver reports unrelated info, but, in the worst case, it ends up as an out-of-bounds access into node's input array and crashes the JVM. > > Proposed fix captures receiver information as part of inlining and preserves it on `JVMState` for every compiled LambdaForm frame, so it can be reliably recovered during subsequent inlining attempts. > > Testing: hs-tier1 - hs-tier8 > > (Special thanks to @mroth23 who prepared a reproducer of the bug.) Would it be possible to add a test? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26891#issuecomment-3229505476 From duke at openjdk.org Wed Aug 27 19:43:58 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 27 Aug 2025 19:43:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v42] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 17:10:01 GMT, Vladimir Kozlov wrote: >>> By using fewer itbl entries we can significantly increase ipc on these CPUs. >>> >>> Simple testing with some earlier version of this got ~10% reduction in frontend stalls (take that number with a grain of salt). >>> >>> Now, whether this is the correct approach or not is still unclear to me. >> >> Okay that sounds quite promising. >> >> So what is the driver for this relocation in the JVM, which makes sure hot nmethods get moved together? > >> So what is the driver for this relocation in the JVM, which makes sure hot nmethods get moved together? > > @fisk, next RFE [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205) will "drive" nmethod relocation based on their hotness. It is similar AFAIK to what you implemented back in the Leyden repo to create a list of hot nmethods to cache. > > > We can add a sampling thread which uses the thread-local handshake framework. An example of such a thread is the Sweeper: https://github.com/openjdk/jdk17u/blob/master/src/hotspot/share/runtime/sweeper.hpp which was used to detect active nmethods. @vnkozlov Could you give the changes another look before integration since I had to resolve a minor merge conflict? I don't think re-review from others is necessary, but let me know if you feel otherwise.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3229523766 From dlong at openjdk.org Wed Aug 27 19:56:41 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 27 Aug 2025 19:56:41 GMT Subject: RFR: 8365726: Test crashed with assert in C1 thread: Possible safepoint reached by thread that does not allow it In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 17:04:27 GMT, Igor Veresov wrote: > `TrainingData_lock` guards a non-thread-safe container and is only locked for a short time. Allow it to skip the safepoint check. Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26964#pullrequestreview-3161558071 From eosterlund at openjdk.org Wed Aug 27 20:19:46 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 27 Aug 2025 20:19:46 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v5] In-Reply-To: References: Message-ID: <-MqvO74Up2R0qmEDtgyGY-yScxZ-v6ZQWxDtSxpKO_g=.56d4eeca-670d-41e4-9e96-ba20b1b44100@github.com> On Thu, 14 Aug 2025 20:37:51 GMT, Dean Long wrote: > @fisk , can I get you to review this? Sure! Based on the symptoms you described, my main comment is that we might be looking at the wrong places. I don't know if this is really about lock contention. Perhaps it is indirectly. But you mention there is still some regression with ZGC. My hypothesis would be that it is the unnecessary incrementing of the global patching epoch that causes the regression when using ZGC. It is only really needed when disarming the nmethod - in other words when the guard value is set to the good value. The point of incrementing the patching epoch is to protect other threads from entering the nmethod without executing an instruction cross modification fence. And all other threads will have to do that.
Only ZGC uses the mode of nmethod entry barriers that does this due to being the only GC that updates instructions in a concurrent phase on AArch64. We are conservative on AArch64 and ensure the use of appropriate synchronous cross modifying code. But that's not needed when arming, which is what we do when making the nmethod not entrant. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26399#issuecomment-3229619234 From dlong at openjdk.org Wed Aug 27 23:16:40 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 27 Aug 2025 23:16:40 GMT Subject: RFR: 8358751: C2: Recursive inlining check for compiled lambda forms is broken In-Reply-To: References: Message-ID: <5CWIgRjLyMSwlaou9zNbdSYByGJ9sXUiORD-5_avSWQ=.07ab4ed4-fd22-4620-806e-dfa66a92f61e@github.com> On Fri, 22 Aug 2025 01:24:52 GMT, Vladimir Ivanov wrote: > Recursive inlining checks are relaxed for compiled LambdaForms. Since LambdaForms are heavily reused, the check is performed on `MethodHandle` receivers instead. > > Unfortunately, the current implementation is broken. JVMState doesn't guarantee presence of receivers for caller frames. > An attempt to fetch a pruned receiver reports unrelated info, but, in the worst case, it ends up as an out-of-bounds access into the node's input array and crashes the JVM. > > Proposed fix captures receiver information as part of inlining and preserves it on `JVMState` for every compiled LambdaForm frame, so it can be reliably recovered during subsequent inlining attempts. > > Testing: hs-tier1 - hs-tier8 > > (Special thanks to @mroth23 who prepared a reproducer of the bug.) Marked as reviewed by dlong (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/26891#pullrequestreview-3162095286 From kvn at openjdk.org Wed Aug 27 23:27:01 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 27 Aug 2025 23:27:01 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v44] In-Reply-To: <-1ZEATIUSOf-ArW2v7P5a7YbshB53kb5mVPw9ihkLXA=.8b526e80-0a6d-4d0b-ad31-443c0e0c066a@github.com> References: <-1ZEATIUSOf-ArW2v7P5a7YbshB53kb5mVPw9ihkLXA=.8b526e80-0a6d-4d0b-ad31-443c0e0c066a@github.com> Message-ID: On Wed, 27 Aug 2025 18:04:54 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 109 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Lock nmethod::relocate behind experimental flag > - Use CompiledICLocker instead of CompiledIC_lock > - Fix spacing > - Update NMethod.java with immutable data changes > - Rename method to nm > - Add assert before freeing immutable data > - Reorder is_relocatable checks > - ... and 99 more: https://git.openjdk.org/jdk/compare/bd4c0f4a...668eb4ae I just noticed (by looking on nmethodrelocation.java last changes) that you placed new testing into `test/hotspot/jtreg/vmTestbase/nsk/jvmti/`. Which is old tests directory. Any reason you placed it there instead of `test/hotspot/jtreg/serviceability/jvmti` ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3230103282 From kvn at openjdk.org Wed Aug 27 23:31:58 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 27 Aug 2025 23:31:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v19] In-Reply-To: References: <17al0aeFhm0iZHoHHGiqB03RfPeSrIHIoZuapOHPuy4=.a2ff2d67-392b-40f0-b6d9-6e3a7f396e8a@github.com> Message-ID: On Tue, 17 Jun 2025 15:09:24 GMT, Evgeny Astigeevich wrote: >>> If it is moved, the [CompiledMethodUnload](https://docs.oracle.com/en/java/javase/24/docs/specs/jvmti.html#CompiledMethodUnload) event is sent, followed by a new CompiledMethodLoad event. >> >>> we now have 2 nmethods alive with the same compile_id which could be confusing. >> >> It's nice that the JVMTI docs considered this problem but the notifications will be sent in the reverse order given our current implementation. We will create a new nmethod while the old nmethod might still be alive, at least for the purposes of deopt. Since this PR doesn't actually perform any relocation, I'm not sure what the plan is here. 
The most aggressive thing that could be done is to invalidate all frames which have the old nmethod on stack, but that still leaves the nmethod live for the purposes of deopt. It would probably be ok to synthesize an unload after the deopt since there should be no actual execution in those nmethods, but you will then have to suppress the one that's normally done during nmethod::unlink. >> >> I agree that the docs are fairly clear that all of this is ok, but that doesn't mean that assumptions haven't been made about the current implementation. We just need to make sure we do something rational and that it's possible to understand from our output what was done. > >> Since this PR doesn't actually perform any relocation, I'm not sure what the plan is here. > > The plan is to use this functionality in [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205) > >> The most aggressive thing that could be done is to invalidate all frames which have the old nmethod on stack, but that still leaves the nmethod live for the purposes of deopt. It would probably be ok to synthesize an unload after the deopt since there should be no actual execution in those nmethods, but you will then have to suppress the one that's normally done during nmethod::unlink. > > This might have negative performance impact because we will be relocating hot nmethods. IMO it's better to let calls of the original nmethod to finish. New calls will be using the copy. > > It looks like the implementation does not move code in the terms of the JVMTI spec. > The JVMTI spec expects moving code to unload it from memory: >> Compiled Method Unload >> >> Sent when a compiled method is unloaded from memory. > > As we don't want to unload code from memory, we cannot send Compiled Method Unload event. > > I think we can generate just Compiled Method Load event because of the note: >> Note that a single method may have multiple compiled forms, and that this event will be sent for each form. 
> > Alternatively, we can update the JVMTI spec to say the Compiled Method Load event can be a result of code being copied. Also, as @eastig pointed out, we don't use all-lowercase class names like `nmethodrelocation` anymore. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3230112262 From fyang at openjdk.org Thu Aug 28 02:41:51 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 28 Aug 2025 02:41:51 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v3] In-Reply-To: <_2GD6G4L__UBychjUd_afVU4IYhEQWzCqQB-rPe5jkY=.5187f71e-7865-462c-a3d6-6438c224081a@github.com> References: <_2GD6G4L__UBychjUd_afVU4IYhEQWzCqQB-rPe5jkY=.5187f71e-7865-462c-a3d6-6438c224081a@github.com> Message-ID: On Mon, 25 Aug 2025 03:51:07 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which implements the _counterMode_AESCrypt with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > change some name and format Hi, Thanks for making these changes. I am having a look and I have some minor comments along the way. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2613: > 2611: uint64_t maskIndex = 0xaaul; > 2612: __ mv(t0, maskIndex); > 2613: __ vsetvli(x1, x0, Assembler::e8, Assembler::m1); Please note that `x1` is a special register (return address) on riscv64. Why not use `t0` instead? I mean: __ vsetvli(t0, x0, Assembler::e8, Assembler::m1); __ mv(t0, maskIndex); __ vmv_v_x(v0, t0); src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2644: > 2642: > 2643: __ bind(L_encrypt_next); > 2644: __ add(t1, saved_encrypted_ctr, used); Can we avoid using `t1` here since we already have `vl` as its alias? We can declare a `tmp` register and let it alias `c_rarg7`. Like `const Register tmp = c_rarg7;` Then we can use this `tmp` here instead.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2646: > 2644: __ add(t1, saved_encrypted_ctr, used); > 2645: __ lb(t0, Address(t1)); > 2646: __ lb(t1, Address(in)); Should we use `lbu` instead of `lb` here? ------------- PR Review: https://git.openjdk.org/jdk/pull/25281#pullrequestreview-3162849934 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2305980122 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2305990112 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2305989673 From dholmes at openjdk.org Thu Aug 28 04:22:45 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 28 Aug 2025 04:22:45 GMT Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v8] In-Reply-To: References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com> Message-ID: On Wed, 27 Aug 2025 18:10:04 GMT, Vladimir Kozlov wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> Relax verification invariant > > Good. > May I please get approvals for this? @vnkozlov @dholmes-ora Sorry @veresov I have no knowledge of the actual `trainingData` code. You will need someone else familiar with Leyden to approve this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26866#issuecomment-3231811527 From galder at openjdk.org Thu Aug 28 04:33:43 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Thu, 28 Aug 2025 04:33:43 GMT Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v4] In-Reply-To: <0VA9QnuPSb55PbioO1XWtSmrAC-sQet0hb_ldRgKdFQ=.95f56a0b-3b08-4654-8f1e-7217cd9bcabe@github.com> References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com> <0VA9QnuPSb55PbioO1XWtSmrAC-sQet0hb_ldRgKdFQ=.95f56a0b-3b08-4654-8f1e-7217cd9bcabe@github.com> Message-ID: On Wed, 27 Aug 2025 09:56:29 GMT, Emanuel Peter wrote: >> Merged and pushed latest master changes, all looks good still > > @galderz I got a failure in our testing: > > With VM flag: `-XX:UseAVX=1`. > > > Failed IR Rules (2) of Methods (2) > ---------------------------------- > 1) Method "static java.lang.Object[] compiler.loopopts.superword.TestCompatibleUseDefTypeSize.test6(int[],float[])" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true", "rvv", "true"}, counts={"_#V#LOAD_VECTOR_F#_", "> 0", "_#STORE_VECTOR#_", "> 0", "_#VECTOR_REINTERPRET#_", "> 0"}, applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z])" > - Failed comparison: [found] 0 > 0 [given] > - No nodes matched!
> > 2) Method "static java.lang.Object[] compiler.loopopts.superword.TestCompatibleUseDefTypeSize.test9(long[],double[])" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true", "rvv", "true"}, counts={"_#V#LOAD_VECTOR_D#_", "> 0", "_#STORE_VECTOR#_", "> 0", "_#VECTOR_REINTERPRET#_", "> 0"}, applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z])" > - Failed comparison: [found] 0 > 0 [given] > - No nodes matched! > > > I suspect that `test6` with `floatToRawIntBits` and `test9` with `doubleToRawLongBits` are only supported with `AVX2`. Question is if that is really supposed to be like that, or if we should even file an RFE to extend support for `AVX1` and lower. > > Can you find out why we don't vectorize with `AVX1` here? @eme64 I've replicated the failure. Looking into it ------------- PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3231836866 From missa at openjdk.org Thu Aug 28 05:11:58 2025 From: missa at openjdk.org (Mohamed Issa) Date: Thu, 28 Aug 2025 05:11:58 GMT Subject: RFR: 8364305: Support AVX10 saturating floating point conversion instructions [v2] In-Reply-To: References: Message-ID: > Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity. > > Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. 
In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results. > > This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11). > > 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` > 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` > 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` > 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` > 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` > 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` > 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` > 8.
`jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` > > [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-int... Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: Add memory variants of the AVX 10.2 floating point conversion instructions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26919/files - new: https://git.openjdk.org/jdk/pull/26919/files/57745ae1..e67e376e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=00-01 Stats: 240 lines in 6 files changed: 210 ins; 8 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/26919.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919 PR: https://git.openjdk.org/jdk/pull/26919 From missa at openjdk.org Thu Aug 28 05:11:58 2025 From: missa at openjdk.org (Mohamed Issa) Date: Thu, 28 Aug 2025 05:11:58 GMT Subject: RFR: 8364305: Support AVX10 saturating floating point conversion instructions [v2] In-Reply-To: References: Message-ID: On Tue, 26 Aug 2025 18:52:22 GMT, Jatin Bhateja wrote: >> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: >> >> Add memory variants of the AVX 10.2 floating point conversion instructions > > src/hotspot/cpu/x86/assembler_x86.cpp line 2406: > >> 2404: } >> 2405: >> 2406: void Assembler::evcvttpd2qqs(XMMRegister dst, XMMRegister src, int vector_len) { > > Please also add memory operand flavour of these assembler routines. I added memory variants of the instructions. > src/hotspot/cpu/x86/x86.ad line 7776: > >> 7774: %} >> 7775: >> 7776: instruct cast2DtoX_reg_evex(vec dst, vec src, rFlagsReg cr) %{ > > Vector instructions do not affect the EFLAGS register. I removed the EFLAGS register parameters.
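For reference, the special-value mapping described in the PR summary above (NaN -> 0, infinities and out-of-range values clamped to the integral type's MIN/MAX) is exactly what Java defines for narrowing float/double to integral types, so a small standalone check can illustrate the semantics the new saturating instructions implement directly in hardware. The class name below is ours, not from the PR's tests:

```java
// Standalone illustration of Java's saturating float/double -> int/long
// narrowing semantics (JLS 5.1.3). The JIT must produce these results
// whether it uses the old fix-up sequence or the new AVX10.2 instructions.
public class SaturatingNarrowing {
    public static void main(String[] args) {
        System.out.println((int) Float.NaN);                                   // NaN maps to 0
        System.out.println((int) Float.NEGATIVE_INFINITY == Integer.MIN_VALUE); // true
        System.out.println((int) Float.POSITIVE_INFINITY == Integer.MAX_VALUE); // true
        System.out.println((int) 1e30f == Integer.MAX_VALUE);                   // too large: clamped
        System.out.println((long) Double.NaN);                                  // 0
        System.out.println((long) -1e300 == Long.MIN_VALUE);                    // too small: clamped
    }
}
```

Without AVX10.2 the compiler has to emit the multi-instruction fix-up sequence described above to get these results; with the new instructions the saturation happens in a single conversion.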
> src/hotspot/cpu/x86/x86.ad line 7776: > >> 7774: %} >> 7775: >> 7776: instruct cast2DtoX_reg_evex(vec dst, vec src, rFlagsReg cr) %{ > > How about adding a CICS flavour of these patterns. Now that we have a single instruction to cover the entire conversion semantics, memory operand patterns will be useful. I'm not familiar with CICS. Could you elaborate? Also, I added memory variants. > src/hotspot/cpu/x86/x86.ad line 7780: > >> 7778: is_integral_type(Matcher::vector_element_basic_type(n))); >> 7779: match(Set dst (VectorCastD2X src)); >> 7780: effect(KILL cr); > > Remove effect. I removed the effect calls. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2306185634 PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2306183884 PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2306185433 PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2306184151 From epeter at openjdk.org Thu Aug 28 05:56:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Aug 2025 05:56:04 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v22] In-Reply-To: References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Wed, 27 Aug 2025 16:02:27 GMT, Vladimir Kozlov wrote: >> This looks like "rabbit hole" :( >> >> Maybe file a separate RFE to investigate this behavior later by some other engineer. Most concerning is that it reproduced on different platforms. >> >> I agree that we may accept this regression since it happened in a corner case. I assume our benchmarks are not affected by this. Right? > >> @vnkozlov I think the patch is now stable, and you can review again. >> >> For the edge-case regressions [here](https://github.com/openjdk/jdk/pull/24278#issuecomment-3213393035): should I file a bug or RFE? At least that way we already are tracking it, and can say we are aware of it if it should ever come up.
And we can also provide some possible work-arounds. > > I think it should be RFE - to improve this implementation for this corner case. @vnkozlov @mhaessig Thank you very much for reviewing. I know this was a huge change set to work through, so I'm very thankful ? @vnkozlov I did some additional high-tier testing. No related failures. (aborted some remaining tests on macosx-aarch64, since it was the only platform that did not finish tests - preserving some resources there) I also filed the follow-up RFE: [JDK-8366274](https://bugs.openjdk.org/browse/JDK-8366274) C2 SuperWord: investigate edge case performance regressions from JDK-8324751 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3231991774 From epeter at openjdk.org Thu Aug 28 05:56:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Aug 2025 05:56:06 GMT Subject: Integrated: 8324751: C2 SuperWord: Aliasing Analysis runtime check In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Thu, 27 Mar 2025 13:00:20 GMT, Emanuel Peter wrote: > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. 
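An illustrative aside on the predicate/multiversioning scheme described above (the class and method names here are hypothetical, not taken from the patch): at compile time C2 cannot tell whether two array arguments refer to the same array, so the vectorized fast loop is only valid behind an aliasing runtime check.

```java
public class AliasingExample {
    // If a and b never overlap, this loop is safe to vectorize.
    // If a caller passes the same array for both (copy(x, x)), the
    // iterations are no longer independent, and naively vectorized
    // code would compute the wrong result.
    static void copy(int[] a, int[] b) {
        for (int i = 0; i < a.length - 1 && i < b.length - 1; i++) {
            b[i + 1] = a[i];
        }
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3, 4};
        int[] y = new int[4];
        copy(x, y);                     // no aliasing: y becomes [0, 1, 2, 3]
        System.out.println(java.util.Arrays.toString(y));

        int[] z = {1, 2, 3, 4};
        copy(z, z);                     // aliasing: each store feeds the next load
        System.out.println(java.util.Arrays.toString(z));
    }
}
```

In the `copy(z, z)` call each store feeds the next iteration's load, so the scalar result `[1, 1, 1, 1]` differs from what a blindly vectorized loop would produce; this is exactly the case the runtime check must send to the slow loop (or trap on, with the predicate).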
> > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases; MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. > - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. > - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` > - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)` > - For aliasing analysis (adjacency and overlap), the "regu... This pull request has now been integrated.
Changeset: 443b1726 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/443b17263876355ef508ae68ddad6c108de29db8 Stats: 5828 lines in 29 files changed: 5579 ins; 18 del; 231 mod 8324751: C2 SuperWord: Aliasing Analysis runtime check Reviewed-by: kvn, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/24278 From wenanjian at openjdk.org Thu Aug 28 06:14:44 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 28 Aug 2025 06:14:44 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v3] In-Reply-To: References: <_2GD6G4L__UBychjUd_afVU4IYhEQWzCqQB-rPe5jkY=.5187f71e-7865-462c-a3d6-6438c224081a@github.com> Message-ID: On Thu, 28 Aug 2025 02:29:57 GMT, Fei Yang wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> change some name and format > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2613: > >> 2611: uint64_t maskIndex = 0xaaul; >> 2612: __ mv(t0, maskIndex); >> 2613: __ vsetvli(x1, x0, Assembler::e8, Assembler::m1); > > Please note that `x1` is a special register (return address) on riscv64. Why not use `t0` instead? I mean: > > __ vsetvli(t0, x0, Assembler::e8, Assembler::m1); > __ mv(t0, maskIndex); > __ vmv_v_x(v0, t0); Thanks for the review! I think we can use t0 here to avoid x1, done! > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2644: > >> 2642: >> 2643: __ bind(L_encrypt_next); >> 2644: __ add(t1, saved_encrypted_ctr, used); > > Can we avoid use `t1` here since we already have `vl` as its alias? > We can declare a `tmp` register and let it alias `c_rarg7`. Like `const Register tmp = c_rarg7;` > Then we can use this `tmp` here instead. done > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2646: > >> 2644: __ add(t1, saved_encrypted_ctr, used); >> 2645: __ lb(t0, Address(t1)); >> 2646: __ lb(t1, Address(in)); > > Should we use `lbu` instead of `lb` here? 
Yes, I think encrypting a single byte should be unsigned here, so we had better change it to `lbu` to avoid corner-case errors. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2306285456 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2306285573 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2306285782 From wenanjian at openjdk.org Thu Aug 28 06:21:59 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 28 Aug 2025 06:21:59 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v4] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which implements the _counterMode_AESCrypt with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: update reg use and instruction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/f3698f37..62f1d99e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=02-03 Stats: 7 lines in 1 file changed: 2 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From bmaillard at openjdk.org Thu Aug 28 06:32:56 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Thu, 28 Aug 2025 06:32:56 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v9] In-Reply-To: References: Message-ID: On Tue, 26 Aug 2025 13:23:57 GMT, Manuel Hässig wrote: >> This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of the combination of all flag combinations.
To illustrate its use, I also converted one test to use the new cross product functionality. >> >> Testing: >> - [x] Github Actions >> - [x] tier1,tier2 plus some internal testing on Oracle supported platforms > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Review The new changes look good to me! ------------- Marked as reviewed by bmaillard (Author). PR Review: https://git.openjdk.org/jdk/pull/26762#pullrequestreview-3163375725 From mhaessig at openjdk.org Thu Aug 28 06:32:56 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 28 Aug 2025 06:32:56 GMT Subject: RFR: 8365262: [IR-Framework] Add simple way to add cross-product of flags [v9] In-Reply-To: References: Message-ID: On Thu, 28 Aug 2025 06:28:13 GMT, Benoît Maillard wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Review > > The new changes look good to me! Thank you for your reviews, @benoitmaillard and @eme64! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26762#issuecomment-3232108835 From mhaessig at openjdk.org Thu Aug 28 06:32:58 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 28 Aug 2025 06:32:58 GMT Subject: Integrated: 8365262: [IR-Framework] Add simple way to add cross-product of flags In-Reply-To: References: Message-ID: On Wed, 13 Aug 2025 14:38:01 GMT, Manuel Hässig wrote: > This PR adds the `TestFramework::addCrossProductScenarios` method to enable more ergonomic testing of the combination of all flag combinations. To illustrate its use, I also converted one test to use the new cross product functionality. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 plus some internal testing on Oracle supported platforms This pull request has now been integrated.
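As an aside, the cross product of flag groups that the `addCrossProductScenarios` discussion above is about can be sketched generically as follows. This is an illustrative sketch only, assuming lists of flag strings; it is not the IR framework's actual implementation or method signature.

```java
import java.util.ArrayList;
import java.util.List;

public class FlagCrossProduct {
    // Returns every combination that picks exactly one flag from each group.
    static List<List<String>> crossProduct(List<List<String>> groups) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>());          // start with one empty scenario
        for (List<String> group : groups) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> partial : result) {
                for (String flag : group) {
                    List<String> extended = new ArrayList<>(partial);
                    extended.add(flag);
                    next.add(extended);
                }
            }
            result = next;                      // scenarios grow by one flag per group
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> groups = List.of(
                List.of("-XX:+UseSuperWord", "-XX:-UseSuperWord"),
                List.of("-XX:TieredStopAtLevel=1", "-XX:TieredStopAtLevel=4"));
        // 2 x 2 = 4 scenarios
        FlagCrossProduct.crossProduct(groups).forEach(System.out::println);
    }
}
```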
Changeset: 57df267e Author: Manuel H?ssig URL: https://git.openjdk.org/jdk/commit/57df267e4269b26f7450309b54c55ddee458f75c Stats: 245 lines in 4 files changed: 233 ins; 8 del; 4 mod 8365262: [IR-Framework] Add simple way to add cross-product of flags Reviewed-by: bmaillard, epeter ------------- PR: https://git.openjdk.org/jdk/pull/26762 From chagedorn at openjdk.org Thu Aug 28 06:48:41 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 28 Aug 2025 06:48:41 GMT Subject: RFR: 8366222: TestCompileTaskTimeout causes asserts after JDK-8365909 In-Reply-To: References: Message-ID: <9r1bcQPeJaIpviibGNLy1zJFmJXGGq7EowHL0-WIuJ8=.1b690287-f869-4dcb-86f5-f2d8e628710e@github.com> On Wed, 27 Aug 2025 15:02:18 GMT, Manuel H?ssig wrote: > This PR increases the timeout of the positive test case in `compiler/arguments/TestCompileTaskTimeout.java`, because it was too low, such that the test case failed on some systems. The new timeout of 2s should be large enough for all systems. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 Linux fastdebug x64, aarch64 Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26963#pullrequestreview-3163446815 From thartmann at openjdk.org Thu Aug 28 07:03:41 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 28 Aug 2025 07:03:41 GMT Subject: RFR: 8366222: TestCompileTaskTimeout causes asserts after JDK-8365909 In-Reply-To: References: Message-ID: <4Xg-nEe1-bDKHZKBjcj_W8TZWDLccoUpWYCietIChEk=.a91881c5-4f88-4e1a-a053-cb85937d83bb@github.com> On Wed, 27 Aug 2025 15:02:18 GMT, Manuel H?ssig wrote: > This PR increases the timeout of the positive test case in `compiler/arguments/TestCompileTaskTimeout.java`, because it was too low, such that the test case failed on some systems. The new timeout of 2s should be large enough for all systems. 
> > Testing: > - [x] Github Actions > - [x] tier1,tier2 Linux fastdebug x64, aarch64 Looks good to me too but maybe @MBaesken could verify that the test now passes in their CI. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26963#pullrequestreview-3163485844 From mbaesken at openjdk.org Thu Aug 28 07:23:43 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 28 Aug 2025 07:23:43 GMT Subject: RFR: 8366222: TestCompileTaskTimeout causes asserts after JDK-8365909 In-Reply-To: <4Xg-nEe1-bDKHZKBjcj_W8TZWDLccoUpWYCietIChEk=.a91881c5-4f88-4e1a-a053-cb85937d83bb@github.com> References: <4Xg-nEe1-bDKHZKBjcj_W8TZWDLccoUpWYCietIChEk=.a91881c5-4f88-4e1a-a053-cb85937d83bb@github.com> Message-ID: On Thu, 28 Aug 2025 07:00:37 GMT, Tobias Hartmann wrote: > Looks good to me too but maybe @MBaesken could verify that the test now passes in their CI. I add the PR to our build/test queue . ------------- PR Comment: https://git.openjdk.org/jdk/pull/26963#issuecomment-3232257555 From vlivanov at openjdk.org Thu Aug 28 07:30:31 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 28 Aug 2025 07:30:31 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v5] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. 
The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." 
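An illustrative aside on the usage pattern the intrinsic above targets (the `NativeBuffer` class and its cleanup action are hypothetical, not from the PR): `Reference.reachabilityFence(this)` keeps the owning object reachable until the access to its native resource completes, even if the object is otherwise unreferenced for the rest of the method.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;

public class FenceExample {
    static final Cleaner CLEANER = Cleaner.create();

    static class NativeBuffer {
        final long address = 0xdeadbeefL;       // stand-in for a real native pointer

        NativeBuffer() {
            long addr = address;                // capture the value, not 'this'
            CLEANER.register(this, () -> { /* a native free(addr) would run here */ });
        }

        long read() {
            // Without the fence, 'this' may become unreachable (and be cleaned)
            // while 'address' is still being used.
            try {
                return address;
            } finally {
                Reference.reachabilityFence(this);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(new NativeBuffer().read() == 0xdeadbeefL); // true
    }
}
```

The try/finally shape mirrors the pattern recommended in the `reachabilityFence` javadoc; the point of the PR is to make the fence itself nearly free in C2-compiled code.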
> > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request incrementally with three additional commits since the last revision: - cleanups - Conditional RF elimination pass - GrowableArray ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25315/files - new: https://git.openjdk.org/jdk/pull/25315/files/1a6af8b8..8b1c6dff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=03-04 Stats: 120 lines in 11 files changed: 54 ins; 16 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From bkilambi at openjdk.org Thu Aug 28 09:45:49 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 28 Aug 2025 09:45:49 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v7] In-Reply-To: References: <0RKz6D0V5DMA_HAnDKHVOFt-JDxWOBCTu4TTG29MfmI=.e2599d8f-74e2-4a76-9f75-38a6cba2f5ca@github.com> Message-ID: On Wed, 27 Aug 2025 09:38:53 GMT, Emanuel Peter wrote: >> Hi @eme64 Can you please review the new patch? Thanks! > > @Bhavana-Kilambi I scanned through the code and I can rubber-stamp it given others already approved. > I'm running some internal testing now... Hi @eme64 has the testing completed? 
:) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3232758702 From jsjolen at openjdk.org Thu Aug 28 11:03:52 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 28 Aug 2025 11:03:52 GMT Subject: RFR: 8366341: [BACKOUT] JDK-8365256: RelocIterator should use indexes instead of pointers Message-ID: <4TBPgc6UCU4GtQsEyTsNk_vrThJwl8Ra-zVWyZWEAuY=.2f040610-2042-4f71-ad4b-683970928c15@github.com> Hi, When a null pointer is accessed in SA it's serialized into the null Java object, this in turn causes runtime NPE:s when attempts are made to perform arithmetic on them. As we changed `_immutable_data` to be null when missing, this hits that corner case in the SA. Example of code which fails: public PCDesc getPCDescAt(Address pc) { // NOTE: scopesPCsBegin() depends on the value of _immutable_data and will throw NPE if immutable_data is null for (Address p = scopesPCsBegin(); p.lessThan(scopesPCsEnd()); p = p.addOffsetTo(pcDescSize)) { PCDesc pcDesc = new PCDesc(p); if (pcDesc.getRealPC(this).equals(pc)) { return pcDesc; } } return null; } There are similar iterators in Hotspot code, they will cause UBSAN to complain instead as we're adding something to a null pointer. The "real fix" requires a lot of work on the SA side, and we cannot prioritize that. Instead, I'm backing out my changes. 
------------- Commit messages: - Revert "8365256: RelocIterator should use indexes instead of pointers" Changes: https://git.openjdk.org/jdk/pull/26984/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26984&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8366341 Stats: 88 lines in 4 files changed: 24 ins; 19 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/26984.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26984/head:pull/26984 PR: https://git.openjdk.org/jdk/pull/26984 From ayang at openjdk.org Thu Aug 28 11:11:41 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 28 Aug 2025 11:11:41 GMT Subject: RFR: 8366341: [BACKOUT] JDK-8365256: RelocIterator should use indexes instead of pointers In-Reply-To: <4TBPgc6UCU4GtQsEyTsNk_vrThJwl8Ra-zVWyZWEAuY=.2f040610-2042-4f71-ad4b-683970928c15@github.com> References: <4TBPgc6UCU4GtQsEyTsNk_vrThJwl8Ra-zVWyZWEAuY=.2f040610-2042-4f71-ad4b-683970928c15@github.com> Message-ID: On Thu, 28 Aug 2025 10:59:06 GMT, Johan Sjölen wrote: > Hi, > > When a null pointer is accessed in SA it's serialized into the null Java object, and this in turn causes runtime NPEs when attempts are made to perform arithmetic on them. As we changed `_immutable_data` to be null when missing, this hits that corner case in the SA. > > Example of code which fails: > > > public PCDesc getPCDescAt(Address pc) { > // NOTE: scopesPCsBegin() depends on the value of _immutable_data and will throw NPE if immutable_data is null > for (Address p = scopesPCsBegin(); p.lessThan(scopesPCsEnd()); p = p.addOffsetTo(pcDescSize)) { > PCDesc pcDesc = new PCDesc(p); > if (pcDesc.getRealPC(this).equals(pc)) { > return pcDesc; > } > } > return null; > } > > > There are similar iterators in Hotspot code; they will cause UBSAN to complain instead, as we're adding something to a null pointer. > > The "real fix" requires a lot of work on the SA side, and we cannot prioritize that. Instead, I'm backing out my changes.
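A self-contained sketch of the failure mode and guard discussed above (the `Address` interface and the names are illustrative, not the actual SA classes): iterating from a null begin pointer throws on the very first comparison, so a missing-data range has to be treated as empty before the loop starts.

```java
public class NullIterGuard {
    interface Address {
        boolean lessThan(Address other);
        Address addOffsetTo(long offset);
    }

    // Starting the loop with begin == null would NPE on begin.lessThan(end),
    // which is the failure mode described above; the guard avoids it.
    static int countEntries(Address begin, Address end, long stride) {
        if (begin == null) {
            return 0;                   // missing immutable data: empty range
        }
        int n = 0;
        for (Address p = begin; p.lessThan(end); p = p.addOffsetTo(stride)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countEntries(null, null, 8)); // 0 instead of an NPE
    }
}
```

The equivalent Hotspot-side iterators have the same shape in C++, where `null + offset` is undefined behaviour rather than an exception, which is why UBSAN flags them.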
Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26984#pullrequestreview-3164399186 From wenanjian at openjdk.org Thu Aug 28 11:23:30 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 28 Aug 2025 11:23:30 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v5] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: change format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/62f1d99e..6bd22c4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=03-04 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From mbaesken at openjdk.org Thu Aug 28 11:34:43 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 28 Aug 2025 11:34:43 GMT Subject: RFR: 8366222: TestCompileTaskTimeout causes asserts after JDK-8365909 In-Reply-To: References: <4Xg-nEe1-bDKHZKBjcj_W8TZWDLccoUpWYCietIChEk=.a91881c5-4f88-4e1a-a053-cb85937d83bb@github.com> Message-ID: <9z1b-8ugI5Scy7RnRD3jlYWej6LRJvq7Wjgb7qpmrn4=.da86a87a-ddcb-4003-be42-09fec965d979@github.com> On Thu, 28 Aug 2025 07:21:31 GMT, Matthias Baesken wrote: > > Looks good to me too but maybe @MBaesken could verify that the test now passes in their CI. > > I add the PR to our build/test queue . Did some quick testing on linuxx86_64 and the issue is gone with this PR. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26963#issuecomment-3233126133 From roland at openjdk.org Thu Aug 28 11:53:43 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 28 Aug 2025 11:53:43 GMT Subject: RFR: 8358751: C2: Recursive inlining check for compiled lambda forms is broken In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 01:24:52 GMT, Vladimir Ivanov wrote: > Recursive inlining checks are relaxed for compiled LambdaForms. Since LambdaForms are heavily reused, the check is performed on `MethodHandle` receivers instead. > > Unfortunately, the current implementation is broken. JVMState doesn't guarantee presence of receivers for caller frames. > An attempt to fetch pruned receiver reports unrelated info, but, in the worst case, it ends up as an out-of-bounds access into node's input array and crashes the JVM. > > Proposed fix captures receiver information as part of inlining and preserves it on `JVMState` for every compiled LambdaForm frame, so it can be reliably recovered during subsequent inlining attempts. > > Testing: hs-tier1 - hs-tier8 > > (Special thanks to @mroth23 who prepared a reproducer of the bug.) Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26891#pullrequestreview-3164532968 From jsjolen at openjdk.org Thu Aug 28 12:17:50 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 28 Aug 2025 12:17:50 GMT Subject: RFR: 8366341: [BACKOUT] JDK-8365256: RelocIterator should use indexes instead of pointers In-Reply-To: <4TBPgc6UCU4GtQsEyTsNk_vrThJwl8Ra-zVWyZWEAuY=.2f040610-2042-4f71-ad4b-683970928c15@github.com> References: <4TBPgc6UCU4GtQsEyTsNk_vrThJwl8Ra-zVWyZWEAuY=.2f040610-2042-4f71-ad4b-683970928c15@github.com> Message-ID: <2HMbODP7g61gW5P-Om5yrOD_gp3KSds8hNO2YRkgqBM=.dffe3644-a409-4122-a5e8-c3a067b574c8@github.com> On Thu, 28 Aug 2025 10:59:06 GMT, Johan Sj?len wrote: > Hi, > > When a null pointer is accessed in SA it's serialized into the null Java object, this in turn causes runtime NPE:s when attempts are made to perform arithmetic on them. As we changed `_immutable_data` to be null when missing, this hits that corner case in the SA. > > Example of code which fails: > > > public PCDesc getPCDescAt(Address pc) { > // NOTE: scopesPCsBegin() depends on the value of _immutable_data and will throw NPE if immutable_data is null > for (Address p = scopesPCsBegin(); p.lessThan(scopesPCsEnd()); p = p.addOffsetTo(pcDescSize)) { > PCDesc pcDesc = new PCDesc(p); > if (pcDesc.getRealPC(this).equals(pc)) { > return pcDesc; > } > } > return null; > } > > > There are similar iterators in Hotspot code, they will cause UBSAN to complain instead as we're adding something to a null pointer. > > The "real fix" requires a lot of work on the SA side, and we cannot prioritize that. Instead, I'm backing out my changes. 
Thanks Albert ------------- PR Comment: https://git.openjdk.org/jdk/pull/26984#issuecomment-3233258957 From jsjolen at openjdk.org Thu Aug 28 12:17:50 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 28 Aug 2025 12:17:50 GMT Subject: Integrated: 8366341: [BACKOUT] JDK-8365256: RelocIterator should use indexes instead of pointers In-Reply-To: <4TBPgc6UCU4GtQsEyTsNk_vrThJwl8Ra-zVWyZWEAuY=.2f040610-2042-4f71-ad4b-683970928c15@github.com> References: <4TBPgc6UCU4GtQsEyTsNk_vrThJwl8Ra-zVWyZWEAuY=.2f040610-2042-4f71-ad4b-683970928c15@github.com> Message-ID: On Thu, 28 Aug 2025 10:59:06 GMT, Johan Sj?len wrote: > Hi, > > When a null pointer is accessed in SA it's serialized into the null Java object, this in turn causes runtime NPE:s when attempts are made to perform arithmetic on them. As we changed `_immutable_data` to be null when missing, this hits that corner case in the SA. > > Example of code which fails: > > > public PCDesc getPCDescAt(Address pc) { > // NOTE: scopesPCsBegin() depends on the value of _immutable_data and will throw NPE if immutable_data is null > for (Address p = scopesPCsBegin(); p.lessThan(scopesPCsEnd()); p = p.addOffsetTo(pcDescSize)) { > PCDesc pcDesc = new PCDesc(p); > if (pcDesc.getRealPC(this).equals(pc)) { > return pcDesc; > } > } > return null; > } > > > There are similar iterators in Hotspot code, they will cause UBSAN to complain instead as we're adding something to a null pointer. > > The "real fix" requires a lot of work on the SA side, and we cannot prioritize that. Instead, I'm backing out my changes. This pull request has now been integrated. 
Changeset: 5c78c7cd Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/5c78c7cd83d2d1ca1ba19151d6be40f5bd6077c8 Stats: 88 lines in 4 files changed: 24 ins; 19 del; 45 mod 8366341: [BACKOUT] JDK-8365256: RelocIterator should use indexes instead of pointers Reviewed-by: ayang ------------- PR: https://git.openjdk.org/jdk/pull/26984 From thartmann at openjdk.org Thu Aug 28 12:19:42 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 28 Aug 2025 12:19:42 GMT Subject: RFR: 8358751: C2: Recursive inlining check for compiled lambda forms is broken In-Reply-To: References: Message-ID: On Fri, 22 Aug 2025 01:24:52 GMT, Vladimir Ivanov wrote: > Recursive inlining checks are relaxed for compiled LambdaForms. Since LambdaForms are heavily reused, the check is performed on `MethodHandle` receivers instead. > > Unfortunately, the current implementation is broken. JVMState doesn't guarantee the presence of receivers for caller frames. > An attempt to fetch a pruned receiver reports unrelated info, but, in the worst case, it ends up as an out-of-bounds access into the node's input array and crashes the JVM. > > The proposed fix captures receiver information as part of inlining and preserves it on `JVMState` for every compiled LambdaForm frame, so it can be reliably recovered during subsequent inlining attempts. > > Testing: hs-tier1 - hs-tier8 > > (Special thanks to @mroth23 who prepared a reproducer of the bug.) What about a regression test?
------------- PR Comment: https://git.openjdk.org/jdk/pull/26891#issuecomment-3233267427 From mhaessig at openjdk.org Thu Aug 28 12:51:51 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 28 Aug 2025 12:51:51 GMT Subject: RFR: 8366222: TestCompileTaskTimeout causes asserts after JDK-8365909 In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 15:02:18 GMT, Manuel H?ssig wrote: > This PR increases the timeout of the positive test case in `compiler/arguments/TestCompileTaskTimeout.java`, because it was too low, such that the test case failed on some systems. The new timeout of 2s should be large enough for all systems. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 Linux fastdebug x64, aarch64 Thank you all for your reviews and testing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26963#issuecomment-3233365606 From mhaessig at openjdk.org Thu Aug 28 12:51:52 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 28 Aug 2025 12:51:52 GMT Subject: Integrated: 8366222: TestCompileTaskTimeout causes asserts after JDK-8365909 In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 15:02:18 GMT, Manuel H?ssig wrote: > This PR increases the timeout of the positive test case in `compiler/arguments/TestCompileTaskTimeout.java`, because it was too low, such that the test case failed on some systems. The new timeout of 2s should be large enough for all systems. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 Linux fastdebug x64, aarch64 This pull request has now been integrated. 
Changeset: 8f864fd5 Author: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/8f864fd5637762153f26af5121cabdf21e1ad798 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8366222: TestCompileTaskTimeout causes asserts after JDK-8365909 Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/26963 From epeter at openjdk.org Thu Aug 28 13:19:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Aug 2025 13:19:07 GMT Subject: RFR: 8366357: C2 SuperWord: refactor VTransformNode::apply with VTransformApplyState Message-ID: I'm working on **cost-modeling**, and am integrating some smaller changes from this proof-of-concept PR: https://github.com/openjdk/jdk/pull/20964 This is a **pure refactoring** - no change in behaviour. I'm presenting it like this because it will make reviews easier. The goal here is that `VTransformNode::apply` only needs a single argument. This is important, as we will soon add more components that need to be updated during apply. That way, we can simply add more parts to `VTransformApplyState`, and do not need to add more arguments to VTransformNode::apply. And yes: I have considered passing the `apply_state` as `const`. While this may be possible with the current code state, the upcoming changes from https://github.com/openjdk/jdk/pull/20964 will require non-const access to the `apply_state` (e.g. for `set_memory_state`).
------------- Commit messages: - JDK-8366357 Changes: https://git.openjdk.org/jdk/pull/26987/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26987&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8366357 Stats: 123 lines in 3 files changed: 32 ins; 26 del; 65 mod Patch: https://git.openjdk.org/jdk/pull/26987.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26987/head:pull/26987 PR: https://git.openjdk.org/jdk/pull/26987 From epeter at openjdk.org Thu Aug 28 13:26:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Aug 2025 13:26:47 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v8] In-Reply-To: References: Message-ID: <6nb9vphrJMm4PecaNzue-MRxNV7ex_cwY1hNO15t5ZM=.dd359230-cfd1-43a5-b53e-064a5524bd53@github.com> On Tue, 26 Aug 2025 13:07:21 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. >> >> This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). 
>> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Modified JTREG testcase to address review comments Tests passed, thanks for the patience! Not a detailed review, but looks reasonable :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26589#pullrequestreview-3164855679 From epeter at openjdk.org Thu Aug 28 13:35:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Aug 2025 13:35:46 GMT Subject: RFR: 8364305: Support AVX10 saturating floating point conversion instructions [v2] In-Reply-To: References: Message-ID: On Thu, 28 Aug 2025 05:11:58 GMT, Mohamed Issa wrote: >> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity. >> >> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. 
The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results. >> >> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11). >> >> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` >> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` >> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` >> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` >> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` >> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` >> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` >> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` >> >> [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-adv... > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > Add memory variants of the AVX 10.2 floating point conversion instructions @missa-prime Looks like an interesting patch! 
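The special-value mapping in the quoted description (NaN -> 0, infinities clamped) matches Java's specified narrowing-conversion behavior (JLS 5.1.3), which a quick check confirms; this is why a single saturating convert instruction can replace the extra fix-up code:

```java
public class SaturatingConversion {
    public static void main(String[] args) {
        // Java float->int narrowing is saturating by specification (JLS 5.1.3):
        System.out.println((int) Float.NaN);               // 0
        System.out.println((int) Float.POSITIVE_INFINITY); // 2147483647
        System.out.println((int) Float.NEGATIVE_INFINITY); // -2147483648
        System.out.println((int) 3.0e9f);                  // clamped to Integer.MAX_VALUE
    }
}
```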
Do you think you could add some sort of IR test here, to verify that the correct code is generated on AVX10 vs lower AVX? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26919#issuecomment-3233527862 From chagedorn at openjdk.org Thu Aug 28 13:35:46 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 28 Aug 2025 13:35:46 GMT Subject: RFR: 8366357: C2 SuperWord: refactor VTransformNode::apply with VTransformApplyState In-Reply-To: References: Message-ID: On Thu, 28 Aug 2025 12:57:44 GMT, Emanuel Peter wrote: > I'm working on **cost-modeling**, and am integrating some smaller changes from this proof-of-concept PR: https://github.com/openjdk/jdk/pull/20964 > > This is a **pure refactoring** - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > The goal here is that `VTransformNode::apply` only needs a single argument. This is important, as we will soon add more components that need to be updated during apply. That way, we can simply add more parts to `VTransformApplyState`, and do not need to add more arguments to VTransformNode::apply. > > And yes: I have considered passing the `apply_state` as `const`. While this may be possible with the current code state, the upcoming changes from https://github.com/openjdk/jdk/pull/20964 will require non-const access to the `apply_state` (e.g. for `set_memory_state`). Two small suggestions, otherwise, looks good! src/hotspot/share/opto/vtransform.cpp line 737: > 735: // bits in a scalar shift operation. But vector shift does not truncate, so > 736: // we must apply the mask now. > 737: Node* shift_count_masked = new AndINode(shift_count_in, phase->igvn().intcon(_mask)); Randomly noticed this: You should probably use `phase->intcon()` which also sets root as control. Same at other places further down. You might want to squeeze that into this patch as well. 
src/hotspot/share/opto/vtransform.hpp line 273: > 271: // generated def (input) nodes when we are generating the use nodes in "apply". > 272: GrowableArray _vtnode_idx_to_transformed_node; > 273: Suggestion: ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26987#pullrequestreview-3164852132 PR Review Comment: https://git.openjdk.org/jdk/pull/26987#discussion_r2307425745 PR Review Comment: https://git.openjdk.org/jdk/pull/26987#discussion_r2307407693 From epeter at openjdk.org Thu Aug 28 13:43:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Aug 2025 13:43:02 GMT Subject: RFR: 8366357: C2 SuperWord: refactor VTransformNode::apply with VTransformApplyState [v2] In-Reply-To: References: Message-ID: > I'm working on **cost-modeling**, and am integrating some smaller changes from this proof-of-concept PR: https://github.com/openjdk/jdk/pull/20964 > > This is a **pure refactoring** - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > The goal here is that `VTransformNode::apply` only needs a single argument. This is important, as we will soon add more components that need to be updated during apply. That way, we can simply add more parts to `VTransformApplyState`, and do not need to add more arguments to VTransformNode::apply. > > And yes: I have considered passing the `apply_state` as `const`. While this may be possible with the current code state, the upcoming changes from https://github.com/openjdk/jdk/pull/20964 will require non-const access to the `apply_state` (e.g. for `set_memory_state`). 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/vtransform.hpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26987/files - new: https://git.openjdk.org/jdk/pull/26987/files/4dbbaa8d..d91d66db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26987&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26987&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26987.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26987/head:pull/26987 PR: https://git.openjdk.org/jdk/pull/26987 From epeter at openjdk.org Thu Aug 28 13:43:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Aug 2025 13:43:02 GMT Subject: RFR: 8366357: C2 SuperWord: refactor VTransformNode::apply with VTransformApplyState [v2] In-Reply-To: References: Message-ID: On Thu, 28 Aug 2025 13:29:20 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/vtransform.hpp >> >> Co-authored-by: Christian Hagedorn > src/hotspot/share/opto/vtransform.cpp line 737: > >> 735: // bits in a scalar shift operation. But vector shift does not truncate, so >> 736: // we must apply the mask now. >> 737: Node* shift_count_masked = new AndINode(shift_count_in, phase->igvn().intcon(_mask)); > > Randomly noticed this: You should probably use `phase->intcon()` which also sets root as control. Same at other places further down. You might want to squeeze that into this patch as well. Sure, I can do that. It probably does not matter because we set `major_progress` anyway... 
but better to do it :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26987#discussion_r2307454827 From epeter at openjdk.org Thu Aug 28 13:55:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Aug 2025 13:55:01 GMT Subject: RFR: 8366357: C2 SuperWord: refactor VTransformNode::apply with VTransformApplyState [v3] In-Reply-To: References: Message-ID: <-URf_iP7rH-Ev5PzEhDseBTqTTCuHiMEYkTdeksxP_0=.14d9721e-b5f9-4d0e-932f-78ca4a6ad12b@github.com> > I'm working on **cost-modeling**, and am integrating some smaller changes from this proof-of-concept PR: https://github.com/openjdk/jdk/pull/20964 > > This is a **pure refactoring** - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > The goal here is that `VTransformNode::apply` only needs a single argument. This is important, as we will soon add more components that need to be updated during apply. That way, we can simply add more parts to `VTransformApplyState`, and do not need to add more arguments to VTransformNode::apply. > > And yes: I have considered passing the `apply_state` as `const`. While this may be possible with the current code state, the upcoming changes from https://github.com/openjdk/jdk/pull/20964 will require non-const access to the `apply_state` (e.g. for `set_memory_state`). > > Also: Christian asked me to squeeze in some other change: `igvn.intcon` -> `phase->intcon`, so that we also set the control to root. It's not been strictly necessary, but probably better to do it. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: For Christian: use phase->intcon instead ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26987/files - new: https://git.openjdk.org/jdk/pull/26987/files/d91d66db..db90bc88 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26987&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26987&range=01-02 Stats: 9 lines in 2 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/26987.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26987/head:pull/26987 PR: https://git.openjdk.org/jdk/pull/26987 From chagedorn at openjdk.org Thu Aug 28 13:55:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 28 Aug 2025 13:55:01 GMT Subject: RFR: 8366357: C2 SuperWord: refactor VTransformNode::apply with VTransformApplyState [v3] In-Reply-To: <-URf_iP7rH-Ev5PzEhDseBTqTTCuHiMEYkTdeksxP_0=.14d9721e-b5f9-4d0e-932f-78ca4a6ad12b@github.com> References: <-URf_iP7rH-Ev5PzEhDseBTqTTCuHiMEYkTdeksxP_0=.14d9721e-b5f9-4d0e-932f-78ca4a6ad12b@github.com> Message-ID: On Thu, 28 Aug 2025 13:51:40 GMT, Emanuel Peter wrote: >> I'm working on **cost-modeling**, and am integrating some smaller changes from this proof-of-concept PR: https://github.com/openjdk/jdk/pull/20964 >> >> This is a **pure refactoring** - no change in behaviour. I'm presenting it like this because it will make reviews easier. >> >> The goal here is that `VTransformNode::apply` only needs a single argument. This is important, as we will soon add more components that need to be updated during apply. That way, we can simply add more parts to `VTransformApplyState`, and do not need to add more arguments to VTransformNode::apply. >> >> And yes: I have considered passing the `apply_state` as `const`. 
While this may be possible with the current code state, the upcoming changes from https://github.com/openjdk/jdk/pull/20964 will require non-const access to the `apply_state` (e.g. for `set_memory_state`). >> >> Also: Christian asked me to squeeze in some other change: `igvn.intcon` -> `phase->intcon`, so that we also set the control to root. It's not been strictly necessary, but probably better to do it. > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > For Christian: use phase->intcon instead Thanks for the update, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26987#pullrequestreview-3164958854 From epeter at openjdk.org Thu Aug 28 13:55:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Aug 2025 13:55:02 GMT Subject: RFR: 8366357: C2 SuperWord: refactor VTransformNode::apply with VTransformApplyState [v3] In-Reply-To: References: Message-ID: On Thu, 28 Aug 2025 13:33:10 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> For Christian: use phase->intcon instead > > Two small suggestions, otherwise, looks good! @chhagedorn Thanks for the quick review! I applied all changes you suggested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26987#issuecomment-3233586328 From epeter at openjdk.org Thu Aug 28 13:55:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Aug 2025 13:55:03 GMT Subject: RFR: 8366357: C2 SuperWord: refactor VTransformNode::apply with VTransformApplyState [v3] In-Reply-To: References: Message-ID: On Thu, 28 Aug 2025 13:39:44 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vtransform.cpp line 737: >> >>> 735: // bits in a scalar shift operation. But vector shift does not truncate, so >>> 736: // we must apply the mask now. 
>>> 737: Node* shift_count_masked = new AndINode(shift_count_in, phase->igvn().intcon(_mask)); >> >> Randomly noticed this: You should probably use `phase->intcon()` which also sets root as control. Same at other places further down. You might want to squeeze that into this patch as well. > > Sure, I can do that. It probably does not matter because we set `major_progress` anyway... but better to do it :) I'm doing it in a few SuperWord-related places now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26987#discussion_r2307468339 From kvn at openjdk.org Thu Aug 28 14:19:43 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 28 Aug 2025 14:19:43 GMT Subject: RFR: 8366357: C2 SuperWord: refactor VTransformNode::apply with VTransformApplyState [v3] In-Reply-To: <-URf_iP7rH-Ev5PzEhDseBTqTTCuHiMEYkTdeksxP_0=.14d9721e-b5f9-4d0e-932f-78ca4a6ad12b@github.com> References: <-URf_iP7rH-Ev5PzEhDseBTqTTCuHiMEYkTdeksxP_0=.14d9721e-b5f9-4d0e-932f-78ca4a6ad12b@github.com> Message-ID: On Thu, 28 Aug 2025 13:55:01 GMT, Emanuel Peter wrote: >> I'm working on **cost-modeling**, and am integrating some smaller changes from this proof-of-concept PR: https://github.com/openjdk/jdk/pull/20964 >> >> This is a **pure refactoring** - no change in behaviour. I'm presenting it like this because it will make reviews easier. >> >> The goal here is that `VTransformNode::apply` only needs a single argument. This is important, as we will soon add more components that need to be updated during apply. That way, we can simply add more parts to `VTransformApplyState`, and do not need to add more arguments to VTransformNode::apply. >> >> And yes: I have considered passing the `apply_state` as `const`. While this may be possible with the current code state, the upcoming changes from https://github.com/openjdk/jdk/pull/20964 will require non-const access to the `apply_state` (e.g. for `set_memory_state`). 
>> >> Also: Christian asked me to squeeze in some other change: `igvn.intcon` -> `phase->intcon`, so that we also set the control to root. It's not been strictly necessary, but probably better to do it. > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > For Christian: use phase->intcon instead Looks good ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26987#pullrequestreview-3165064139 From epeter at openjdk.org Thu Aug 28 14:50:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Aug 2025 14:50:46 GMT Subject: RFR: 8366357: C2 SuperWord: refactor VTransformNode::apply with VTransformApplyState [v3] In-Reply-To: References: <-URf_iP7rH-Ev5PzEhDseBTqTTCuHiMEYkTdeksxP_0=.14d9721e-b5f9-4d0e-932f-78ca4a6ad12b@github.com> Message-ID: On Thu, 28 Aug 2025 14:42:46 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> For Christian: use phase->intcon instead > > src/hotspot/share/opto/superword.cpp line 3072: > >> 3070: > >> 3071: // 1.1: con >> 3072: Node* xbic = phase()->intcon(is_sub ? -con : con); > > What changed that you need/want `PhaseIdealLoop*` here instead of `PhaseIterGVN*`? @chhagedorn Suggested it here: https://github.com/openjdk/jdk/pull/26987#discussion_r2307425745 That way, we set the control of the new constant. It was not strictly necessary, but good practice. Makes sense. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26987#discussion_r2307649223 From mhaessig at openjdk.org Thu Aug 28 14:50:45 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 28 Aug 2025 14:50:45 GMT Subject: RFR: 8366357: C2 SuperWord: refactor VTransformNode::apply with VTransformApplyState [v3] In-Reply-To: <-URf_iP7rH-Ev5PzEhDseBTqTTCuHiMEYkTdeksxP_0=.14d9721e-b5f9-4d0e-932f-78ca4a6ad12b@github.com> References: <-URf_iP7rH-Ev5PzEhDseBTqTTCuHiMEYkTdeksxP_0=.14d9721e-b5f9-4d0e-932f-78ca4a6ad12b@github.com> Message-ID: On Thu, 28 Aug 2025 13:55:01 GMT, Emanuel Peter wrote: >> I'm working on **cost-modeling**, and am integrating some smaller changes from this proof-of-concept PR: https://github.com/openjdk/jdk/pull/20964 >> >> This is a **pure refactoring** - no change in behaviour. I'm presenting it like this because it will make reviews easier. >> >> The goal here is that `VTransformNode::apply` only needs a single argument. This is important, as we will soon add more components that need to be updated during apply. That way, we can simply add more parts to `VTransformApplyState`, and do not need to add more arguments to VTransformNode::apply. >> >> And yes: I have considered passing the `apply_state` as `const`. While this may be possible with the current code state, the upcoming changes from https://github.com/openjdk/jdk/pull/20964 will require non-const access to the `apply_state` (e.g. for `set_memory_state`). >> >> Also: Christian asked me to squeeze in some other change: `igvn.intcon` -> `phase->intcon`, so that we also set the control to root. It's not been strictly necessary, but probably better to do it. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > For Christian: use phase->intcon instead Thank you for working on this, @eme64. The changes look good to me. There is only one detail that I do not understand. 
Marked as reviewed by mhaessig (Committer). src/hotspot/share/opto/superword.cpp line 3072: > 3070: > > 3071: // 1.1: con > 3072: Node* xbic = phase()->intcon(is_sub ? -con : con); What changed that you need/want `PhaseIdealLoop*` here instead of `PhaseIterGVN*`? ------------- PR Review: https://git.openjdk.org/jdk/pull/26987#pullrequestreview-3165185672 PR Review: https://git.openjdk.org/jdk/pull/26987#pullrequestreview-3165210942 PR Review Comment: https://git.openjdk.org/jdk/pull/26987#discussion_r2307638278 From mhaessig at openjdk.org Thu Aug 28 14:50:47 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 28 Aug 2025 14:50:47 GMT Subject: RFR: 8366357: C2 SuperWord: refactor VTransformNode::apply with VTransformApplyState [v3] In-Reply-To: References: <-URf_iP7rH-Ev5PzEhDseBTqTTCuHiMEYkTdeksxP_0=.14d9721e-b5f9-4d0e-932f-78ca4a6ad12b@github.com> Message-ID: On Thu, 28 Aug 2025 14:45:57 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 3072: >> >>> 3070: >>> 3071: // 1.1: con >>> 3072: Node* xbic = phase()->intcon(is_sub ? -con : con); >> >> What changed that you need/want `PhaseIdealLoop*` here instead of `PhaseIterGVN*`? > > @chhagedorn Suggested it here: https://github.com/openjdk/jdk/pull/26987#discussion_r2307425745 > > That way, we set the control of the new constant. It was not strictly necessary, but good practice. Makes sense. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26987#discussion_r2307654854 From kvn at openjdk.org Thu Aug 28 14:57:00 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 28 Aug 2025 14:57:00 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v44] In-Reply-To: <-1ZEATIUSOf-ArW2v7P5a7YbshB53kb5mVPw9ihkLXA=.8b526e80-0a6d-4d0b-ad31-443c0e0c066a@github.com> References: <-1ZEATIUSOf-ArW2v7P5a7YbshB53kb5mVPw9ihkLXA=.8b526e80-0a6d-4d0b-ad31-443c0e0c066a@github.com> Message-ID: On Wed, 27 Aug 2025 18:04:54 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 109 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Lock nmethod::relocate behind experimental flag > - Use CompiledICLocker instead of CompiledIC_lock > - Fix spacing > - Update NMethod.java with immutable data changes > - Rename method to nm > - Add assert before freeing immutable data > - Reorder is_relocatable checks > - ... and 99 more: https://git.openjdk.org/jdk/compare/bd4c0f4a...668eb4ae JDK-8365256 was backed out. You need to merge again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3233832615 From epeter at openjdk.org Thu Aug 28 15:00:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Aug 2025 15:00:45 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Sat, 7 Dec 2024 09:16:29 GMT, Fei Gao wrote: > In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the > `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has less than `8` iterations and then decides which way to go. > > Usually, the vectorized main loop will be super-unrolled after vectorization. 
In such cases, the main loop's stride is going to be further multiplied. After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector can operate `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if the remaining trip count is less than `8 * 4 = 32`. > > To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And, both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop. > > The problem here is, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of the main loop fails, the execution will jump over both the main loop and the vectorized drain loop. For example, in the above case, if a loop still has `25` iterations after the pre-loop, we could run `3` rounds of the vectorized drain loop, but that is impossible. It would be better if the minimum trip guard test of the main loop did not jump over the vectorized drain loop. > > This patch is to improve it by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow. > > The whole process is done by the function `insert_post_loop()`. > > We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to vectorized drain loop with mode `InsertVectorizedDrain`: > > 1. The fall-in control flow to the vectorized drain loop comes from a `RegionNode` merging exits ... I'm a little sick and don't feel very focused, so I'll have to look at the PR next week. 
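The trip counts in the `25`-iteration example above can be checked with a small sketch (hypothetical helper; numbers taken from the description: vector length 8, 4x super-unrolling, so the main-loop guard tests against 32):

```java
public class DrainLoopTrips {
    // Full rounds the vectorized drain loop could still run after the main
    // loop (stride = vlen * superUnroll) has executed its whole rounds.
    static int drainRounds(int remaining, int vlen, int superUnroll) {
        int mainStride = vlen * superUnroll;
        int mainRounds = remaining / mainStride; // 0 when remaining < mainStride
        return (remaining - mainRounds * mainStride) / vlen;
    }

    public static void main(String[] args) {
        // 25 iterations left after the pre-loop: the main-loop guard (25 < 32)
        // fails, yet 3 full vector rounds of the drain loop would be possible,
        // leaving only 1 iteration for the scalar post loop.
        System.out.println(drainRounds(25, 8, 4)); // 3
    }
}
```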
BTW: I just integrated https://github.com/openjdk/jdk/pull/24278 which may have silent merge conflicts, so it would be good if you merged and tested again. Once you do that I could also run some internal testing, if you like :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3233849428 From shade at openjdk.org Thu Aug 28 15:03:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 28 Aug 2025 15:03:55 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v8] In-Reply-To: References: Message-ID: On Tue, 26 Aug 2025 13:07:21 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. >> >> This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). >> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. 
JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Modified JTREG testcase to address review comments Let's go, we need this patch in JDK 25, which requires some soak time in mainline :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3233863334 From bkilambi at openjdk.org Thu Aug 28 15:16:46 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 28 Aug 2025 15:16:46 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v8] In-Reply-To: References: Message-ID: On Thu, 28 Aug 2025 15:01:29 GMT, Aleksey Shipilev wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Modified JTREG testcase to address review comments > > Let's go, we need this patch in JDK 25, which requires some soak time in mainline :) @shipilev @theRealAph could I please ask for another approval from you for the latest patch? Then I'll integrate. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3233916308 From shade at openjdk.org Thu Aug 28 15:19:44 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 28 Aug 2025 15:19:44 GMT Subject: RFR: 8365726: Test crashed with assert in C1 thread: Possible safepoint reached by thread that does not allow it In-Reply-To: References: Message-ID: <7MCFerQ2-mr_mTddubo_-H4D7Q6-G4bfTjF6P5edPac=.4b621900-b196-4b15-a62a-ee93ae1d6b57@github.com> On Wed, 27 Aug 2025 17:04:27 GMT, Igor Veresov wrote: > `TrainingData_lock` guards a non-thread safe container and is only locked for a short time. Allow it to skip the safepoint check. Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/26964#pullrequestreview-3165362422 From epeter at openjdk.org Thu Aug 28 15:39:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Aug 2025 15:39:53 GMT Subject: RFR: 8366361: C2 SuperWord: rename VTransformNode::set_req -> init_req, analogue to Node::init_req Message-ID: I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: https://github.com/openjdk/jdk/pull/20964 This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. The current implementation of `VTransformNode::set_req` has `init_req` semantics: it verifies that the corresponding input is still nullptr. We should thus rename it. It will also free up the "set_req" name for later use in VTransform optimizations, where we want to modify the graph. See `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop` in the proof-of-concept PR. FYI: this PR is dependent on https://github.com/openjdk/jdk/pull/26987. I'll rebase once that one is integrated. We can still review it already, so that the process is a little faster later on. (I have more small changes coming, but separating makes them more reviewable.) 
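The `init_req` vs `set_req` distinction described in the RFR (write-once initialization vs later overwriting for graph rewrites) can be sketched with a hypothetical stand-alone class (not the actual HotSpot code):

```java
class SketchNode {
    private final SketchNode[] in;

    SketchNode(int reqs) { in = new SketchNode[reqs]; }

    // init_req semantics: the input slot must still be empty.
    void initReq(int i, SketchNode n) {
        if (in[i] != null) {
            throw new IllegalStateException("input " + i + " already set");
        }
        in[i] = n;
    }

    // set_req semantics: may overwrite, useful for later graph rewriting.
    void setReq(int i, SketchNode n) { in[i] = n; }

    SketchNode in(int i) { return in[i]; }
}
```

Renaming the existing write-once method frees the `set_req` name for a genuinely mutating edge setter in future VTransform optimizations.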
------------- Depends on: https://git.openjdk.org/jdk/pull/26987 Commit messages: - JDK-8366361 Changes: https://git.openjdk.org/jdk/pull/26991/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26991&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8366361 Stats: 26 lines in 3 files changed: 0 ins; 0 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/26991.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26991/head:pull/26991 PR: https://git.openjdk.org/jdk/pull/26991 From shade at openjdk.org Thu Aug 28 15:43:46 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 28 Aug 2025 15:43:46 GMT Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v8] In-Reply-To: References: Message-ID: On Tue, 26 Aug 2025 13:07:21 GMT, Bhavana Kilambi wrote: >> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test - >> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as - >> >> >> public void vectorAddConstInputFloat16() { >> for (int i = 0; i < LEN; ++i) { >> output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); >> } >> } >> >> >> >> >> >> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. >> >> This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node). 
>> >> Both the tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Modified JTREG testcase to address review comments I ran a few tests on my Graviton 3 host, where I have seen failures, and they are gone. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26589#pullrequestreview-3165467247 From iveresov at openjdk.org Thu Aug 28 15:47:48 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 28 Aug 2025 15:47:48 GMT Subject: RFR: 8365726: Test crashed with assert in C1 thread: Possible safepoint reached by thread that does not allow it In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 17:04:27 GMT, Igor Veresov wrote: > `TrainingData_lock` guards a non-thread safe container and is only locked for a short time. Allow it to skip the safepoint check. Thanks guys! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26964#issuecomment-3234032425 From iveresov at openjdk.org Thu Aug 28 15:47:48 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 28 Aug 2025 15:47:48 GMT Subject: Integrated: 8365726: Test crashed with assert in C1 thread: Possible safepoint reached by thread that does not allow it In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 17:04:27 GMT, Igor Veresov wrote: > `TrainingData_lock` guards a non-thread safe container and is only locked for a short time. Allow it to skip the safepoint check. This pull request has now been integrated. 
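Returning to the ConH/SVE thread above: the immediate rule quoted there (8-bit signed values in [-128, 127], or multiples of 256 in [-127 << 8, 128 << 8] for the shifted 16-bit form) can be sketched as a predicate. This is an assumption drawn from the PR description, not the actual assembler code.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if imm can be encoded directly by sve_dup, per the ranges
// described in the thread; anything else must be loaded from the constant
// pool and then replicated to the SVE register.
inline bool is_encodable_sve_dup_imm(int32_t imm) {
    if (imm >= -128 && imm <= 127) {
        return true;                                // plain 8-bit signed immediate
    }
    if (imm % 256 == 0 && imm >= -127 * 256 && imm <= 128 * 256) {
        return true;                                // 8-bit signed immediate, LSL #8
    }
    return false;
}
```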
Changeset: 452b052f Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/452b052fe343a70bc81bf299d08a9f06a1e30fe9 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod 8365726: Test crashed with assert in C1 thread: Possible safepoint reached by thread that does not allow it Reviewed-by: dlong, shade ------------- PR: https://git.openjdk.org/jdk/pull/26964 From kvn at openjdk.org Thu Aug 28 15:49:42 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 28 Aug 2025 15:49:42 GMT Subject: RFR: 8366361: C2 SuperWord: rename VTransformNode::set_req -> init_req, analogue to Node::init_req In-Reply-To: References: Message-ID: <4e4Be6YO2guGqFmkzTQyTJzPD3D3qAXxCJ14F8F61Gg=.cf6a0cf2-5654-452d-96c4-b034e52bd1e6@github.com> On Thu, 28 Aug 2025 15:30:31 GMT, Emanuel Peter wrote: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > The current implementation of `VTransformNode::set_req` has `init_req` semantics, it verifies that the corresponding input is still nullptr. We should thus rename it. It will also free up the "set_req" name for later use in VTransform optimizations, where we want to modify the graph. > > See `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop` in the proof-of-concept PR. > > FYI: this PR is dependent on https://github.com/openjdk/jdk/pull/26987. I'll rebase once that one is integrated. We can still already review, so that the process is a little faster later on. (I have more small changes coming, but separating makes them more reviewable.) Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26991#pullrequestreview-3165486608 From dlong at openjdk.org Thu Aug 28 20:21:59 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 28 Aug 2025 20:21:59 GMT Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64 [v5] In-Reply-To: <-MqvO74Up2R0qmEDtgyGY-yScxZ-v6ZQWxDtSxpKO_g=.56d4eeca-670d-41e4-9e96-ba20b1b44100@github.com> References: <-MqvO74Up2R0qmEDtgyGY-yScxZ-v6ZQWxDtSxpKO_g=.56d4eeca-670d-41e4-9e96-ba20b1b44100@github.com> Message-ID: On Wed, 27 Aug 2025 20:17:07 GMT, Erik Österlund wrote: >> @fisk , can I get you to review this? > >> @fisk , can I get you to review this? > > Sure! Based on the symptoms you described, my main comment is that we might be looking at the wrong places. I don't know if this is really about lock contention. Perhaps it is indirectly. But you mention there is still some regression with ZGC. > > My hypothesis would be that it is the unnecessary incrementing of the global patching epoch that causes the regression when using ZGC. It is only really needed when disarming the nmethod - in other words when the guard value is set to the good value. > > The point of incrementing the patching epoch is to protect other threads from entering the nmethod without executing an instruction cross modification fence. And all other threads will have to do that. > > Only ZGC uses the mode of nmethod entry barriers that does this due to being the only GC that updates instructions in a concurrent phase on AArch64. We are conservative on AArch64 and ensure the use of appropriate synchronous cross modifying code. But that's not needed when arming, which is what we do when making the nmethod not entrant. Thanks @fisk, that's a good theory, but it is not what I am seeing. For G1, lock contention does seem to explain the issue, and this PR fixes the regression. (Also the lock overhead I measured seemed to agree with the regression in GC phase time and benchmark scores.)
For ZGC, I am not seeing an increase in calls to BarrierSetAssembler::increment_patching_epoch(). And increment_patching_epoch() is only called when disarming -- that part hasn't changed. I think the suspected ZGC regression is just noise in the benchmark, as the benchmark can show a regression across multiple runs even with the same build, flags, and host. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26399#issuecomment-3234822398 From duke at openjdk.org Thu Aug 28 20:22:19 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 28 Aug 2025 20:22:19 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v45] In-Reply-To: References: Message-ID: <8rwL34-1XkZ0yuptsaCups6zmpJYn3Hr4JuWAhJhzYs=.e16aeb56-fea0-4a1e-add1-742ce2449542@github.com> > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality > > Additional Testing: > - [x] Linux x64 fastdebug tier 1/2/3/4 > - [x] Linux aarch64 fastdebug tier 1/2/3/4 Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 111 commits: - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Refactor JVMTI test - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Lock nmethod::relocate behind experimental flag - Use CompiledICLocker instead of CompiledIC_lock - Fix spacing - Update NMethod.java with immutable data changes - Rename method to nm - ... and 101 more: https://git.openjdk.org/jdk/compare/9f70965b...03a69587 ------------- Changes: https://git.openjdk.org/jdk/pull/23573/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=44 Stats: 1600 lines in 26 files changed: 1535 ins; 2 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Thu Aug 28 20:22:19 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 28 Aug 2025 20:22:19 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v44] In-Reply-To: References: <-1ZEATIUSOf-ArW2v7P5a7YbshB53kb5mVPw9ihkLXA=.8b526e80-0a6d-4d0b-ad31-443c0e0c066a@github.com> Message-ID: On Wed, 27 Aug 2025 23:24:31 GMT, Vladimir Kozlov wrote: > just noticed (by looking on nmethodrelocation.java last changes) that you placed new testing into `test/hotspot/jtreg/vmTestbase/nsk/jvmti/`. Which is old tests directory. > > Any reason you placed it there instead of `test/hotspot/jtreg/serviceability/jvmti` ? I didn't realize that was the old directory. 
I refactored the test and moved it into the correct directory ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3234823068 From sparasa at openjdk.org Thu Aug 28 21:50:28 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 28 Aug 2025 21:50:28 GMT Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for commutative operations with same dst and src2 Message-ID: This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same. Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor). For example: `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding ------------- Commit messages: - remove trailing whitespaces - remove unused instructions - 8354348: Enable Extended EVEX to REX2/REX demotion for commutative operations with same dst and src2 Changes: https://git.openjdk.org/jdk/pull/26997/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26997&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354348 Stats: 3194 lines in 5 files changed: 630 ins; 159 del; 2405 mod Patch: https://git.openjdk.org/jdk/pull/26997.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26997/head:pull/26997 PR: https://git.openjdk.org/jdk/pull/26997 From jiangli at openjdk.org Fri Aug 29 01:04:54 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 29 Aug 2025 01:04:54 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation [v3] In-Reply-To: References: Message-ID: <7vfGErc_VX2Dyz5F143poK4HDHDQSz2tzMJ8IGXZTJs=.72dbfa24-2bd7-4820-bb44-c18e89ac9f46@github.com> On 
Wed, 27 Aug 2025 09:01:28 GMT, Man Cao wrote: >> Hi, >> >> Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause. >> >> I will also try adding a test case for DontCompileHugeMethod under -XX:-TieredCompilation. >> >> -Man > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use List.of in test Marked as reviewed by jiangli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26932#pullrequestreview-3167011980 From jiangli at openjdk.org Fri Aug 29 01:04:55 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 29 Aug 2025 01:04:55 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation [v3] In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 19:03:59 GMT, Man Cao wrote: >>> AFAICT, the block of code here is intended for handling the case when intermediate is not disabled. Your change subtly alters that. >>> When TieredCompilation is disabled, the large method compilation is done via CompileBroker::compile_method if !CompileBroker::compilation_is_in_queue(mh) is true. I confirmed that in lldb, see below. Is there any reason to not do can_be_compiled check when calling CompileBroker::compile_method? >> >> Trying to compile the large method under `-XX:-TieredCompilation` is the bug. The large method should not be compiled under `-XX:+DontCompileHugeMethods`. >> >> The bug is caused by erroneously guarding the `!can_be_compiled()` and `!can_be_osr_compiled()` checks behind `!CompilationModeFlag::disable_intermediate()`. 
The correct behavior is to do the following checks and returns regardless of `TieredCompilation`: >> >> if ((bci == InvocationEntryBci && !can_be_compiled(mh, level))) { >> return; >> } >> if ((bci != InvocationEntryBci && !can_be_osr_compiled(mh, level))) { >> return; >> } >> >> Only the recursive call to `compile(mh, bci, CompLevel_simple, THREAD)` and `osr_nm->make_not_entrant()` need to be guarded under `!disable_intermediate()`. >> >> It is possible to add the above two checks for `bci`, `can_be_compiled()` and `!can_be_osr_compiled()` to inside `CompileBroker::compile_method()`, specifically inside `CompileBroker::compilation_is_prohibited()`. If compiler-dev team prefers this way, we could move them. > > To answer more directly: >> Additionally, should can_be_compiled check only be done for c2 compilation or if it should also be applied to c1 compilation? > > `can_be_compiled()` and `can_be_osr_compiled()` should be applied to both C1 and C2 compilation. > >> Is there any reason to not do `can_be_compiled` check when calling `CompileBroker::compile_method`? > > My rationale is to keep the code similar prior to [JDK-8251462](https://bugs.openjdk.org/browse/JDK-8251462). As mentioned above, it is possible to add those checks to `CompileBroker::compile_method()` or `CompileBroker::compilation_is_prohibited()`. I could do that if there's a strong preference. A potential issue with that approach is that the code here in `CompilationPolicy::compile()` is still confusing: why do those two `if (...) { return;}` checks only apply to `!CompilationModeFlag::disable_intermediate()`. Thanks for addressing my questions. The change seems fine. One of my other comment/question that I raised in internal code review was history context on why the size limit was set. I could not find historical info on that by searching the bugs in https://bugs.openjdk.org. That's however indirectly related to your fix. The jtreg test looks good. Thanks for adding. 
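The gating fix discussed in this DontCompileHugeMethods thread can be modeled with a small sketch (names and signatures are illustrative, not HotSpot's real ones). The point of the change: the `can_be_compiled` / `can_be_osr_compiled` bailouts must apply regardless of whether tiered compilation is enabled; only the fallback to `CompLevel_simple` stays guarded by `!disable_intermediate()`.

```cpp
#include <cassert>

// Simplified model of a compilation request.
struct CompileRequest {
    bool is_osr;               // bci != InvocationEntryBci
    bool can_be_compiled;      // false e.g. for huge methods with -XX:+DontCompileHugeMethods
    bool can_be_osr_compiled;  // same, for on-stack-replacement compiles
};

// Returns true if the request may proceed to the compile broker.
// Note the deliberate absence of any disable_intermediate() dependence:
// the bailouts run for both tiered and non-tiered configurations.
inline bool may_submit_compile(const CompileRequest& req) {
    if (!req.is_osr && !req.can_be_compiled) {
        return false;  // bail out: normal compilation prohibited
    }
    if (req.is_osr && !req.can_be_osr_compiled) {
        return false;  // bail out: OSR compilation prohibited
    }
    return true;
}
```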
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26932#discussion_r2308876145 From rehn at openjdk.org Fri Aug 29 06:29:56 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 29 Aug 2025 06:29:56 GMT Subject: RFR: 8365926: RISC-V: Performance regression in renaissance (chi-square) Message-ID: Hey, please consider! A bunch of info in JBS entry, please read that also. I narrowed this issue down to the old jal optimization, making direct calls when in reach. This patch restores them and removes this regression. In essence we turn "jalr ra,0(t1)" into a "jal ra," if reachable, and restore the jalr if a new destination is not reachable. Please test on your hardware! Chi Square (100 runs each, 10 fastest iterations of each run, P550) JDK-23 (last version with trampoline calls) Mean: 3189.5827 Standard Deviation: 284.6478 JDK-25 Mean: 3424.8905 Standard Deviation: 222.2208 Patch: Mean: 3144.8535 Standard Deviation: 229.2577 No issues found in t1, running t2 also. Stress tested on vf2, bpi-f2, p550. ------------- Commit messages: - Merge branch 'master' into 8365926 - draft jal<->jalr Changes: https://git.openjdk.org/jdk/pull/26944/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26944&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365926 Stats: 85 lines in 3 files changed: 68 ins; 2 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/26944.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26944/head:pull/26944 PR: https://git.openjdk.org/jdk/pull/26944 From rcastanedalo at openjdk.org Fri Aug 29 06:51:24 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 29 Aug 2025 06:51:24 GMT Subject: RFR: 8365791: IGV: Update build dependencies Message-ID: This changeset updates IGV's Apache Batik dependency, which is used for exporting graphs into SVG files (`File -> Export current graph...`), to its latest version. **Testing:** checked manually that a few graphs are correctly exported as SVG files. 
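Returning to the RISC-V thread (JDK-8365926) above: the reachability test behind turning "jalr ra,0(t1)" into "jal ra," can be sketched as below. This illustrates the ISA rule rather than the actual HotSpot code: `jal` carries a 21-bit signed, 2-byte-aligned offset, so a direct call only works when the target lies within roughly 1 MiB of the call site; otherwise the register-indirect form has to be kept (or restored on re-patching).

```cpp
#include <cassert>
#include <cstdint>

// Returns true if a direct jal from call_site can reach target.
inline bool is_jal_reachable(int64_t call_site, int64_t target) {
    int64_t offset = target - call_site;
    const int64_t range = INT64_C(1) << 20;  // 2^20 bytes: half of the 21-bit signed span
    return offset >= -range && offset < range && (offset & 1) == 0;
}
```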
------------- Commit messages: - Update batik version to 1.19 Changes: https://git.openjdk.org/jdk/pull/27000/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27000&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365791 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27000/head:pull/27000 PR: https://git.openjdk.org/jdk/pull/27000 From chagedorn at openjdk.org Fri Aug 29 06:58:41 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 29 Aug 2025 06:58:41 GMT Subject: RFR: 8366361: C2 SuperWord: rename VTransformNode::set_req -> init_req, analogue to Node::init_req In-Reply-To: References: Message-ID: On Thu, 28 Aug 2025 15:30:31 GMT, Emanuel Peter wrote: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > The current implementation of `VTransformNode::set_req` has `init_req` semantics, it verifies that the corresponding input is still nullptr. We should thus rename it. It will also free up the "set_req" name for later use in VTransform optimizations, where we want to modify the graph. > > See `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop` in the proof-of-concept PR. > > FYI: this PR is dependent on https://github.com/openjdk/jdk/pull/26987. I'll rebase once that one is integrated. We can still already review, so that the process is a little faster later on. (I have more small changes coming, but separating makes them more reviewable.) Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26991#pullrequestreview-3167592057 From chagedorn at openjdk.org Fri Aug 29 06:59:41 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 29 Aug 2025 06:59:41 GMT Subject: RFR: 8365791: IGV: Update build dependencies In-Reply-To: References: Message-ID: On Fri, 29 Aug 2025 06:37:30 GMT, Roberto Castañeda Lozano wrote: > This changeset updates IGV's Apache Batik dependency, which is used for exporting graphs into SVG files (`File -> Export current graph...`), to its latest version. > > **Testing:** checked manually that a few graphs are correctly exported as SVG files. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27000#pullrequestreview-3167594959 From dlunden at openjdk.org Fri Aug 29 09:38:58 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 29 Aug 2025 09:38:58 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v24] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this.
To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 35 commits: - Restore modified java/lang/invoke tests - Sort includes (new requirement) - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates - Add clarifying comments at definitions of register mask sizes - Fix implicit zero and nullptr checks - Add deep copy comment - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates - Fix typo - Updates after Emanuel's comments - Refactor and improve TestNestedSynchronize.java - ... and 25 more: https://git.openjdk.org/jdk/compare/b39c7369...80c6cf47 ------------- Changes: https://git.openjdk.org/jdk/pull/20404/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=23 Stats: 2647 lines in 28 files changed: 2095 ins; 276 del; 276 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From ayang at openjdk.org Fri Aug 29 10:26:42 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 29 Aug 2025 10:26:42 GMT Subject: RFR: 8365791: IGV: Update build dependencies In-Reply-To: References: Message-ID: On Fri, 29 Aug 2025 06:37:30 GMT, Roberto Castañeda Lozano wrote: > This changeset updates IGV's Apache Batik dependency, which is used for exporting graphs into SVG files (`File -> Export current graph...`), to its latest version. > > **Testing:** checked manually that a few graphs are correctly exported as SVG files. Marked as reviewed by ayang (Reviewer).
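Returning to the many-arguments RegMask thread (JDK-8325467) above: the grow-on-demand design, a statically allocated base for the common case plus a lazily allocated extension for rare large masks, can be sketched roughly as follows (hypothetical class, not the PR's actual `RegMask`).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// A bitmask whose static part covers the common case; bits beyond it
// spill into a heap-allocated extension that is grown only on demand.
class GrowableMask {
    static const int BASE_WORDS = 4;     // static part: 4 * 32 = 128 bits
    uint32_t _base[BASE_WORDS] = {};
    uint32_t* _ext = nullptr;            // extension, allocated lazily
    int _ext_words = 0;

public:
    ~GrowableMask() { delete[] _ext; }

    void set(int bit) {
        int w = bit / 32;
        if (w < BASE_WORDS) { _base[w] |= 1u << (bit % 32); return; }
        int ew = w - BASE_WORDS;
        if (ew >= _ext_words) grow(ew + 1);
        _ext[ew] |= 1u << (bit % 32);
    }

    bool test(int bit) const {
        int w = bit / 32;
        if (w < BASE_WORDS) return (_base[w] >> (bit % 32)) & 1;
        int ew = w - BASE_WORDS;
        return ew < _ext_words && ((_ext[ew] >> (bit % 32)) & 1);
    }

private:
    void grow(int words) {
        uint32_t* n = new uint32_t[words]();  // zero-initialized
        if (_ext != nullptr) {
            std::memcpy(n, _ext, _ext_words * sizeof(uint32_t));
        }
        delete[] _ext;
        _ext = n;
        _ext_words = words;
    }
};
```

Keeping the base inline means the frequent small-mask operations never touch the allocator, which matches the thread's goal of not regressing the common case.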
------------- PR Review: https://git.openjdk.org/jdk/pull/27000#pullrequestreview-3168222816 From thartmann at openjdk.org Fri Aug 29 10:48:46 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 29 Aug 2025 10:48:46 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation [v3] In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 09:01:28 GMT, Man Cao wrote: >> Hi, >> >> Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause. >> >> I will also try adding a test case for DontCompileHugeMethod under -XX:-TieredCompilation. >> >> -Man > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use List.of in test I think the test needs `-Xbatch` The new test fails in our testing: ----------System.err:(20/1081)---------- stdout: [590 1 n jdk.internal.misc.Unsafe::getReferenceVolatile (native) 595 2 n jdk.internal.vm.Continuation::enterSpecial (native) (static) 595 3 n jdk.internal.vm.Continuation::doYield (native) (static) 1999665 ]; stderr: [] exitValue = 0 java.lang.RuntimeException: ' HugeSwitch::shortMethod (' missing from stdout/stderr at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:253) at compiler.runtime.TestDontCompileHugeMethods.runTest(TestDontCompileHugeMethods.java:110) at compiler.runtime.TestDontCompileHugeMethods.main(TestDontCompileHugeMethods.java:119) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) at 
java.base/java.lang.Thread.run(Thread.java:1474) ------------- Changes requested by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26932#pullrequestreview-3168288213 PR Comment: https://git.openjdk.org/jdk/pull/26932#issuecomment-3236599206 From chagedorn at openjdk.org Fri Aug 29 13:20:45 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 29 Aug 2025 13:20:45 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v3] In-Reply-To: References: Message-ID: <_EN6o6Jwu73CNwvSXYt2cHSHu6Yglkp86f1t7lywwi4=.a84b6fac-327a-48a5-8f1e-772b31d8da10@github.com> On Thu, 14 Aug 2025 18:07:53 GMT, Francisco Ferrari Bihurriet wrote: >> Francisco Ferrari Bihurriet has updated the pull request incrementally with three additional commits since the last revision: >> >> - Improve the IR test to add the new covered cases >> >> I also checked the test is now failing in the master branch (at >> f95af744b07a9ec87e2507b3d584cbcddc827bbd). >> - Remove IR test inverted asserts >> >> According to my IGV observations, these inversions aren't necessarily >> effective. Also, I assume it is safe to remove them because if I apply >> this change to the master branch, the test still passes (tested at >> f95af744b07a9ec87e2507b3d584cbcddc827bbd). >> - Add requested comments from the reviews >> >> Add a comment with the BoolTest::cc2logical inferences tables, as >> suggested by @tabjy. >> >> Also, add a comment explaining how PhaseCCP::push_cmpu is handling >> grandparent updates in the case 1b, as agreed with @chhagedorn. > > # Absence note > > Today is the last day before a ~2 weeks vacation, so my next working day is Monday, September 1st. > > Please feel free to keep giving feedback and/or reviews, and I will continue when I'm back. > > Cheers, > Francisco Hi @franferrax, hope you had a good vacation! 
> Hi @chhagedorn, > > I added the new tests in [e6b1cb8](https://github.com/openjdk/jdk/commit/e6b1cb897d9c75b34744c7d24f72abcec9986b0b). One problem I'm facing is that I'm unable to generate `Bool` nodes with arbitrary `BoolTest` values. Even if I try the assert inversions I removed in [10e1e3f](https://github.com/openjdk/jdk/commit/10e1e3f4f796d05dcd5c56bc2365d5d564d93952), C2 has preference for `BoolTest::ne`, `BoolTest::le` and `BoolTest::lt`. Instead of using `BoolTest::eq`, `BoolTest::gt` or `BoolTest::ge`, it swaps what is put in `IfTrue` and `IfFalse`. > > Even if `javac` generates an `ifeq` and an `ifne` with the same inputs, instead of a single `CmpU` with two `Bool`s (`BoolTest::eq` and `BoolTest::ne`), I get a single `Bool` (`BoolTest::ne`) with two `If` (one of them swapping `IfTrue` with `IfFalse`). I guess this is some sort of canonicalization to enable further optimizations. > > Do you know a way to influence the `Bool`'s `BoolTest` value? Or @rwestrel do you? > > This means the following 8 cases are not really testing what they claim, but repeating other cases with `IfTrue` and `IfFalse` swapped: > > * `testCase1aOptimizeAsFalseForGT(xm|mx)` (they should use `BoolTest::gt`, but use `BoolTest::le`) > * `testCase1bOptimizeAsFalseForEQ(xm|mx)` (they should use `BoolTest::eq`, but use `BoolTest::ne`) > * `testCase1bOptimizeAsFalseForGE(xm|mx)` (they should use `BoolTest::ge`, but use `BoolTest::lt`) > * `testCase1bOptimizeAsFalseForGT(xm|mx)` (they should use `BoolTest::gt`, but use `BoolTest::le`) > > Even if we don't find a way to influence the `BoolTest`, the cases are still valid and can be kept (just in case the described behaviour changes). Hm, that's a good point. `Parse::do_if()` indeed always canonicalizes the `Bool` nodes... But I was sure we can still somehow end up with non-canonicalized versions again with some tricks. I was curious and played around with some examples and could indeed find test cases for `gt`, `ge` , and `eq`. 
I was then also thinking about notification code in IGVN. We already concluded further up that it's not needed for CCP because `CmpU` nodes below `AddI` nodes are put to the worklist again. However, with IGVN, we could modify the graph above the `AndI` as well. We miss notification code for `CmpU` below `AndI`. I changed my test cases further to also run into such a missing optimization case. When run with `-XX:VerifyIterativeGVN=1110`, we indeed get such an assertion failure with the proposed patch (it also triggers an assertion failure already with mainline code). This could be easily fixed with: diff --git a/src/hotspot/share/opto/phaseX.cpp b/src/hotspot/share/opto/phaseX.cpp --- a/src/hotspot/share/opto/phaseX.cpp (revision afa8e79ba1a76066cf969cb3b5f76ea804780872) +++ b/src/hotspot/share/opto/phaseX.cpp (date 1756472877934) @@ -2553,7 +2553,7 @@ if (use_op == Op_AndI || use_op == Op_AndL) { for (DUIterator_Fast i2max, i2 = use->fast_outs(i2max); i2 < i2max; i2++) { Node* u = use->fast_out(i2); - if (u->Opcode() == Op_RShiftI || u->Opcode() == Op_RShiftL) { + if (u->Opcode() == Op_RShiftI || u->Opcode() == Op_RShiftL || u->Opcode() == Op_CmpU) { worklist.push(u); } } Here are the test cases with some further comments explaining how it works and how to run it: [Test.java](https://github.com/user-attachments/files/22046942/Test.java) This will produce the following IR (at `PhaseIdealLoop1`): image I guess you could easily transform these into IR tests and check that we have 4 `CmoveI/CmpU` nodes in `PhaseIdealLoop1` and then no more in `PhaseIdealLoop2`. What do you think? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26666#issuecomment-3237014744 From epeter at openjdk.org Fri Aug 29 14:24:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 29 Aug 2025 14:24:09 GMT Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes Message-ID: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: https://github.com/openjdk/jdk/pull/20964 This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. The goal is to split up some cases that are currently treated the same, but will later have different behavior. There may be a little bit of code duplication, but the code will soon be made different ;) We split the `VTransformScalarNode`: - `VTransformMemopScalarNode` - Uses that want only scalar mem nodes can now directly check for `isa_MemopScalar`. - We can directly store the `_vpointer` in a field, that way we don't need to do a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then do the necessary modifications to the `vpointer`. - `VTransformLoopPhiNode` - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges. - `VTransformCFGNode` - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. Once we do if-conversion we'll have to do more work on CFG. - `VTransformDataScalarNode` - These represent all the normal "calculation" nodes in the loop.
- `VTransformInputScalarNode` -> `VTransformOuterNode`: - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later. I decided to rather split up more, and avoid the `VTransformScalarNode` altogether, avoiding having to override overrides - that can be really confusing (e.g. what I had with `is_load_in_loop`). ------------- Depends on: https://git.openjdk.org/jdk/pull/26991 Commit messages: - improve print_spec - rm comment - InputScalar -> Outer renaming - rm useless methods - rm vloop_analyzer from vpointer method - JDK-8366427 Changes: https://git.openjdk.org/jdk/pull/27002/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27002&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8366427 Stats: 157 lines in 4 files changed: 114 ins; 0 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/27002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27002/head:pull/27002 PR: https://git.openjdk.org/jdk/pull/27002 From epeter at openjdk.org Fri Aug 29 14:24:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 29 Aug 2025 14:24:09 GMT Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes In-Reply-To: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> Message-ID: On Fri, 29 Aug 2025 09:49:46 GMT, Emanuel Peter wrote: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > The goal is to split up some cases that are currently treated the same, but will later have different behavior. 
There may be a little bit of code duplication, but the code will soon be made different ;) > > We split the `VTransformScalarNode`: > - `VTransformMemopScalarNode` > - Uses that only wanted scalar mem nodes can now directly check for `isa_MemopScalar`. > - We can directly store the `_vpointer` in a field, that way we don't need to do a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then do the necessary modifications to the `vpointer`. > - `VTransformLoopPhiNode` > - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges. > - `VTransformCFGNode` > - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. Once we do if-conversion we'll have to do more work on CFG. > - `VTransformDataScalarNode` > - These represent all the normal "calculation" nodes in the loop. > - `VTransformInputScalarNode` -> `VTransformOuterNode`: > - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later. > > I decided to rather split up more, and avoid the `VTransformScalarNode` together, avoiding having to override overrides - that can be really confusing (e.g. what I had with `is_load_in_loop`). src/hotspot/share/opto/vtransform.cpp line 734: > 732: // This was just wrapped. Now we simply unwap without touching the inputs. 
> 733: return VTransformApplyResult::make_scalar(_node); > 734: } Looks like code duplication, but I'll soon fill them with different behavior ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2310249077 From kvn at openjdk.org Fri Aug 29 14:37:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 29 Aug 2025 14:37:51 GMT Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes In-Reply-To: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> Message-ID: On Fri, 29 Aug 2025 09:49:46 GMT, Emanuel Peter wrote: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > The goal is to split up some cases that are currently treated the same, but will alter have different behavior. There may be a little bit of code duplication, but the code will soon be made different ;) > > We split the `VTransformScalarNode`: > - `VTransformMemopScalarNode` > - Uses that only wanted scalar mem nodes can now directly check for `isa_MemopScalar`. > - We can directly store the `_vpointer` in a field, that way we don't need to do a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then do the necessary modifications to the `vpointer`. > - `VTransformLoopPhiNode` > - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges. > - `VTransformCFGNode` > - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. 
Once we do if-conversion we'll have to do more work on CFG. > - `VTransformDataScalarNode` > - These represent all the normal "calculation" nodes in the loop. > - `VTransformInputScalarNode` -> `VTransformOuterNode`: > - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later. > > I decided to rather split up more, and avoid the `VTransformScalarNode` altogether, avoiding having to override overrides - that can be really confusing (e.g. what I had with `is_load_in_loop`). src/hotspot/share/opto/vtransform.cpp line 711: > 709: } > 710: > 711: VTransformApplyResult VTransformMemopScalarNode::apply(VTransformApplyState& apply_state) const { Why do we need to pass the unused `apply_state` in these methods? src/hotspot/share/opto/vtransform.cpp line 1009: > 1007: tty->print("node[%d %s] ", _node->_idx, _node->Name()); > 1008: _vpointer.print_on(tty, false); > 1009: } Consider a separate RFE to use `outputStream*` for all prints. If we go into the UL world we need to collect all outputs in one buffer, as we discussed at a recent meeting. src/hotspot/share/opto/vtransform.hpp line 457: > 455: class VTransformMemopScalarNode : public VTransformNode { > 456: private: > 457: MemNode* _node; Why not have `_node` in the `VTransformNode` class and use `MemNode* node() const { return _node->as_Mem(); }` in this class, similar to the other new classes? 
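For readers skimming the thread, the two field layouts debated in the last comment can be sketched roughly as follows. This is a simplified Java stand-in for the C++ classes in vtransform.hpp; all type names here are illustrative, not the real HotSpot types:

```java
// Minimal stand-ins for the IR node types.
class Node { }
class MemNode extends Node { }

// (a) The reviewer's suggestion: one _node field in the base class,
//     with a narrowing accessor in the memop subclass.
abstract class VTransformNodeA {
    protected final Node _node;
    VTransformNodeA(Node n) { _node = n; }
}
class VTransformMemopScalarNodeA extends VTransformNodeA {
    VTransformMemopScalarNodeA(MemNode n) { super(n); }
    MemNode node() { return (MemNode) _node; } // downcast on every access
}

// (b) The PR's choice: each subclass keeps its own precisely-typed field,
//     at the cost of some duplication across subclasses.
abstract class VTransformNodeB { }
class VTransformMemopScalarNodeB extends VTransformNodeB {
    private final MemNode _node;
    VTransformMemopScalarNodeB(MemNode n) { _node = n; }
    MemNode node() { return _node; } // statically typed, no cast
}

public class FieldLayoutSketch {
    public static void main(String[] args) {
        // Both layouts hand back the MemNode; they differ only in typing.
        System.out.println(new VTransformMemopScalarNodeA(new MemNode()).node() instanceof MemNode);
        System.out.println(new VTransformMemopScalarNodeB(new MemNode()).node() instanceof MemNode);
    }
}
```

Layout (b) avoids the cast and keeps the base class free of fields that not all subclasses need, which matches the reasoning Emanuel gives below.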
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2310324107 PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2310320298 PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2310339568 From epeter at openjdk.org Fri Aug 29 14:41:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 29 Aug 2025 14:41:43 GMT Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes In-Reply-To: References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> Message-ID: On Fri, 29 Aug 2025 14:26:12 GMT, Vladimir Kozlov wrote: >> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: >> https://github.com/openjdk/jdk/pull/20964 >> >> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. >> >> The goal is to split up some cases that are currently treated the same, but will alter have different behavior. There may be a little bit of code duplication, but the code will soon be made different ;) >> >> We split the `VTransformScalarNode`: >> - `VTransformMemopScalarNode` >> - Uses that only wanted scalar mem nodes can now directly check for `isa_MemopScalar`. >> - We can directly store the `_vpointer` in a field, that way we don't need to do a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then do the necessary modifications to the `vpointer`. >> - `VTransformLoopPhiNode` >> - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges. >> - `VTransformCFGNode` >> - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. Once we do if-conversion we'll have to do more work on CFG. 
>> - `VTransformDataScalarNode` >> - These represent all the normal "calculation" nodes in the loop. >> - `VTransformInputScalarNode` -> `VTransformOuterNode`: >> - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later. >> >> I decided to rather split up more, and avoid the `VTransformScalarNode` together, avoiding having to override overrides - that can be really confusing (e.g. what I had with `is_load_in_loop`). > > src/hotspot/share/opto/vtransform.cpp line 711: > >> 709: } >> 710: >> 711: VTransformApplyResult VTransformMemopScalarNode::apply(VTransformApplyState& apply_state) const { > > Why we need to pass unused `apply_state` in these methods? `apply` is `virtual`, and some other methods need the state. And I will soon need it here, to update the memory state: https://github.com/openjdk/jdk/pull/20964/files#diff-6c6ddfc4afe811f5d1eae1e4db638673ff3db0cb58d37bc569a75084a6a484c6R603-R620 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2310359502 From epeter at openjdk.org Fri Aug 29 14:44:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 29 Aug 2025 14:44:43 GMT Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes In-Reply-To: References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> Message-ID: On Fri, 29 Aug 2025 14:25:14 GMT, Vladimir Kozlov wrote: >> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: >> https://github.com/openjdk/jdk/pull/20964 >> >> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. >> >> The goal is to split up some cases that are currently treated the same, but will alter have different behavior. 
There may be a little bit of code duplication, but the code will soon be made different ;) >> >> We split the `VTransformScalarNode`: >> - `VTransformMemopScalarNode` >> - Uses that only wanted scalar mem nodes can now directly check for `isa_MemopScalar`. >> - We can directly store the `_vpointer` in a field, that way we don't need to do a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then do the necessary modifications to the `vpointer`. >> - `VTransformLoopPhiNode` >> - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges. >> - `VTransformCFGNode` >> - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. Once we do if-conversion we'll have to do more work on CFG. >> - `VTransformDataScalarNode` >> - These represent all the normal "calculation" nodes in the loop. >> - `VTransformInputScalarNode` -> `VTransformOuterNode`: >> - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later. >> >> I decided to rather split up more, and avoid the `VTransformScalarNode` together, avoiding having to override overrides - that can be really confusing (e.g. what I had with `is_load_in_loop`). > > src/hotspot/share/opto/vtransform.cpp line 1009: > >> 1007: tty->print("node[%d %s] ", _node->_idx, _node->Name()); >> 1008: _vpointer.print_on(tty, false); >> 1009: } > > Consider separate RFE to use `outputStream*` for all prints. If we go into UL word we need to collect all outputs in one buffer as we discussed on recent meeting. Good idea! I'll file an RFE. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2310365758 From epeter at openjdk.org Fri Aug 29 15:21:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 29 Aug 2025 15:21:42 GMT Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes In-Reply-To: References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> Message-ID: On Fri, 29 Aug 2025 14:31:08 GMT, Vladimir Kozlov wrote: >> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: >> https://github.com/openjdk/jdk/pull/20964 >> >> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. >> >> The goal is to split up some cases that are currently treated the same, but will alter have different behavior. There may be a little bit of code duplication, but the code will soon be made different ;) >> >> We split the `VTransformScalarNode`: >> - `VTransformMemopScalarNode` >> - Uses that only wanted scalar mem nodes can now directly check for `isa_MemopScalar`. >> - We can directly store the `_vpointer` in a field, that way we don't need to do a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then do the necessary modifications to the `vpointer`. >> - `VTransformLoopPhiNode` >> - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges. >> - `VTransformCFGNode` >> - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. Once we do if-conversion we'll have to do more work on CFG. >> - `VTransformDataScalarNode` >> - These represent all the normal "calculation" nodes in the loop. 
>> - `VTransformInputScalarNode` -> `VTransformOuterNode`: >> - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later. >> >> I decided to rather split up more, and avoid the `VTransformScalarNode` together, avoiding having to override overrides - that can be really confusing (e.g. what I had with `is_load_in_loop`). > > src/hotspot/share/opto/vtransform.hpp line 457: > >> 455: class VTransformMemopScalarNode : public VTransformNode { >> 456: private: >> 457: MemNode* _node; > > Why not have `_node` in `VTransformNode` class and use `MemNode* node() const { return _node->as_Mem(); }` in this class? similar to other new classes. I feared this might get a question ;) I'd like to do it this way, and later there will need to be more changes. There will also be changes for `_nodes` in the vector nodes. Below some more thoughts - reading optional ;) -------------------------------------- `VTransformNode` is too high up - it is the superclass of all. And not all have a `node`. The vector nodes have a list of nodes. One option would be re-introducing some `VTransformScalarNode` that does nothing but hold that shared `_node` field. But I'd like to avoid having a public accessor for all of the subclasses. But picking a good name that unites all the subclasses is not so easy (data, memory, CFG, ... ). Conceptually, having such an in-between-class is not very helpful I fear. In the long-run, we will probably not just have the "identity transform" with `_node`, but these 3: - identity transform: reference a `node`, and keep it (maybe rewire inputs if memory is reordered). - add new node, where there is no old reference (e.g. `VTransformConvI2LNode`, there will be more like extract). 
- copy node: if we ever do "virtual unrolling" / widening, some nodes may not be vectorized (widened) and instead need to be duplicated/copied. So maybe eventually I'll need more than just a `_node`: - `_node`: for identity transform or copy - an opcode to generate a new node - an enum that says if we do: identity vs copy vs generate We might need to do a similar thing for vector nodes. It is a little hard to model everything perfectly in the current state, without introducing massive code changes. I'd rather "atomize" everything and duplicate some code now. Later, it will be easier to unite things again. It is possible that I'll end up with something that covers everything, and put it in `VTransformNode`: - something like `VTransformNodePrototype` in the POC PR https://github.com/openjdk/jdk/pull/20964. - `_node`: a reference for identity transform, copy. But even if we create a new node, we might want to have an "approximate origin" to copy the node notes from. - enum that says if we do identity vs copy vs generate - `vlen`: 1 for scalar, >1 for vector - `element_type` (BasicType) - `opcode`: the target opcode (scalar opcode when scalar, vector if to be vectorized) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2310448015 From epeter at openjdk.org Fri Aug 29 15:26:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 29 Aug 2025 15:26:44 GMT Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes In-Reply-To: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> Message-ID: On Fri, 29 Aug 2025 09:49:46 GMT, Emanuel Peter wrote: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > > This is a pure refactoring - no change in behaviour. 
I'm presenting it like this because it will make reviews easier. > > The goal is to split up some cases that are currently treated the same, but will alter have different behavior. There may be a little bit of code duplication, but the code will soon be made different ;) > > We split the `VTransformScalarNode`: > - `VTransformMemopScalarNode` > - Uses that only wanted scalar mem nodes can now directly check for `isa_MemopScalar`. > - We can directly store the `_vpointer` in a field, that way we don't need to do a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then do the necessary modifications to the `vpointer`. > - `VTransformLoopPhiNode` > - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges. > - `VTransformCFGNode` > - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. Once we do if-conversion we'll have to do more work on CFG. > - `VTransformDataScalarNode` > - These represent all the normal "calculation" nodes in the loop. > - `VTransformInputScalarNode` -> `VTransformOuterNode`: > - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later. > > I decided to rather split up more, and avoid the `VTransformScalarNode` together, avoiding having to override overrides - that can be really confusing (e.g. what I had with `is_load_in_loop`). @vnkozlov Thanks for reviewing! 
I responded to all your comments :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27002#issuecomment-3237413742 From epeter at openjdk.org Fri Aug 29 15:26:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 29 Aug 2025 15:26:45 GMT Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes In-Reply-To: References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> Message-ID: On Fri, 29 Aug 2025 14:41:53 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vtransform.cpp line 1009: >> >>> 1007: tty->print("node[%d %s] ", _node->_idx, _node->Name()); >>> 1008: _vpointer.print_on(tty, false); >>> 1009: } >> >> Consider separate RFE to use `outputStream*` for all prints. If we go into UL word we need to collect all outputs in one buffer as we discussed on recent meeting. > > Good idea! I'll file an RFE. [JDK-8366445](https://bugs.openjdk.org/browse/JDK-8366445): C2 SuperWord: use outputStream instead of tty where possible ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2310461075 From epeter at openjdk.org Fri Aug 29 15:47:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 29 Aug 2025 15:47:41 GMT Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes In-Reply-To: References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> Message-ID: On Fri, 29 Aug 2025 15:16:57 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vtransform.hpp line 457: >> >>> 455: class VTransformMemopScalarNode : public VTransformNode { >>> 456: private: >>> 457: MemNode* _node; >> >> Why not have `_node` in `VTransformNode` class and use `MemNode* node() const { return _node->as_Mem(); }` in this class? similar to other new classes. > > I feared this might get a question ;) > > I'd like to do it this way, and later there will need to be more changes. There will also be changes for `_nodes` in the vector nodes. 
> > Below some more thoughts - reading optional ;) > > -------------------------------------- > > `VTransformNode` is too high up - it is the superclass of all. And not all have a `node`. The vector nodes have a list of nodes. > > One option would be re-introducing some `VTransformScalarNode` that does nothing but hold that shared `_node` field. But I'd like to avoid having a public accessor for all of the subclasses. But picking a good name that unites all the subclasses is not so easy (data, memory, CFG, ... ). Conceptually, having such an in-between-class is not very helpful I fear. > > In the long-run, we will probably not just have the "identity transform" with `_node`, but these 3: > - identity transform: reference a `node`, and keep it (maybe rewire inputs if memory is reordered). > - add new node, where there is no old reference (e.g. `VTransformConvI2LNode`, there will be more like extract). > - copy node: if we ever do "virtual unrolling" / widening, some nodes may not be vectorized (widened) and instead need to be duplicated/copied. > > So maybe eventually I'll need more than just a `_node`: > - `_node`: for identity transform or copy > - an opcode to generate a new node > - an enum that says if we do: identity vs copy vs generate > > We might need to a similar thing for vector nodes `_nodes`. > > It is a little hard to model everything perfectly at the current state, without introducing massive code changes. > I'd rather "atomize" everything and duplicate some code now. Later, it will be easier to unite things again. > > It is possible that I'll end up with something that covers everything, and put it in `VTransformNode`: > - something like `VTransformNodePrototype` in the POC PR https://github.com/openjdk/jdk/pull/20964. > - `_node`: a reference for identity transform, copy. But even if we create a new node, we might want to have an "approximate origin" to copy the node notes from. 
> - enum that says if we do identity vs copy vs generate > - `vlen`: 1 for scalar, >1 for vector > - `element_type` (BasicType) > - `opcode`: the target opcode (scalar opcode when scalar, vector if to be vectorized) FYI: I have a bit of an overview of the tasks here: https://bugs.openjdk.org/browse/JDK-8340093 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2310509161 From kvn at openjdk.org Fri Aug 29 16:49:41 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 29 Aug 2025 16:49:41 GMT Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes In-Reply-To: References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> Message-ID: On Fri, 29 Aug 2025 15:45:24 GMT, Emanuel Peter wrote: >> I feared this might get a question ;) >> >> I'd like to do it this way, and later there will need to be more changes. There will also be changes for `_nodes` in the vector nodes. >> >> Below some more thoughts - reading optional ;) >> >> -------------------------------------- >> >> `VTransformNode` is too high up - it is the superclass of all. And not all have a `node`. The vector nodes have a list of nodes. >> >> One option would be re-introducing some `VTransformScalarNode` that does nothing but hold that shared `_node` field. But I'd like to avoid having a public accessor for all of the subclasses. But picking a good name that unites all the subclasses is not so easy (data, memory, CFG, ... ). Conceptually, having such an in-between-class is not very helpful I fear. >> >> In the long-run, we will probably not just have the "identity transform" with `_node`, but these 3: >> - identity transform: reference a `node`, and keep it (maybe rewire inputs if memory is reordered). >> - add new node, where there is no old reference (e.g. `VTransformConvI2LNode`, there will be more like extract). 
>> - copy node: if we ever do "virtual unrolling" / widening, some nodes may not be vectorized (widened) and instead need to be duplicated/copied. >> >> So maybe eventually I'll need more than just a `_node`: >> - `_node`: for identity transform or copy >> - an opcode to generate a new node >> - an enum that says if we do: identity vs copy vs generate >> >> We might need to do a similar thing for vector nodes `_nodes`. >> >> It is a little hard to model everything perfectly at the current state, without introducing massive code changes. >> I'd rather "atomize" everything and duplicate some code now. Later, it will be easier to unite things again. >> >> It is possible that I'll end up with something that covers everything, and put it in `VTransformNode`: >> - something like `VTransformNodePrototype` in the POC PR https://github.com/openjdk/jdk/pull/20964. >> - `_node`: a reference for identity transform, copy. But even if we create a new node, we might want to have an "approximate origin" to copy the node notes from. >> - enum that says if we do identity vs copy vs generate >> - `vlen`: 1 for scalar, >1 for vector >> - `element_type` (BasicType) >> - `opcode`: the target opcode (scalar opcode when scalar, vector if to be vectorized) > > FYI: I have a bit of an overview of the tasks here: > https://bugs.openjdk.org/browse/JDK-8340093 You put a lot of thought into this ;^) Okay, it is not a big issue. It is just the "natural" reaction of a C++ programmer when he sees field duplication in subclasses. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2310643738 From epeter at openjdk.org Fri Aug 29 16:49:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 29 Aug 2025 16:49:42 GMT Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes In-Reply-To: References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> Message-ID: On Fri, 29 Aug 2025 16:45:47 GMT, Vladimir Kozlov wrote: >> FYI: I have a bit of an overview of the tasks here: >> https://bugs.openjdk.org/browse/JDK-8340093 > > You put a lot of thoughts into this ;^) Okay, it is not big issue. It is just "natural" reaction from C++ programmer when he see a field duplication in subclasses. Haha yes ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2310646401 From dfenacci at openjdk.org Fri Aug 29 16:50:00 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 29 Aug 2025 16:50:00 GMT Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee == m) failed: repeated inline attempt with different callee [v2] In-Reply-To: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> Message-ID: <122dqiroSMOynXT4p5qIp4Tlry-piG_7XDWXRMYropU=.1c2b3ccd-5d75-4360-bcad-392ca10e8387@github.com> > # Issue > The CTW test `applications/ctw/modules/java_xml.java` crashes when trying to repeat late inlining of a virtual method (after IGVN passes through the method's call node again). The failure originates [here](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callGenerator.cpp#L473) because `_callee != m`. 
Apparently when running IGVN a second time after a first late inline failure and [setting the callee in the call generator](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callnode.cpp#L1240) we notice that the previous callee is not the same as the current one. > In this specific instance it seems that the issue happens when CTW is compiling Apache Xalan. > > # Cause > The root of the issue has to do with repeated late inlining, class hierarchy analysis and dynamic class loading. > > For this particular issue the two differing methods are `org.apache.xalan.xsltc.compiler.LocationPathPattern::translate` first and `org.apache.xalan.xsltc.compiler.AncestorPattern::translate` the second time. `LocationPathPattern` is an abstract class but has a concrete `translate` method. `AncestorPattern` is a concrete class that extends another abstract class `RelativePathPattern` that extends `LocationPathPattern`. `AncestorPattern` overrides the translate method. > What seems to be happening is the following: we compile a virtual call `RelativePathPattern::translate`, and at compile time only the abstract classes `RelativePathPattern` <: `LocationPathPattern` are loaded. CHA then finds out that the call must always call `LocationPathPattern::translate` because the method is not overridden anywhere else. However, there is still no non-abstract class in the entire class hierarchy, i.e. as soon as `AncestorPattern` is loaded, this class is then the only non-abstract class in the class hierarchy and therefore the receiver type must be `AncestorPattern`. > > More generally, when late inlining is repeated and classes are loaded dynamically, it is possible that the resolved method between a late inlining attempt and the next one is not the same. > > # Fix > > This looks like a very rare edge case. If CHA is affected by class loading, the original recorded dependency becomes invalid. 
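The hierarchy described in the Cause section can be sketched as follows (class names taken from the report; the method bodies are invented purely for illustration). Until `AncestorPattern` is loaded, `LocationPathPattern::translate` is the only `translate` implementation CHA can see; once it is loaded, every concrete receiver dispatches to the override:

```java
abstract class LocationPathPattern {
    // Concrete method in an abstract class: CHA's unique target
    // while no concrete subclass has been loaded yet.
    String translate() { return "LocationPathPattern"; }
}

abstract class RelativePathPattern extends LocationPathPattern { }

// Loading this class invalidates the earlier CHA result: it is the only
// non-abstract class in the hierarchy, and it overrides translate().
class AncestorPattern extends RelativePathPattern {
    @Override
    String translate() { return "AncestorPattern"; }
}

public class ChaDemo {
    public static void main(String[] args) {
        RelativePathPattern p = new AncestorPattern();
        // Any concrete receiver must be an AncestorPattern,
        // so the virtual call resolves to the override.
        System.out.println(p.translate());
    }
}
```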
So, we change the assert to **check for invalid dependencies if the current callee and the previous one don't match**. > > # Testing > > This issue is very very, very intermittent and depending on a number of factors. This ... Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8355354: mvoe assert to ideal ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26441/files - new: https://git.openjdk.org/jdk/pull/26441/files/0ed04442..15bcb65e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26441&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26441&range=00-01 Stats: 34 lines in 3 files changed: 17 ins; 14 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26441.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26441/head:pull/26441 PR: https://git.openjdk.org/jdk/pull/26441 From dfenacci at openjdk.org Fri Aug 29 16:52:44 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 29 Aug 2025 16:52:44 GMT Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee == m) failed: repeated inline attempt with different callee [v2] In-Reply-To: References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> Message-ID: On Thu, 21 Aug 2025 18:40:22 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/opto/callGenerator.cpp line 487: >> >>> 485: "repeated inline attempt with different callee"); >>> 486: } >>> 487: #endif >> >> I'm wondering if there might be other reasons that the callee might change, like JVMTI class redefinition. Also, it sounds like the CHA case is really rare, and we check dependencies at the end anyway, so the easiest fix for class redefinition and CHA would be to ignore the new callee and keep the old one here. > > I second that. And it aligns with our effort to make CI queries report stable results. 
> > (FTR here's what I proposed to Damon privately: "Another alternative is to cache and reuse cg->callee_method() when it becomes non-null. And turn repeated CHA requests (Compile::optimize_inlining) into verification logic.") > I'm wondering if there might be other reasons that the callee might change, like JVMTI class redefinition I guess there could be. For JVMTI we could possibly check for `Method::is_old` or `Method::is_obsolete`? But still, it might not be the only reason... > so the easiest fix for class redefinition and CHA would be to ignore the new callee and keep the old one here. I'm tempted by setting the callee if it is null and just removing the original assert but @iwanowww suggested moving the assert to the `Ideal` function. I've just pushed a change that should be doing that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26441#discussion_r2310651866 From kvn at openjdk.org Fri Aug 29 16:57:42 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 29 Aug 2025 16:57:42 GMT Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes In-Reply-To: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com> Message-ID: On Fri, 29 Aug 2025 09:49:46 GMT, Emanuel Peter wrote: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093) > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > The goal is to split up some cases that are currently treated the same, but will later have different behavior. 
There may be a little bit of code duplication, but the code will soon be made different ;) > > We split the `VTransformScalarNode`: > - `VTransformMemopScalarNode` > - Uses that only wanted scalar mem nodes can now directly check for `isa_MemopScalar`. > - We can directly store the `_vpointer` in a field, that way we don't need to do a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then do the necessary modifications to the `vpointer`. > - `VTransformLoopPhiNode` > - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges. > - `VTransformCFGNode` > - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. Once we do if-conversion we'll have to do more work on CFG. > - `VTransformDataScalarNode` > - These represent all the normal "calculation" nodes in the loop. > - `VTransformInputScalarNode` -> `VTransformOuterNode`: > - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later. > > I decided to rather split up more, and avoid the `VTransformScalarNode` altogether, avoiding having to override overrides - that can be really confusing (e.g. what I had with `is_load_in_loop`). Good. ------------- Marked as reviewed by kvn (Reviewer). 
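The splitting pattern above — dedicated subclasses with direct `isa_*`-style queries instead of overridable predicates on one shared scalar class — can be illustrated with a small Java sketch. The names are hypothetical and only loosely modeled on the PR's node classes; the actual HotSpot code is C++:

```java
// Minimal sketch of the refactoring pattern: each kind gets its own subclass,
// so uses can ask for the exact kind directly instead of relying on
// overridable predicates in a shared scalar base class.
abstract class VNode { }

final class MemopScalarNode extends VNode {
    final long vpointer; // stored directly in a field, no lookup via an analyzer
    MemopScalarNode(long vpointer) { this.vpointer = vpointer; }
}

final class CfgNode extends VNode { }

public class VNodeSketch {
    // Mirrors an isa_MemopScalar-style query: null if the node is another kind.
    static MemopScalarNode asMemopScalar(VNode n) {
        return (n instanceof MemopScalarNode m) ? m : null;
    }

    public static void main(String[] args) {
        System.out.println(asMemopScalar(new MemopScalarNode(42L)).vpointer); // 42
        System.out.println(asMemopScalar(new CfgNode()));                     // null
    }
}
```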
PR Review: https://git.openjdk.org/jdk/pull/27002#pullrequestreview-3169433640 From duke at openjdk.org Fri Aug 29 17:18:20 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 29 Aug 2025 17:18:20 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v46] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality > > Additional Testing: > - [x] Linux x64 fastdebug tier 1/2/3/4 > - [x] Linux aarch64 fastdebug tier 1/2/3/4 Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Fix NMethodRelocationTest.java logging race ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/03a69587..a2051637 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=45 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=44-45 Stats: 20 lines in 1 file changed: 7 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From kvn at openjdk.org Fri Aug 29 17:30:58 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 29 Aug 2025 17:30:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v46] 
In-Reply-To: References: Message-ID: On Fri, 29 Aug 2025 17:18:20 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix NMethodRelocationTest.java logging race Good. I will wait until GHA testing is finished and will then submit my testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3237732041 From manc at openjdk.org Fri Aug 29 19:10:05 2025 From: manc at openjdk.org (Man Cao) Date: Fri, 29 Aug 2025 19:10:05 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation [v4] In-Reply-To: References: Message-ID: <_4Er-HME8gnBD7lLaVcB1-rJoXG_psVh5HAqlXoSboQ=.cd97a3ee-fa3a-48b5-9ff7-33c5e5bb292f@github.com> > Hi, > > Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause. 
> > I will also try adding a test case for DontCompileHugeMethod under -XX:-TieredCompilation. > > -Man Man Cao has updated the pull request incrementally with one additional commit since the last revision: Add -Xbatch to test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26932/files - new: https://git.openjdk.org/jdk/pull/26932/files/12cd9c29..99a584aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26932&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26932&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26932.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26932/head:pull/26932 PR: https://git.openjdk.org/jdk/pull/26932 From manc at openjdk.org Fri Aug 29 19:13:43 2025 From: manc at openjdk.org (Man Cao) Date: Fri, 29 Aug 2025 19:13:43 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation [v3] In-Reply-To: References: Message-ID: <4UzPNngmZgfCthbvxstyTBk54tMXA4eMtGnOjUwtX_8=.fb957720-a60d-4d85-a891-d018f287d3e5@github.com> On Fri, 29 Aug 2025 10:46:25 GMT, Tobias Hartmann wrote: > I think the test needs `-Xbatch` Thanks for the suggestion and running the test in more environments. I guess the reason is that in some environments the compiler thread runs too slow, so the program finishes before the compiler can compile the method. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26932#issuecomment-3237961831 From manc at openjdk.org Fri Aug 29 23:12:18 2025 From: manc at openjdk.org (Man Cao) Date: Fri, 29 Aug 2025 23:12:18 GMT Subject: RFR: 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation [v5] In-Reply-To: References: Message-ID: > Hi, > > Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. 
Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause. > > I will also try adding a test case for DontCompileHugeMethod under -XX:-TieredCompilation. > > -Man Man Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into JDK-8366118-DontCompileHugeMethods - Add -Xbatch to test - Use List.of in test - Add a jtreg test - 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26932/files - new: https://git.openjdk.org/jdk/pull/26932/files/99a584aa..dfbca676 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26932&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26932&range=03-04 Stats: 12945 lines in 296 files changed: 10570 ins; 1202 del; 1173 mod Patch: https://git.openjdk.org/jdk/pull/26932.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26932/head:pull/26932 PR: https://git.openjdk.org/jdk/pull/26932 From missa at openjdk.org Fri Aug 29 23:38:42 2025 From: missa at openjdk.org (Mohamed Issa) Date: Fri, 29 Aug 2025 23:38:42 GMT Subject: RFR: 8364305: Support AVX10 saturating floating point conversion instructions [v2] In-Reply-To: References: Message-ID: On Thu, 28 Aug 2025 13:33:27 GMT, Emanuel Peter wrote: > @missa-prime Looks like an interesting patch! Do you think you could add some sort of IR test here, to verify that the correct code is generated on AVX10 vs lower AVX? @eme64 Thanks for the suggestion. This patch doesn't modify any IR though, so I'm not sure what IR test(s) to add. 
I could modify existing tests (`test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`, `test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java`, `test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java`) that use IR nodes as dependencies though. Would that be sufficient? Or did you have something else in mind? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26919#issuecomment-3238713299 From missa at openjdk.org Fri Aug 29 23:46:18 2025 From: missa at openjdk.org (Mohamed Issa) Date: Fri, 29 Aug 2025 23:46:18 GMT Subject: RFR: 8364305: Support AVX10 saturating floating point conversion instructions [v3] In-Reply-To: References: Message-ID: <_Wv0Roo5xUHjswP_JUy6yzoU5KCwNpIoX3S2QBceUbE=.05b5bbbd-840b-4162-a454-94a9ddc2a69f@github.com> > Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity. > > Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. 
Essentially though, the special values are mapped to values in the integer range (NaN -> 0; -Infinity or <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity or >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers to store intermediate results. > > This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11). > > 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` > 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` > 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` > 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` > 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` > 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` > 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` > 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` > > [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-int... 
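Worth noting: the mapping described above is exactly Java's narrowing conversion semantics for floating point to `int` (JLS 5.1.3); what the AVX10.2 instructions add is a way to get that result without the extra fix-up sequence. A quick standalone check of the scalar semantics:

```java
// Java's double -> int cast already saturates exactly as described above;
// the new instructions let the compiler emit this without extra handling.
public class SaturatingCast {
    public static void main(String[] args) {
        System.out.println((int) Double.NaN);               // 0
        System.out.println((int) Double.NEGATIVE_INFINITY); // -2147483648 (Integer.MIN_VALUE)
        System.out.println((int) Double.POSITIVE_INFINITY); // 2147483647 (Integer.MAX_VALUE)
        System.out.println((int) 1e30);                     // 2147483647 (saturates, too)
    }
}
```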
Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: Fix input size enum values for AVX 10.2 conversion instructions that take memory as the source ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26919/files - new: https://git.openjdk.org/jdk/pull/26919/files/e67e376e..be5c0b4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26919.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919 PR: https://git.openjdk.org/jdk/pull/26919 From kvn at openjdk.org Sat Aug 30 00:29:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 30 Aug 2025 00:29:55 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v46] In-Reply-To: References: Message-ID: On Fri, 29 Aug 2025 17:18:20 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. 
New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix NMethodRelocationTest.java logging race compiler/codecache/stress/UnexpectedDeoptimizationAllTest.java test failed on AMD EPYC (avx512): # Internal Error (/workspace/open/src/hotspot/share/code/compiledIC.cpp:167), pid=1984031, tid=1984059 # assert(CompiledICLocker::is_safe(_method)) failed: mt unsafe call # # JRE version: Java(TM) SE Runtime Environment (26.0) (fastdebug build 26-internal-2025-08-29-2249418.vladimir.kozlov.jdkgit2) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-internal-2025-08-29-2249418.vladimir.kozlov.jdkgit2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0xb98a10] CompiledIC_at(RelocIterator*)+0x130 # Current thread (0x00007efe842b27d0): JavaThread "C2 CompilerThread2" daemon [_thread_in_vm, id=1984059, stack(0x00007efec428f000,0x00007efec438f000) (1024K)] Current CompileTask: C2:4006 6467 % ! 
4 compiler.codecache.stress.Helper::callMethod @ 4 (64 bytes) Stack: [0x00007efec428f000,0x00007efec438f000], sp=0x00007efec438a7a0, free space=1005k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xb98a10] CompiledIC_at(RelocIterator*)+0x130 (compiledIC.cpp:167) V [libjvm.so+0x173e00e] nmethod::metadata_do(MetadataClosure*)+0x21e (nmethod.cpp:2606) V [libjvm.so+0x1742e94] nmethod::verify()+0x1d4 (nmethod.cpp:3289) V [libjvm.so+0x1748646] nmethod::new_nmethod(methodHandle const&, int, int, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, CompLevel, char*, int, JVMCINMethodData*)+0x366 (nmethod.cpp:1223) V [libjvm.so+0xa31d5d] ciEnv::register_method(ciMethod*, int, CodeOffsets*, int, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, bool, bool, bool, bool, int)+0x35d (ciEnv.cpp:1062) V [libjvm.so+0x17fb40d] PhaseOutput::install_code(ciMethod*, int, AbstractCompiler*, bool, bool)+0x16d (output.cpp:3444) V [libjvm.so+0xb7a918] Compile::Code_Gen()+0xa88 (compile.cpp:3125) V [libjvm.so+0xb7fb7f] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x208f (compile.cpp:892) V [libjvm.so+0x9a2176] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x466 (c2compiler.cpp:147) V [libjvm.so+0xb8ef68] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xb48 (compileBroker.cpp:2340) V [libjvm.so+0xb90180] CompileBroker::compiler_thread_loop()+0x5c0 (compileBroker.cpp:1984) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3238759038 From kvn at openjdk.org Sat Aug 30 00:35:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 30 Aug 2025 00:35:56 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v46] In-Reply-To: References: Message-ID: On Fri, 29 Aug 2025 17:18:20 GMT, Chad 
Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix NMethodRelocationTest.java logging race It failed on linux-x64 and linux-aarch64. I tried locally on linux-x64 but it passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3238763509 From epeter at openjdk.org Sat Aug 30 11:40:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 30 Aug 2025 11:40:43 GMT Subject: RFR: 8364305: Support AVX10 saturating floating point conversion instructions [v2] In-Reply-To: References: Message-ID: <_cf7KdBncid__i2bC0izlHl3WgOUk7VPC5COXxoAl8o=.7b68c95a-ae1a-4b2b-b4c8-f7ce79837a34@github.com> On Fri, 29 Aug 2025 23:35:37 GMT, Mohamed Issa wrote: >> @missa-prime Looks like an interesting patch! Do you think you could add some sort of IR test here, to verify that the correct code is generated on AVX10 vs lower AVX? > >> @missa-prime Looks like an interesting patch! Do you think you could add some sort of IR test here, to verify that the correct code is generated on AVX10 vs lower AVX? > > @eme64 Thanks for the suggestion. 
This patch doesn't modify any IR though, so I'm not sure what IR test(s) to add. I could modify existing tests (`test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`, `test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java`, `test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java`) that use IR nodes as dependencies though. Would that be sufficient? Or did you have something else in mind? @missa-prime Could you not match on the mach graph? See example: `test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java` with `CompilePhase.FINAL_CODE`. Maybe another `CompilePhase` is better. I have never matched on the mach graph myself, but I wonder if it may be useful here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26919#issuecomment-3239207525 From fgao at openjdk.org Sun Aug 31 15:19:43 2025 From: fgao at openjdk.org (Fei Gao) Date: Sun, 31 Aug 2025 15:19:43 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Thu, 28 Aug 2025 14:58:25 GMT, Emanuel Peter wrote: > I'm a little sick and don't feel very focused, so I'll have to look at the PR next week. No problem. Take care and hope you get well soon :) > BTW: I just integrated #24278 which may have silent merge conflicts, so it would be good if you merged and tested again. Once you do that I could also run some internal testing, if you like :) Thanks for pointing that out. I'll rebase and verify it shortly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3240216411